GCloud Dataproc image upgrades & Zeppelin Notebooks

November 10, 2020
dataproc gcloud gcp zeppelin

A Dataproc cluster with the Zeppelin Notebooks component enabled is a great tool for exposing a quick collaboration, query and insight interface to data stores in the Google Cloud Platform (GCP).

It is easy to set up. In just a few clicks, gcloud commands or lines of Terraform configuration, you can access the Zeppelin UI and start creating Notebooks. A Google Cloud Storage (GCS) bucket is used to store the notebooks, which means that after re-creating the cluster for software or hardware changes, you can pick up where you left off.
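For the gcloud route, a minimal sketch could look like the following. The function name and its four parameters are made up for illustration; the flags are the standard `gcloud dataproc clusters create` options for the image version, the optional Zeppelin component, the component gateway (for reaching the UI) and the staging bucket.

```shell
# Sketch: create a Dataproc cluster with the Zeppelin component enabled.
# create_zeppelin_cluster and its parameters are illustrative names only.
create_zeppelin_cluster() {
  local cluster="$1" region="$2" image="$3" bucket="$4"
  gcloud dataproc clusters create "$cluster" \
    --region "$region" \
    --image-version "$image" \
    --optional-components ZEPPELIN \
    --enable-component-gateway \
    --bucket "$bucket"
}
```

Calling it as `create_zeppelin_cluster my-cluster europe-west1 1.4-debian10 my-staging-bucket` would then create the cluster with notebooks stored in that bucket (all four values are placeholders).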

Or so you’d hope. I recently attempted to upgrade a Dataproc cluster with Zeppelin from image version 1.4-debian10 to 1.5-debian10, and found that all my Notebooks were missing!

Initially I thought something might be wrong with the distribution. After all, the release notes mentioned recent changes with Zeppelin and the GCSNotebookRepo plugin. But after digging a bit deeper, admittedly while writing up a Dataproc bug report on the issue, I found the cause of the problem.

The 1.5-debian10 image ships a Zeppelin 0.9 variant, while the previous image shipped 0.8.2, and the Notebook storage format changed with Zeppelin 0.9. Notebooks saved in the old format are still in the bucket, but the new version does not list them.
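To see which format a bucket currently holds, a small helper like the one below can help. It is a sketch that assumes the layout described in the Zeppelin upgrade notes: 0.8 keeps each note as `<noteId>/note.json`, while 0.9 uses `<name>_<noteId>.zpln` files. The function name `count_note_formats` is hypothetical.

```shell
# Count old-format (note.json) and new-format (.zpln) notes in a list of
# object paths read from stdin, one path per line. Other lines are ignored.
count_note_formats() {
  local old=0 new=0 path
  while IFS= read -r path; do
    case "$path" in
      */note.json) old=$((old + 1)) ;;
      *.zpln)      new=$((new + 1)) ;;
    esac
  done
  echo "old=$old new=$new"
}
```

It can be fed from a recursive listing, for example `gsutil ls -r "gs://$DATAPROC_CLUSTER_STAGING_BUCKET/notebooks" | count_note_formats`. A non-zero `old` count with `new=0` would match the missing-notebooks symptom.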

To fix this, the Zeppelin upgrade guide mentions running an update script. I anticipated having to jump through a bunch of hoops to do this, since the Notebooks are stored in a GCS bucket, but to my surprise it turned out to be easy.

After upgrading the cluster to the 1.5-debian10 image, or even to the preview-debian10 image, run the following commands from a terminal with appropriately configured gcloud tools:

# Back up your notebook directory

gsutil cp -r "gs://$DATAPROC_CLUSTER_STAGING_BUCKET/notebooks" \
  "gs://$DATAPROC_CLUSTER_STAGING_BUCKET/notebooks-backup-$(date +"%Y-%m-%d")"

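# Execute the upgrade command on the master node
# (gcloud compute ssh needs an instance name; this assumes CLUSTER_NAME holds
# the cluster name, and a single-master cluster names its master "<cluster>-m")

gcloud compute ssh "${CLUSTER_NAME}-m" --project "$PROJECT" --zone "$ZONE" \
  --command="/usr/lib/zeppelin/bin/upgrade-note.sh -d"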

After running that, your notebooks should show up and work as before. (If they don’t, hit the Notebooks ♽ icon to trigger a reload.)
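If you don't have the staging bucket or the master node name at hand, both can be read from the cluster description. The helpers below are a sketch: the function names are hypothetical, and they assume the `config.configBucket` and `config.masterConfig.instanceNames` fields of the Dataproc cluster resource.

```shell
# Print the staging bucket of a Dataproc cluster.
# Usage: staging_bucket CLUSTER REGION
staging_bucket() {
  gcloud dataproc clusters describe "$1" --region "$2" \
    --format 'value(config.configBucket)'
}

# Print the name of the first master instance of a Dataproc cluster.
# Usage: master_instance CLUSTER REGION
master_instance() {
  gcloud dataproc clusters describe "$1" --region "$2" \
    --format 'value(config.masterConfig.instanceNames[0])'
}
```

For example, `DATAPROC_CLUSTER_STAGING_BUCKET="$(staging_bucket my-cluster europe-west1)"` fills in the variable used in the backup command, with the cluster name and region as placeholders.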

Hopefully this will save you some time. If you have suggestions or would like to say hi, drop me a line.

-@niels