Running in non-Jupyter notebooks (like DataBricks)?

I’m trying to get lesson-1 to run in a DataBricks notebook attached to a p2.xlarge AWS instance.

I first ran into problems because I wasn’t able to simply do things like import vgg16 to pull in the course’s local Python files, so I created “eggs” from them using easy_install and attached them to the notebook as libraries.
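Roughly, the packaging for each file boiled down to a tiny setup script along these lines (just a sketch; the package name and version are my own placeholders, not anything from the course):

```python
# setup.py -- minimal sketch for packaging the course's vgg16.py as an egg
# that can be uploaded to DataBricks and attached to a cluster as a library.
from setuptools import setup

setup(
    name="vgg16",       # placeholder name
    version="0.1",      # placeholder version
    py_modules=["vgg16"],  # expects vgg16.py in the same directory
)
```

Running python setup.py bdist_egg in that directory drops an egg under dist/, which can then be uploaded and attached to the cluster; the same recipe works for utils.py and vgg16bn.py.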

Then I ran into several missing libraries and attached them one by one.

It would have been nice to see a list of all needed libraries in one place, but I wasn’t able to find one.

When I ran Vgg16() I got an error indicating my dim_ordering was wrong, so I had to create a version of vgg16.py that passes an explicit dim_ordering = "th" to the MaxPooling2D invocation, and rebuild the Python egg.
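Concretely, the patched call looks something like this (Keras 1.x API; the pool size and strides here are illustrative):

```python
from keras.layers import MaxPooling2D

# The original vgg16.py relies on Keras's global image_dim_ordering setting:
#     MaxPooling2D((2, 2), strides=(2, 2))
# The patched copy forces Theano-style channels-first ordering explicitly:
pool = MaxPooling2D((2, 2), strides=(2, 2), dim_ordering="th")
```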

After setting up all the needed libraries, I finally ran into this error from the TensorFlow backend:

Dimension 0 in both shapes must be equal, but are 3 and 64

It might be a version mismatch. I think you’re assuming an environment with certain specific versions of TF and other libraries installed, but the required library versions aren’t stated explicitly anywhere that I could easily find.

I love this course but want to be able to run it in my preferred notebook environment (and it’s not Jupyter :slight_smile:). The above feels like a lot of struggle; please tell me I’m missing something :slight_smile:

Hi, Prasad!

Everything you need is in the GitHub repo.

  1. Code used to set up the environment. In particular, your issue was with line 34, the Theano config (see the snippet after this list for setting the same thing from inside a notebook).
  2. Imports
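
If editing ~/.keras/keras.json on a managed cluster is awkward, something along these lines at the top of the notebook should have the same effect (just a sketch, assuming Keras 1.x; the backend has to be chosen before keras is imported for the first time):

```python
import os

# Pick the Theano backend before the first "import keras" in the session.
os.environ["KERAS_BACKEND"] = "theano"

from keras import backend as K

# Use Theano-style channels-first ordering, which is what the course's
# vgg16.py and its pre-trained weights expect.
K.set_image_dim_ordering("th")
print(K.backend(), K.image_dim_ordering())
```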

Hope this helps!

By the way, can you share the benefits of DataBricks notebook compared to Jupyter?

Thanks Alex

I think it’s not straightforward to explain the difficulty; it boils down to a few ways the DataBricks notebook environment differs from Jupyter. In DataBricks it’s not very smooth to simply do import vgg16 to pull in the contents of a local vgg16.py file. It sort of works, but not quite.

On the other hand, the DataBricks environment is a managed, Spark-based AWS cloud service, and it’s super easy to launch a p2.xlarge instance and start working with it without ever dealing with the AWS Console. The notebook environment is also better designed from a UI point of view, IMHO. At our company we have a paid subscription, but there’s a community edition of DataBricks that one can play with.

The setup.sh is of course tailored to an instance spun up directly from the AWS console and to Jupyter, so I’m not able to use it. In DataBricks one cannot ssh into a spun-up instance, but on the other hand the notebook supports a %sh magic keyword that lets you run shell commands in a cell.

At any rate, I was able to package utils, vgg16, and vgg16bn as eggs using easy_install, manually attach the various missing libraries (3-4 of them, I think), and everything is now running smoothly. In the interest of ease of use in other environments, I wonder if it would be a good idea to publish those three libs on PyPI or some other suitable place so people don’t have to rely on importing local files. Similarly, it would be great to have the needed libraries specified explicitly somewhere rather than being hidden in setup.sh.