Setup problems: Running the Lesson 1 Notebook

UPDATE

I finally managed to get it to run in parallel using all available cores. I had to add the following line to the Python notebook.

theano.config.openmp = True
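For reference, the same switch can also be set from the environment before Theano is imported, which additionally lets you cap the core count (relevant to the question later in this thread about using only 5 of 8 cores). This is a sketch assuming Theano's `openmp` flag and the standard OpenMP `OMP_NUM_THREADS` variable:

```python
import os

# These must be set before `import theano`; Theano reads them at import time.
# OMP_NUM_THREADS is the standard OpenMP knob for capping the thread count.
os.environ["OMP_NUM_THREADS"] = "5"          # e.g. use 5 of 8 cores
os.environ["THEANO_FLAGS"] = "openmp=True"   # same effect as theano.config.openmp = True
```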


Hi,

I’m struggling with a home setup. When I try to run vgg.fit() (from the unmodified notebook), I get an error that boils down to:

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2696)()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2654)()

h5py/h5f.pyx in h5py.h5f.open (/home/ilan/minonda/conda-bld/work/h5py/h5f.c:1942)()

IOError: Unable to open file (Truncated file: eof = 105693184, sblock->base_addr = 0, stored_eoa = 553482496)

It seems it’s trying to load files from a hardcoded path on the original developer’s machine… I suppose there’s a workaround, but does anybody know what it is ?

Hi, I ran the lesson1 notebook successfully, but when training the model I don’t see the validation results. Any ideas why this happens?

#train model
vgg.fit(batches, val_batches, nb_epoch=1,verbose=1)

Epoch 1/1
20936/21000 [============================>.] - ETA: 1s - loss: 0.0960 - acc: 0.9759

Hi iodbh,

Did you download the vgg16.py version in the .zip file in the Lecture 1 notes? It seems to be an outdated version - try using the one in the GitHub repository instead. That fixed the same problem for me.

Btw. Vgg16 is loading from a hardcoded path on www.platform.ai/models/ , the error message is just a confusing one from the h5py library (I think).

Hope this helps!

When I first set up everything on AWS, Jupyter Notebook was working okay. I hadn’t run any of the Lesson 1 notebook but I knew Jupyter was working fine because I ran basic code like 1+1 and importing theano in cells and everything was okay. Now I can’t connect to Jupyter Notebook. I’m getting this message every time I open a notebook

Thanks a lot Mikkel, your post made me realize that the error message was misleading. It turns out the download of the models was interrupted the first time I ran the notebook, which left the file corrupted.

If somebody else runs into the same problem, the solution is to clear the keras cache (rm ~/.keras/models/*) then re-run the code. It will download the file again.
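To make that failure mode easier to spot, here is a small sketch (a hypothetical helper, not part of the course code) that treats a cached weights file as truncated when its size on disk is smaller than the full size reported in the h5py error (the `stored_eoa` value), and deletes it so Keras re-downloads it on the next run:

```python
import os
import tempfile

def clear_if_truncated(path, expected_bytes):
    """Delete a cached weights file if the download was cut short.

    expected_bytes is the full size of the file (the `stored_eoa`
    value in the h5py error message). Returns True if the file was
    removed and will be re-downloaded on the next run.
    """
    if os.path.exists(path) and os.path.getsize(path) < expected_bytes:
        os.remove(path)
        return True
    return False

# Example with a deliberately short file standing in for vgg16.h5:
cache = os.path.join(tempfile.mkdtemp(), "vgg16.h5")
with open(cache, "wb") as f:
    f.write(b"\x00" * 100)                   # truncated: only 100 bytes on disk
print(clear_if_truncated(cache, 553482496))  # True - file removed
```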


I wanted to share a few small changes I had to make to get Lesson 1 to work with the directory structure shown at the top of dogs_cats_redux.ipynb

utils/
    vgg16.py
    utils.py
lesson1/
    redux.ipynb

Initially when running the import code I received the following error:

#Allow relative imports to directories above lesson1/
sys.path.insert(1, os.path.join(sys.path[0], '..'))

#import modules
from utils import *
from vgg16 import Vgg16

ImportError                               Traceback (most recent call last)
<ipython-input-2-2ec9e6c6812a> in <module>()
  1 sys.path.insert(1, os.path.join(sys.path[0], '..'))
  2 
----> 3 from utils import *
  4 from vgg16 import Vgg16
  5 

ImportError: No module named utils

To fix this and some other import errors I did the following.

  • Added an empty __init__.py file to the utils directory

  • Added vgg16bn.py to the utils directory

  • Tweaked the import code above adding utils. prefixes as follows.

    from utils.utils import *
    from utils.vgg16 import Vgg16

As you can probably tell I don’t write much python in my day job. I suspect these steps are obvious to experienced python programmers but hope they will help less experienced python programmers like myself.
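The steps above can be sketched end-to-end in a throwaway directory - `utils/` only becomes importable as a package once it contains `__init__.py`. The file contents below are stand-ins, not the course's real modules:

```python
import os
import sys
import tempfile

# Build the same layout as the course repo: a utils/ package in a parent dir.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "utils")
os.makedirs(pkg)

# An empty __init__.py is what marks utils/ as a package.
open(os.path.join(pkg, "__init__.py"), "w").close()

# Stand-in for the course's utils.py.
with open(os.path.join(pkg, "utils.py"), "w") as f:
    f.write("def helper():\n    return 'ok'\n")

# Same trick as in the notebook: make the parent directory importable
# (inserted at the front here so this package wins over anything else).
sys.path.insert(0, root)
from utils.utils import helper
print(helper())  # prints: ok
```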


Thanks for sharing your solution, @telarson. I had the same problem and this was helpful. But, did you mean __init__.py instead of input.py?

@agulati, thanks for noticing! I just corrected this.

Hey all! I have managed to debug the image ordering error myself, but now I am faced with the following error. Does anyone have suggestions as to how it could be fixed?

ValueError: Dimension 0 in both shapes must be equal, but are 3 and 64 for ‘Assign_9’ (op: ‘Assign’) with input shapes: [3,3,224,64], [64,3,3,3].
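The shapes in that message ([3,3,224,64] vs. [64,3,3,3]) look like the usual Theano-vs-TensorFlow weight-layout mismatch in Keras 1.x. If that is the cause, making sure ~/.keras/keras.json selects the Theano backend and ordering may help - this is a guess based on the thread's setup, not a confirmed fix:

```json
{
    "image_dim_ordering": "th",
    "backend": "theano"
}
```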

For the ReduceLROnPlateau error, I first had to upgrade pip:

    pip install --upgrade pip

then upgrade Keras:

    pip install keras --upgrade

then restart my Jupyter notebook.

Rachel,

I intend to use the NVIDIA GPU on my laptop. I have successfully installed Theano, Keras and other dependencies. I am able to run the Lesson 1 notebook (up to where it was covered in Lesson 1) without any hitch. However, the GPU is not being used. As covered in the video, I found a theanorc file at “C:…\toolkits\keras-1.1.0\docker\theanorc” with the following content:

[global]
floatX = float32
optimizer=None
device = gpu

I also tried setting THEANO_FLAGS=THEANO_FLAGS_GPU, but the GPU is still not being used. Please help me resolve this.

Hi, thanks for your input. Could you please add some information about how you found that setting? I am interested in using, say, 5 of my 8 cores - is there anything else I should do? Thanks.

Edited: OK, I see the environment variable in your previous post. So I edited ~/.theanorc and added a [global] section in which I put the environment variable and the suggested openmp = True line. But I don’t see evidence of improvement. I noted that there is no Python module named openmp; there is an openmpi module. Not sure they are the same.

It seems that to use OpenMP directly you must first release the GIL and write code to handle the parallelism yourself. My presumption that this is all handled by Theano is perhaps a hopeful one. I’ve also noted that perhaps I should add Cython to my environment. Like most parallelism issues, this has got difficult quickly, and although I would like to use my cores, I feel this is a step too far here.

Unable to instantiate Vgg16 object

Whenever I run

vgg = Vgg16()
# Grab a few images at a time for training and validation.
# NB: They must be in subdirectories named based on their category
batches = vgg.get_batches(path+'train', batch_size=batch_size)
val_batches = vgg.get_batches(path+'valid', batch_size=batch_size*2)
vgg.finetune(batches)
vgg.fit(batches, val_batches, nb_epoch=1)

I get the following error:

IOError                                  Traceback (most recent call last)
<ipython-input-24-2b6861506a11> in <module>()
----> 1 vgg = Vgg16()
      2 # Grab a few images at a time for training and validation.
      3 # NB: They must be in subdirectories named based on their category
      4 batches = vgg.get_batches(path+'train', batch_size=batch_size)
      5 val_batches = vgg.get_batches(path+'valid', batch_size=batch_size*2)

/Users/indraner/dev/datascience/fastai/courses/deeplearning1/nbs/vgg16.py in __init__(self)
     31     def __init__(self):
     32         self.FILE_PATH = 'http://www.platform.ai/models/'
---> 33         self.create()
     34         self.get_classes()
     35 

/Users/indraner/dev/datascience/fastai/courses/deeplearning1/nbs/vgg16.py in create(self)
     80 
     81         fname = 'vgg16.h5'
---> 82         model.load_weights('/Users/indraner/dev/datascience/fastai/data/dogsVcats/vgg16.h5')
     83 
     84 

/usr/local/lib/python2.7/site-packages/keras/engine/topology.pyc in load_weights(self, filepath, by_name)
   2512         '''
   2513         import h5py
-> 2514         f = h5py.File(filepath, mode='r')
   2515         if 'layer_names' not in f.attrs and 'model_weights' in f:
   2516             f = f['model_weights']

/usr/local/lib/python2.7/site-packages/h5py/_hl/files.pyc in __init__(self, name, mode, driver, libver, userblock_size, swmr, **kwds)
    270 
    271                 fapl = make_fapl(driver, libver, **kwds)
--> 272                 fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
    273 
    274                 if swmr_support:

/usr/local/lib/python2.7/site-packages/h5py/_hl/files.pyc in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
     90         if swmr and swmr_support:
     91             flags |= h5f.ACC_SWMR_READ
---> 92         fid = h5f.open(name, flags, fapl=fapl)
     93     elif mode == 'r+':
     94         fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/Users/travis/build/MacPython/h5py-wheels/h5py/h5py/_objects.c:2687)()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/Users/travis/build/MacPython/h5py-wheels/h5py/h5py/_objects.c:2645)()

h5py/h5f.pyx in h5py.h5f.open (/Users/travis/build/MacPython/h5py-wheels/h5py/h5py/h5f.c:1933)()

IOError: Unable to open file (Truncated file: eof = 237635312, sblock->base_addr = 0, stored_eoa = 553482496)

I have downloaded the .h5 file and am trying to load it from local disk, since I was getting an error while loading it from the ‘platform.ai’ url.

Also, the code is the latest from GitHub.
Can someone please help?

I am in the ~/nbs directory but got the “No module named utils” error. I used the ls command to check the directory and only found the .ipynb file, nothing else. Am I supposed to see a bunch of .py files in this directory?

I am also having the problem where I cannot instantiate the Vgg16() object:

ImportError                               Traceback (most recent call last)
<ipython-input-...> in <module>()
----> 1 vgg = Vgg16()
      2 # Grab a few images at a time for training and validation.
      3 # NB: They must be in subdirectories named based on their category
      4 batches = vgg.get_batches(path+'train', batch_size=batch_size)
      5 val_batches = vgg.get_batches(path+'valid', batch_size=batch_size*2)

/home/gpp8p/PycharmProjects/dlcourse/vgg16.pyc in __init__(self)
     31     def __init__(self):
     32         self.FILE_PATH = 'http://www.platform.ai/models/'
---> 33         self.create()
     34         self.get_classes()

I have verified that my Keras is set to use the Theano backend - in fact, the notebook says “Using Theano backend.”

Any help would be greatly appreciated

I’m looking at this error, and I see this:

     31     def __init__(self):
     32         self.FILE_PATH = 'http://www.platform.ai/models/'
---> 33         self.create()
     34         self.get_classes()

what I’m wondering is this: I am running on a stand-alone Linux box with its own CUDA board. What would be the right value for FILE_PATH in those circumstances?

-George Pipkin
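One option worth trying (an assumption, not something confirmed in this thread): Keras's get_file fetches FILE_PATH + fname via urlretrieve, and urlretrieve also accepts file:// URLs, so FILE_PATH could point at a local directory holding an already-downloaded copy of the weights. A self-contained sketch of the mechanism, with a tiny stand-in file instead of the real ~500 MB vgg16.h5:

```python
import os
import tempfile
try:
    from urllib.request import urlretrieve   # Python 3
except ImportError:
    from urllib import urlretrieve           # Python 2, as used in the course

# Stand-in for vgg16.h5 sitting on local disk.
src_dir = tempfile.mkdtemp()
src = os.path.join(src_dir, "vgg16.h5")
with open(src, "wb") as f:
    f.write(b"fake weights")

# Same kind of fetch Vgg16.create() performs, but against a local
# file:// URL, i.e. FILE_PATH = 'file://' + src_dir + '/'
dst = os.path.join(tempfile.mkdtemp(), "vgg16.h5")
urlretrieve("file://" + src, dst)
data = open(dst, "rb").read()    # the copied bytes
```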

when I try to run it in straight Python, I end up getting this error:

ImportError: ('The following error happened while compiling the node', DeepCopyOp(convolution2d_1_W), '\n', '/home/gpp8p/.theano/compiledir_Linux-4.8–generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/tmp8Z3X8i/6bce617acb30aa3bbe8d048c82a553bc.so: undefined symbol: _ZdlPvm', '[DeepCopyOp(convolution2d_1_W)]')

After doing a good deal of Googling, I discovered that this has to do with an incompatibility between compilers:

Not sure how to get g++ version 5, but that looks like what it wants.

Unfortunately, changing the .theanorc to include:

    cxx = /usr/bin/g++-5

does not remedy this issue.

I was able to get through this error. The key is in the .theanorc file. Mine looks like this:

    [blas]
    ldflags =

    [global]
    floatX = float32
    device = gpu

    # By default the compiled files were being written to my local network drive.
    # Since I have limited space on this drive (on a school's network),
    # we can change the path to compile the files on the local machine.
    # You will have to create the directories and modify according to where you
    # want to install the files.
    # Uncomment if you want to change the default path to your own.
    base_compiledir = /local-scratch/jer/theano/

    [nvcc]
    fastmath = True

    [gcc]
    cxxflags = -ID:\MinGW\include
    cxx = /usr/bin/g++-5

    [cuda]
    # Set to where the cuda drivers are installed.
    # You might have to change this depending where your cuda driver/what version is installed.
    root=/usr/local/cuda-8.0/