Lesson 1 discussion

@Rothrock42

Thanks for the reply, I think that might be my problem. I don’t see a .theanorc file anywhere so I’m guessing the CPU was used to do the work.

@armand I installed all of this on my own machine and the file wasn’t created by default. I’m on a laptop without an Nvidia GPU, so I didn’t need to create it. It should be in your user directory, so vim ~/.theanorc should create it. I can’t quite remember what is supposed to go into it, but I looked back at the Lesson 1 video around 1:27 and this seems to be it:

[global]
floatX = float32
device = gpu0
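
To check which device a given .theanorc actually requests, a small script can parse the file. This is only a minimal sketch, assuming the INI layout shown above and Python 3 (the module is named ConfigParser on Python 2); read_theano_device is a hypothetical helper name and not part of Theano. It falls back to "cpu" when the file or the setting is missing, since CPU is what Theano uses when nothing else is configured.

```python
import configparser
import os


def read_theano_device(path="~/.theanorc"):
    """Return the device requested in a .theanorc file, or "cpu" if none is set."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names such as floatX case-sensitive
    # read() silently skips files that do not exist
    parser.read(os.path.expanduser(path))
    return parser.get("global", "device", fallback="cpu")


if __name__ == "__main__":
    print("Requested device:", read_theano_device())
```

If this prints cpu on a machine with an Nvidia card, the file is probably missing or lacks the device line.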

Hi,

I am already in trouble on lesson 1.
My platform is a local Mac running OS X 10.11.6 (El Capitan) with 8GB of RAM.
My Python setup is the one I use for other projects; it has Theano, Keras, TensorFlow and many other libraries installed in a conda environment. The Python version is 2.7.

When running the Lesson 1 notebook, I get stuck on an error saying that the .h5 file is corrupted.

See below the last part of the error message.

Thanks for the help
Peter

/Users/peterhirt/anaconda2/envs/tf/lib/python2.7/site-packages/h5py/_hl/files.pyc in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
 90         if swmr and swmr_support:
 91             flags |= h5f.ACC_SWMR_READ
---> 92         fid = h5f.open(name, flags, fapl=fapl)
 93     elif mode == 'r+':
 94         fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/Users/ilan/minonda/conda-bld/work/h5py/_objects.c:2696)()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/Users/ilan/minonda/conda-bld/work/h5py/_objects.c:2654)()

h5py/h5f.pyx in h5py.h5f.open (/Users/ilan/minonda/conda-bld/work/h5py/h5f.c:1942)()

IOError: Unable to open file (Truncated file: eof = 163864584, sblock->base_addr = 0, stored_eoa = 553482496)

You might have to download the .h5 file again. The file format is very fussy, and the whole thing can get corrupted if there was a blip during the download. Hope that helps!
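
The numbers in the error point the same way: eof is the size of the file on disk (163864584 bytes) and stored_eoa is the size the HDF5 header expects (553482496 bytes), so the download stopped early. The sketch below compares the two and deletes a partial file so that the next run downloads it again. It assumes the weights sit in ~/.keras/models, which is where Keras' get_file() caches downloads by default when called with cache_subdir='models'; the helper names are hypothetical.

```python
import os

# Assumed location: Keras' get_file() caches downloads under ~/.keras by default
KERAS_MODEL_CACHE = os.path.expanduser("~/.keras/models")


def is_truncated(path, expected_size):
    """Return True if the file exists but is smaller than expected_size bytes."""
    return os.path.isfile(path) and os.path.getsize(path) < expected_size


def remove_if_truncated(path, expected_size):
    """Delete a partial download so the next run fetches it again."""
    if is_truncated(path, expected_size):
        os.remove(path)
        return True
    return False


if __name__ == "__main__":
    weights = os.path.join(KERAS_MODEL_CACHE, "vgg16.h5")
    # 553482496 is the stored_eoa value from the error message above
    print("removed partial file:", remove_if_truncated(weights, 553482496))
```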

I’m using Windows 7 on a laptop with a decent Nvidia GPU, and it’s been very difficult trying to get the first lesson to run. I’ve worked on it for 3-4 nights in a row, losing sleep…

I think I finally managed to sorta get it to work, so I’ll try to summarize what worked for me.

There were both Nvidia driver issues, and Python version issues.
I’ll detail the Python issues first, since the driver issues might be unique to my own hardware.

My setup:
Dell e5550, Windows 7 x64, Visual Studio 2013 Community Edition, 2 versions of GCC (5 and 6) already installed and on the path, also MSYS2 was installed.

What worked for me was following the steps from this excellent github issue, opened for Theano:
https://github.com/Theano/Theano/issues/5348

This was after I had already followed 3 or 4 other blog posts/tutorials, none of which worked.

It’s for Windows 10 & Visual Studio 2015, but most of it applied to my setup (except for step #9, which wasn’t necessary for me).

The 2 key things, I think, were creating a separate self-contained environment for Python 3.5 (step #2 there), and making it use the GCC version from the MinGW that came with Anaconda instead of the 2 versions I already had lying around (step #10).
I didn’t try step #15 (using cuDNN), step #16 (using cuBLAS), or steps #17-18 (installing PyCharm), but I might do 15-16 if it seems slow.

I still had to fix some errors in the scripts for lesson 1, caused by having installed Python 3 instead of Python 2. I don’t know much about Python; I taught it to myself 20 years ago but have never used it since, so that took some googling.
In fact the “plots” function still isn’t recognized and I have no idea what the alternative is, but I’ll work on it…

The only other thing I did was to lower the batch size (batch_size) to 16; otherwise I’d get an “out of memory” error.

Now for the driver issues.
These might be due to something faulty in my hardware, but I’ll include them just in case it might help someone with the same problems.
I have a Dell laptop with what Nvidia calls the “Optimus” setup: both an Intel graphics card and an Nvidia GeForce 830M card. There’s a way to tell Windows which graphics card to use for each application, and what the default should be. The default has been the Intel card, which saves some battery life.

After I downloaded and installed the latest CUDA version from the GeForce site, I started getting error messages when I tried to run the sample theano script with the GPU, telling me that the GPU isn’t available.
The fix was to change the system-wide default to use the Nvidia card, using the Nvidia control panel.

After that, I started getting very strange crashes when running that script. Either Windows would tell me that the Nvidia card had been “ejected” (which is weird since it’s soldered to the motherboard) and I would need a reboot to re-enable the driver, or it would tell me that the driver had crashed and been restored.
Part of the reason, I discovered, is that the CUDA installer updates the graphics driver version too; you have to explicitly tell it not to do that.

After uninstalling and installing many driver/CUDA versions, the one combination which sorta worked with almost no crashes was:

GeForce driver version 368.69 (latest is 378.66)
CUDA version 7.5 (the latest is CUDA 8).

I still get a crash when running one of the sample Theano scripts, but no crash at all when running the course code from lesson 1.

Thank you, this is very helpful.

Hi everyone. On executing the vgg = Vgg16() command, I get the following error. How can I resolve it?

OSError Traceback (most recent call last)
in ()
----> 1 vgg = Vgg16()

/Users/navneetmkumar/Documents/courses/deeplearning1/nbs/vgg16.py in __init__(self)
31 def __init__(self):
32 self.FILE_PATH = 'http://www.platform.ai/models/'
---> 33 self.create()
34 self.get_classes()
35

/Users/navneetmkumar/Documents/courses/deeplearning1/nbs/vgg16.py in create(self)
80
81 fname = 'vgg16.h5'
---> 82 model.load_weights(get_file(fname, self.FILE_PATH+fname, cache_subdir='models'))
83
84

/Users/navneetmkumar/anaconda/lib/python3.6/site-packages/keras/engine/topology.py in load_weights(self, filepath, by_name)
2700 """
2701 import h5py
-> 2702 f = h5py.File(filepath, mode='r')
2703 if 'layer_names' not in f.attrs and 'model_weights' in f:
2704 f = f['model_weights']

/Users/navneetmkumar/anaconda/lib/python3.6/site-packages/h5py/_hl/files.py in __init__(self, name, mode, driver, libver, userblock_size, swmr, **kwds)
270
271 fapl = make_fapl(driver, libver, **kwds)
--> 272 fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
273
274 if swmr_support:

/Users/navneetmkumar/anaconda/lib/python3.6/site-packages/h5py/_hl/files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
90 if swmr and swmr_support:
91 flags |= h5f.ACC_SWMR_READ
---> 92 fid = h5f.open(name, flags, fapl=fapl)
93 elif mode == 'r+':
94 fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/Users/ilan/minonda/conda-bld/h5py_1482533836832/work/h5py/_objects.c:2856)()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/Users/ilan/minonda/conda-bld/h5py_1482533836832/work/h5py/_objects.c:2814)()

h5py/h5f.pyx in h5py.h5f.open (/Users/ilan/minonda/conda-bld/h5py_1482533836832/work/h5py/h5f.c:2102)()

OSError: Unable to open file (Truncated file: eof = 14647296, sblock->base_addr = 0, stored_eoa = 553482496)

Hi @oren01
I also had some problems trying to install the packages, but after some troubleshooting I could fix all the issues. My machine runs a Windows 10 Preview, so I would expect more issues than you have on a stable Windows 7. Anyway, let me try to share some comments.

  1. In the wiki, there is a good step-by-step tutorial on how to install Anaconda on a Windows machine.
  2. I am running Windows 10 and VS 2015, but you are running VS 2013. On Windows 10 there seems to be a path issue between VS and CUDA, which you should not hit since you are not running the same environment. Still, you can try the same workaround: run vcvarsall.bat inside your docker (created by the Anaconda installation) and then try to launch your notebook.
  3. As for GPU versus CPU, to switch between them you just need to change the .theanorc configuration file, and Keras will take care of what to use.
    I hope these few points help.
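
For a quick CPU/GPU comparison without editing .theanorc each time, Theano also reads the THEANO_FLAGS environment variable, which takes precedence over the file. The sketch below sets it from Python; theano_flags is a hypothetical helper, and the two settings shown are simply the ones from the .theanorc quoted earlier in this thread.

```python
import os


def theano_flags(device, floatX="float32"):
    """Build a THEANO_FLAGS value such as 'device=cpu,floatX=float32'."""
    return "device=%s,floatX=%s" % (device, floatX)


# Must be set before `import theano`; Theano reads the variable at import time.
os.environ["THEANO_FLAGS"] = theano_flags("cpu")
```

In a notebook this has to run in the first cell, before utils or vgg16 are imported, because those modules import Theano themselves.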

I am studying the code in the Cats and Dogs redux notebook, trying to learn some of the Python.

After searching for documentation on this line of code for 30 minutes and not finding what felt like clarity to me, I decided to ask here:

#Allow relative imports to directories above lesson1/
sys.path.insert(1, os.path.join(sys.path[0], '..'))

#import modules
from utils import *
from vgg16 import Vgg16

I mostly get what is going on, but my confusion is with sys.path.insert(1, os.path.join(sys.path[0], '..'))
Where can I find python documentation on this?

I looked here at the sys module, but all it listed was sys.path and not sys.path.insert.
https://docs.python.org/3/library/sys.html#module-sys

Thanks for any help!

Hi @york,
This is fairly intuitive: os.path.join takes the first entry of sys.path (index zero) and appends '..' to it, which gives the path of the directory above. That result is then inserted at position 1 of the path list. Another way to see what happens is to make the notebook do the job. Type the following commands (each line in its own cell), run the cells and look at the results; each command shows one step on the way to the final result. Each time you repeat the commands, another copy of the parent-directory string is added to the path.
Hope this helps.

import sys, os
sys.path
os.path.join(sys.path[0], '..')
sys.path.insert(1, os.path.join(sys.path[0], '..'))
sys.path
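
On the documentation question: the sys page only lists sys.path because sys.path is an ordinary Python list, so insert is the standard list method (documented in the Python tutorial under data structures) and not something specific to sys. The short sketch below illustrates this on a copy of the list, so the real import path is left untouched; the variable names are only for illustration.

```python
import os
import sys

# sys.path is a plain list, so insert() is the ordinary list.insert(index, item)
print(type(sys.path))

search_path = list(sys.path)  # work on a copy so the real path is untouched
parent = os.path.join(search_path[0], "..")
search_path.insert(1, parent)

# The new entry sits at index 1; everything after it shifted right by one
print(search_path[1] == parent)
print(len(search_path) == len(sys.path) + 1)
```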

Thanks @carlosdeep, for explaining this.

I will give that a shot in the Notebook - I really appreciate the help!

Thanks - I just walked through your example code and now I understand.

Thanks again!

I have tried to set things up on my local Ubuntu machine with a GTX1060. In the Lesson 1 notebook, when I get to “import utils”, I get this error:

/usr/include/string.h: In function ‘void* __mempcpy_inline(void*, const void*, size_t)’:
/usr/include/string.h:652:42: error: ‘memcpy’ was not declared in this scope
   return (char *) memcpy (__dest, __src, __n) + __n;
                                      ^
ERROR (theano.sandbox.cuda): Failed to compile cuda_ndarray.cu: ('nvcc return status', 1, 'for cmd', 'nvcc -shared -O3 -m64 -Xcompiler -DCUDA_NDARRAY_CUH=mc72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden -Xlinker -rpath,/home/gram/.theano/compiledir_Linux-4.4--generic-x86_64-with-debian-stretch-sid-x86_64-3.5.2-64/cuda_ndarray -I/home/gram/anaconda3/lib/python3.5/site-packages/theano/sandbox/cuda -I/home/gram/anaconda3/lib/python3.5/site-packages/numpy/core/include -I/home/gram/anaconda3/include/python3.5m -I/home/gram/anaconda3/lib/python3.5/site-packages/theano/gof -o /home/gram/.theano/compiledir_Linux-4.4--generic-x86_64-with-debian-stretch-sid-x86_64-3.5.2-64/cuda_ndarray/cuda_ndarray.so mod.cu -L/home/gram/anaconda3/lib -lcublas -lpython3.5m -lcudart')

In searching around I have seen some people having this issue when compiling Theano itself, but I’m not doing that; the fix in that case was getting nvcc to add -D_FORCE_INLINES during compilation. I’m not sure where I would do this. Also this problem was supposedly fixed in CUDA 8.0 but that’s what I have installed.

Has anyone else seen this and have any suggestions?

I solved the problem in the previous post by using the workaround here: https://github.com/Theano/Theano/issues/4425

Now I get this error instead further on: nvcc fatal : Value ‘sm_61’ is not defined for option ‘gpu-architecture’
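
One possible explanation for the sm_61 message: the GTX 1060 is a compute capability 6.1 card, and nvcc only knows sm_61 from CUDA 8.0 onwards, so the nvcc that Theano finds on the PATH may be an older one even though CUDA 8.0 is installed. The sketch below checks which release the nvcc on the PATH reports. The helper names are hypothetical, and the parsing assumes the usual "release X.Y" wording in the output of nvcc --version.

```python
import re
import subprocess

# Pascal cards such as the GTX 1060 need sm_61, which nvcc supports from CUDA 8.0
MIN_RELEASE_FOR_SM_61 = (8, 0)


def parse_nvcc_release(version_text):
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def nvcc_supports_sm_61(version_text):
    """Return True if the reported release is at least CUDA 8.0."""
    release = parse_nvcc_release(version_text)
    return release is not None and release >= MIN_RELEASE_FOR_SM_61


if __name__ == "__main__":
    try:
        output = subprocess.check_output(["nvcc", "--version"]).decode()
    except (OSError, subprocess.CalledProcessError):
        print("nvcc not found on PATH")
    else:
        print("nvcc release:", parse_nvcc_release(output))
        print("supports sm_61:", nvcc_supports_sm_61(output))
```

If the reported release is below 8.0, putting the CUDA 8.0 bin directory first on the PATH would be the thing to try.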

I tried to install unzip and got an error. I ran sudo apt-get update and tried the install again, but it did not help. Did I miss anything?
Thanks in advance

ubuntu@ip-172-31-59-179:~$ sudo apt-get update                                                        
Hit:1 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial InRelease 
Hit:2 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial-updates InRelease                        
Hit:3 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial-backports InRelease                      
Hit:4 http://security.ubuntu.com/ubuntu xenial-security InRelease                                    
Ign:5 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  InRelease       
Get:6 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  Release [564 B] 
Fetched 564 B in 0s (932 B/s)   
Reading package lists... Done 
 
ubuntu@ip-172-31-59-179:~$ sudo apt-get install unzip                                                 
Reading package lists... Done 
Building dependency tree        
Reading state information... Done 
You might want to run 'apt-get -f install' to correct these: 
The following packages have unmet dependencies: 
libsmbclient : Depends: samba-libs (= 2:4.3.11+dfsg-0ubuntu0.16.04.1) but 2:4.3.11+dfsg-0ubuntu0.16.04.3 is to be installed 
samba-libs : Depends: libwbclient0 (= 2:4.3.11+dfsg-0ubuntu0.16.04.3) but 2:4.3.11+dfsg-0ubuntu0.16.04.1 is to be installed                                                                                                      
E: Unmet dependencies. Try 'apt-get -f install' with no packages (or specify a solution). 
 
ubuntu@ip-172-31-59-179:~$ apt-get -f install
E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?

You have to use sudo apt-get -f install

I gave up on Theano eventually. The last release was about 6 months back and trying to get the trifecta of NVidia driver, CUDA and Theano to play nice was just turning into a massive headache. So I switched the backend to TensorFlow and removed the theano imports from utils.py; they were breaking things and are not used (at least not in the lesson 1 notebook).

Unfortunately it seems the VGG16 model has issues with TensorFlow; I get this when instantiating:

ValueError: Negative dimension size caused by subtracting 2 from 1 for 'MaxPool_1' (op: 'MaxPool') with input shapes: [?,1,112,128].

At this point I think I am giving up on lesson 1. It’s just an exercise in frustration. I hope that lesson 2 will fare better.

Hi @gramster,
The point is that Lesson 1, and I guess the other lessons too, are designed to use Theano. So when you switch to TensorFlow, keep in mind that you also need to change the configuration files that are set up for Theano. For instance, in the Keras configuration file (~/.keras/keras.json) you need to change the backend from theano to tensorflow. Otherwise, you will face runtime errors when running the lesson.

Yes, I did that. I solved the negative dimension problem by leaving the image_dim_ordering option as ‘th’, not ‘tf’. I also had to update Keras to 1.1.1. So in the end I am on CUDA 8.0.61, tensorflow-gpu 1.0.1, and Nvidia driver 375.39, on Ubuntu 16.04. Finally a combination that works!
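
The shape in the error, [?,1,112,128], fits this explanation. With ‘tf’ ordering the (3, 224, 224) input is read as height 3, width 224, channels 224, so the first 2x2 max-pool shrinks the height to 1 and the second one has nothing left to pool. The sketch below traces the shapes with plain arithmetic. It assumes the usual VGG16 layout (five conv blocks with 64, 128, 256, 512 and 512 filters, padded so the convolutions keep height and width, each followed by a 2x2 max-pool); trace_pool_shapes is a hypothetical helper and not part of Keras.

```python
def trace_pool_shapes(input_shape, ordering, filters=(64, 128, 256, 512, 512), pool=2):
    """Return the (height, width, channels) shape after each block's max-pool."""
    if ordering == "th":    # channels first: (channels, height, width)
        channels, height, width = input_shape
    elif ordering == "tf":  # channels last: (height, width, channels)
        height, width, channels = input_shape
    else:
        raise ValueError("ordering must be 'th' or 'tf'")

    shapes = []
    for n_filters in filters:
        channels = n_filters  # padded 3x3 convs change only the channel count
        if height < pool or width < pool:
            raise ValueError(
                "cannot pool %dx%d input with %d channels" % (height, width, channels)
            )
        height, width = height // pool, width // pool
        shapes.append((height, width, channels))
    return shapes


# Read as channels-first, the input pools down cleanly through all five blocks
print(trace_pool_shapes((3, 224, 224), "th"))

# Read as channels-last, the second pool fails on a 1x112 input with 128 channels
try:
    trace_pool_shapes((3, 224, 224), "tf")
except ValueError as exc:
    print("tf ordering fails:", exc)
```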

Thanks JRicon, this works for me.