Setup problems: Running the Lesson 1 Notebook

As a follow-up to my setup issues, I think this is the relevant subtopic to post in. Here is the full error stack.

 vgg = Vgg16()
 # Grab a few images at a time for training and validation. 
 # NB: They must be in subdirectories named based on their category

 batches = vgg.get_batches(path+'train', batch_size=batch_size)
 val_batches = vgg.get_batches(path+'valid', batch_size=batch_size*2)
 vgg.finetune(batches)
 vgg.fit(batches, val_batches, nb_epoch=1)

###Output & Error message

Found 40 images belonging to 2 classes.

['nvcc', '-shared', '-O3', '-Xlinker', '-rpath,/usr/local/cuda/lib64', '-arch=sm_61', '-m64', '-Xcompiler', '-fno-math-errno,-Wno-unused-label,-Wno-unused-variable,-Wno-write-strings,-DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden', '-Xlinker', '-rpath,/home/ra/.theano/compiledir_Linux-4.8--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray', '-I/home/ra/.theano/compiledir_Linux-4.8--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray', '-I/usr/local/cuda/include', '-I/opt/anaconda/lib/python2.7/site-packages/theano/sandbox/cuda', '-I/opt/anaconda/lib/python2.7/site-packages/numpy/core/include', '-I/opt/anaconda/include/python2.7', '-I/opt/anaconda/lib/python2.7/site-packages/theano/gof', '-L/home/ra/.theano/compiledir_Linux-4.8--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray', '-L/opt/anaconda/lib', '-o', '/home/ra/.theano/compiledir_Linux-4.8--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/tmpbDHDIA/', '', '-lcudart', '-lcublas', '-lcuda_ndarray', '-lcudnn', '-lpython2.7']

Exception: ('The following error happened while compiling the node', GpuDnnConv{algo='small', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='valid', subsample=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0}), '\n', 'nvcc return status', 2, 'for cmd', 'nvcc -shared -O3 -Xlinker -rpath,/usr/local/cuda/lib64 -arch=sm_61 -m64 -Xcompiler -fno-math-errno,-Wno-unused-label,-Wno-unused-variable,-Wno-write-strings,-DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden -Xlinker -rpath,/home/ra/.theano/compiledir_Linux-4.8--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray -I/home/ra/.theano/compiledir_Linux-4.8--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray -I/usr/local/cuda/include -I/opt/anaconda/lib/python2.7/site-packages/theano/sandbox/cuda -I/opt/anaconda/lib/python2.7/site-packages/numpy/core/include -I/opt/anaconda/include/python2.7 -I/opt/anaconda/lib/python2.7/site-packages/theano/gof -L/home/ra/.theano/compiledir_Linux-4.8--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray -L/opt/anaconda/lib -o /home/ra/.theano/compiledir_Linux-4.8--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/tmpbDHDIA/ -lcudart -lcublas -lcuda_ndarray -lcudnn -lpython2.7', "[GpuDnnConv{algo='small', inplace=True}(<CudaNdarrayType(float32, 4D)>, <CudaNdarrayType(float32, 4D)>, <CudaNdarrayType(float32, 4D)>, <CDataType{cudnnConvolutionDescriptor_t}>, Constant{1.0}, Constant{0.0})]")

End of Error/Exception

This is with Python 2.7 and Keras 1.1.2.

I don’t know how to fix this. Sorry for the repeated posts. Need help, @jeremy.

OK, it was something to do with the cuDNN version. I re-installed it (downgraded to 5.1) and things seem to be running at least. I am happy it is finally running. I have been wanting to get to this point for many days now (ending up with an error and then troubleshooting, which adds to the barrier a lot).
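For anyone hitting the same compile error, it may help to confirm which cuDNN version is actually installed before and after a downgrade. The snippet below is only a rough sketch: it assumes the version macros (`CUDNN_MAJOR`, `CUDNN_MINOR`, `CUDNN_PATCHLEVEL`) live in a header named `cudnn.h`, and it guesses the location from the `-I/usr/local/cuda/include` flag in the log above. The helper names are made up for illustration, and your install may keep the header somewhere else.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) parsed from cuDNN header text, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 5".
        pattern = r"^\s*#\s*define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def read_cudnn_version(header_path="/usr/local/cuda/include/cudnn.h"):
    """Read the header at header_path (an assumed location); None if unreadable."""
    try:
        with open(header_path) as header_file:
            return parse_cudnn_version(header_file.read())
    except IOError:
        return None


# Prints a (major, minor, patch) tuple, or None if the header was not found.
print(read_cudnn_version())
```

If this prints `None`, the header is probably in a different directory, so pass the right path to `read_cudnn_version` rather than assuming cuDNN is missing.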

thanks for this fix - works perfectly


I am trying to get the setup to work on Windows with Python 3. (It seems TensorFlow on Windows only works with Python 3.)

I created a conda environment and verified it from the Python interactive prompt: importing tensorflow and printing a hello-world message from a constant tensor worked fine.

However, when I try to execute this line in Lesson 1’s notebook, I get the following error. Any idea why?

[I 21:41:47.212 NotebookApp] Adapting to protocol v5.1 for kernel 6760032e-a017-417c-84c3-33c5f9737ba8
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\] Couldn’t open CUDA library cublas64_80.dll
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\] Unable to load cuBLAS DSO.
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\] Couldn’t open CUDA library cudnn64_5.dll
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\] Unable to load cuDNN DSO
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\] Couldn’t open CUDA library cufft64_80.dll
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\] Unable to load cuFFT DSO.
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\] Couldn’t open CUDA library curand64_80.dll
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\] Unable to load cuRAND DSO.
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\] OpKernel (‘op: “BestSplits” device_type: “CPU”’) for unknown op: BestSplits
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\] OpKernel (‘op: “CountExtremelyRandomStats” device_type: “CPU”’) for unknown op: CountExtremelyRandomStats
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\] OpKernel (‘op: “FinishedNodes” device_type: “CPU”’) for unknown op: FinishedNodes
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\] OpKernel (‘op: “GrowTree” device_type: “CPU”’) for unknown op: GrowTree
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\] OpKernel (‘op: “ReinterpretStringToFloat” device_type: “CPU”’) for unknown op: ReinterpretStringToFloat
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\] OpKernel (‘op: “SampleInputs” device_type: “CPU”’) for unknown op: SampleInputs
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\] OpKernel (‘op: “ScatterAddNdim” device_type: “CPU”’) for unknown op: ScatterAddNdim
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\] OpKernel (‘op: “TopNInsert” device_type: “CPU”’) for unknown op: TopNInsert
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\] OpKernel (‘op: “TopNRemove” device_type: “CPU”’) for unknown op: TopNRemove
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\] OpKernel (‘op: “TreePredictions” device_type: “CPU”’) for unknown op: TreePredictions
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\] OpKernel (‘op: “UpdateFertileSlots” device_type: “CPU”’) for unknown op: UpdateFertileSlots
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\] Found device 0 with properties:
name: GeForce GTX 965M
major: 5 minor: 2 memoryClockRate (GHz) 1.15
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.64GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\] 0: Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 965M, pci bus id: 0000:01:00.0)
F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\] Check failed: s.ok() could not find cudnnCreate in cudnn DSO; dlerror: cudnnCreate not found
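The log above reports that several CUDA libraries could not be opened, and the final fatal line about `cudnnCreate` looks like a consequence of `cudnn64_5.dll` not loading. One quick check is whether those DLLs sit in any directory on `PATH`. The following is a minimal sketch, not a full diagnosis: the DLL names are copied from the log, `find_missing_dlls` is a made-up helper name, and Windows also searches a few locations besides `PATH`, so treat the result as a hint.

```python
import os

# DLL names copied from the "Couldn't open CUDA library" lines in the log.
REQUIRED_DLLS = [
    "cublas64_80.dll",
    "cudnn64_5.dll",
    "cufft64_80.dll",
    "curand64_80.dll",
]


def find_missing_dlls(dll_names, search_path=None):
    """Return the names that are not found in any directory on search_path.

    search_path is an os.pathsep-separated string; it defaults to PATH.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    directories = [d for d in search_path.split(os.pathsep) if d]
    missing = []
    for name in dll_names:
        found = any(os.path.isfile(os.path.join(d, name)) for d in directories)
        if not found:
            missing.append(name)
    return missing


# Lists the DLLs from the log that are not reachable through PATH.
print(find_missing_dlls(REQUIRED_DLLS))
```

Any name that shows up in the output is worth tracking down: either the CUDA toolkit or cuDNN is not installed, or its `bin` directory has not been added to `PATH`.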

Thanks @chandanpanda. Worked well.

BTW, in case anyone is attempting this on CPU…

I’m running this on VMware Player 16 with Ubuntu 16.04 and all the latest modules (as of 10/7/2017), including Keras 2.0 and Theano 0.9. The CPU is an Intel i7-4700MQ at 2.4 GHz with 4 cores (8 logical cores).

1 epoch is taking 13,977s … about 4 hours.

Now you know why NVIDIA stock 3x’d in the past few years.

Hello All,

I built a new desktop with an NVIDIA 1080 Ti graphics card. I followed the instructions provided in the following fastai forum to set up fastai to use the local GPU. However, I noticed that when running the Lesson 1 code, only my CPU was being used. What additional steps may be required to engage my GPU? Suggestions or pointers to other discussions would help.


  • Ajit

