Error using TensorFlow with Python 3 and cuDNN: getting a DL box to work


(ben.bowles) #1

I got TensorFlow installed with Python 3 on my Ubuntu GPU box, but received the following error when I tried to use it:

Loaded runtime CuDNN library: 5005 (compatibility version 5000) but source was compiled with 5105 (compatibility version 5100).

Specifically:

```
(tensorenv) bbowles@bbowles-MS-7971:~$ python3 keras/examples/mnist_cnn.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
X_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.7845
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.42GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:390] Loaded runtime CuDNN library: 5005 (compatibility version 5000) but source was compiled with 5105 (compatibility version 5100). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
F tensorflow/core/kernels/conv
```
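
For reference, the two numbers in that error can be decoded. This is a small sketch assuming the cuDNN 5.x encoding, where the version integer is MAJOR*1000 + MINOR*100 + PATCHLEVEL (the same formula cudnn.h uses for CUDNN_VERSION); `decode_cudnn` is a hypothetical helper name:

```bash
# Decode the cuDNN version integers printed in the TensorFlow error.
# Assumes the cuDNN 5.x encoding: MAJOR*1000 + MINOR*100 + PATCHLEVEL.
decode_cudnn() {
  local v=$1
  echo "$((v / 1000)).$(( (v % 1000) / 100 )).$((v % 100))"
}

decode_cudnn 5005   # "Loaded runtime CuDNN library" -> 5.0.5
decode_cudnn 5105   # "source was compiled with"     -> 5.1.5
```

Read that way, the library loaded at runtime is 5.0.5 while the binary was built against 5.1.5, which fits the message's own advice to upgrade cuDNN when using a binary install.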

I installed TensorFlow from a wheel I had downloaded locally, using the following command:

```bash
python3 -m pip install tensorflow_gpu-1.0.0-cp35-cp35m-linux_x86_64.whl
```

Does anyone have a recommendation on how to proceed? Should I try to compile TensorFlow from source with Bazel, or should I instead update cuDNN? (I'm nervous about doing that, because everything else currently works.)
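
Before touching anything, it may help to confirm which cuDNN version the CUDA install actually ships. A minimal sketch, assuming the header lives at /usr/local/cuda/include/cudnn.h (a common location for tarball installs; much newer cuDNN releases keep these defines in cudnn_version.h instead); `cudnn_header_version` is a hypothetical helper:

```bash
# Print MAJOR.MINOR.PATCH from a cuDNN header.
# The default path is an assumption (typical for a tarball install of cuDNN 5.x).
cudnn_header_version() {
  local header=${1:-/usr/local/cuda/include/cudnn.h}
  awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
       $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
       $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
       END { if (major == "") exit 1; print major "." minor "." patch }' "$header"
}

# Usage:
#   cudnn_header_version                      # default location
#   cudnn_header_version /path/to/cudnn.h     # any other copy of the header
```

If several copies of cuDNN are on the machine, the header may not match the libcudnn.so.5 that is actually loaded; the "Loaded runtime CuDNN library" line in the TensorFlow log reports the one in use.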


(Constantin) #2

@ben.bowles, I am experiencing the same thing. I, too, have a GTX 1070, with a previously installed, hand-compiled version of TF and cuDNN 5.0.
What the above error means is that the version of TF you are installing expects cuDNN 5.0, while you have cuDNN 5.1 installed.
I am a little confused by the instructions @jeremy provided in this thread: the Anaconda3 installer installs Python 3.6, while pip install tensorflow-gpu installs a version with "3.5" somewhere in its name. Is that for Python 3.5?
Anyway, I solved it by downgrading to cuDNN 5.0. I have no idea what adverse effects this might have, but at least it runs.
I suspect that in my case the confusion might be caused by pip/conda being unable to cleanly uninstall my old TF installation, since it was hand-compiled, but I don't know whether that is true or how to remove it cleanly.
Anyway, my workaround appears to work fine, although I have a bad feeling about a potential version mismatch.
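
On the question of a leftover hand-compiled install, one way to check is to ask Python where it would import tensorflow from, without actually importing it. A sketch, assuming python3 is the interpreter of the environment you are using; `module_origin` is a hypothetical helper built on the standard importlib.util.find_spec:

```bash
# Print the file Python would import a module from, without importing it
# (find_spec only locates the module). Prints "not installed" if absent.
module_origin() {
  local module=$1
  local py=${2:-python3}
  "$py" -c '
import importlib.util
import sys

spec = importlib.util.find_spec(sys.argv[1])
print(spec.origin if spec else "not installed")
' "$module"
}

module_origin tensorflow   # uses the python3 that is first on PATH
```

If the printed path points outside the environment you created, that would suggest an older installation is being picked up instead of the new wheel.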


(Jeremy Howard) #3

There are py3.6 versions listed here - https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md . Did the default tensorflow-gpu not install the 3.6 version for you?


(Constantin) #4

Wow, that was fast! Well, I tried to explicitly install the cp36 version of TF using the TF_BINARY_URL environment variable from the TensorFlow installation instructions. However, I get an error along the lines of “this version is not supported on your system”. That’s why I went for the cp35 version of TF.


(Constantin) #5

Oh, I forgot to mention: I created an Anaconda environment to leave the rest of the system untouched.


(Constantin) #6

@jeremy, I just double-checked: the GitHub link you posted above is the one I had used to get the TF_BINARY_URL earlier.
Retracing my steps, I got:

```bash
export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.0.0-cp36-cp36m-linux_x86_64.whl
sudo -H pip3 install --upgrade "$TF_BINARY_URL"
```

```
tensorflow_gpu-1.0.0-cp36-cp36m-linux_x86_64.whl is not a supported wheel on this platform.
```
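
The cp36 in that wheel name is the CPython tag, so the wheel only installs on a Python 3.6 interpreter. Also, sudo usually resets PATH, so sudo -H pip3 may resolve to the system pip3 rather than the one in the active Anaconda environment (an assumption worth checking on the box). A sketch for printing the tag a given interpreter accepts; `python_wheel_tag` is a hypothetical helper:

```bash
# Print the CPython wheel tag (cp35, cp36, ...) that an interpreter accepts.
# A wheel named ...-cp36-cp36m-linux_x86_64.whl needs a Python 3.6 interpreter.
python_wheel_tag() {
  local py=${1:-python3}
  "$py" -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
}

python_wheel_tag python3   # e.g. cp35 on a box whose python3 is 3.5
# Inside the activated environment, "python -m pip install ..." uses the pip
# belonging to that same interpreter, which "sudo -H pip3" may not.
```

Running this with and without the environment activated shows whether the interpreter that pip installs into matches the cp tag of the wheel.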


(ben.bowles) #7

Thanks @iNLyze

You managed to solve it simply by replacing your cuDNN files with those for version 5.0? That sounds easy enough! If I recall correctly, it's just a matter of putting those files (which I don't fully understand) in the right place in the CUDA folder.
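
Replacing the files is roughly what NVIDIA's tarball instructions describe: copy cudnn.h into the CUDA include directory and the libcudnn* files into lib64. A sketch that also moves the current files into a timestamped backup first, so the swap can be undone; `install_cudnn`, the backup layout, and the /usr/local/cuda default are assumptions, not an official procedure:

```bash
# Hypothetical helper: swap the cuDNN files in a CUDA install for those from an
# extracted cuDNN tarball, moving the current files to a timestamped backup first.
#   $1  extracted tarball directory (the one containing include/ and lib64/)
#   $2  CUDA install directory (assumed default: /usr/local/cuda)
# Prints the backup directory so the swap can be undone.
install_cudnn() {
  local src=$1
  local cuda_home=${2:-/usr/local/cuda}
  local backup f
  backup="$cuda_home/cudnn-backup-$(date +%Y%m%d%H%M%S)"

  mkdir -p "$backup" || return 1

  # Move the currently installed header and libraries (if any) into the backup.
  for f in "$cuda_home"/include/cudnn.h "$cuda_home"/lib64/libcudnn*; do
    if [ -e "$f" ] || [ -L "$f" ]; then
      mv "$f" "$backup"/ || return 1
    fi
  done

  # Copy in the new files; -P keeps libcudnn.so -> libcudnn.so.5 symlinks as links.
  cp -P "$src"/include/cudnn.h "$cuda_home"/include/ || return 1
  cp -P "$src"/lib64/libcudnn* "$cuda_home"/lib64/ || return 1
  chmod a+r "$cuda_home"/include/cudnn.h "$cuda_home"/lib64/libcudnn*

  echo "$backup"
}

# Usage (as root, since /usr/local/cuda is usually root-owned;
# the tarball path is just an example):
#   install_cudnn ~/Downloads/cuda
#   ldconfig    # refresh the dynamic linker cache afterwards
```

To roll back, move the files from the printed backup directory back into include/ and lib64/ and run ldconfig again.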

I also didn't see any mention of Python 3.6. It seems like 3.5 is the newest version TensorFlow supports right now?


(ben.bowles) #8

Also, the instructions recommended the virtualenv approach, which gives you a bit more control over the Python version and other things, so I did that with Python 3.5.
https://www.tensorflow.org/versions/r0.10/get_started/os_setup#optional_install_cuda_gpus_on_linux


(Jeremy Howard) #9

I haven't tried it in an Anaconda environment; it sounds like that might be confusing things somehow?


(Constantin) #10

Could be. If I have time I’ll try outside the virtualenv tomorrow.


(nicolewells) #11

ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

It seems your TensorFlow build was compiled against CUDA 9.0. Make sure that version of CUDA is properly installed and referenced, or compile TensorFlow yourself to suit your environment (install doc).
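
That ImportError means the dynamic loader could not find libcublas.so.9.0 in any of the places it searched. A rough sketch for checking two of the usual places, LD_LIBRARY_PATH and the ldconfig cache; `find_shared_lib` is a hypothetical helper, and it assumes directory names without spaces:

```bash
# Check two places the dynamic loader usually consults for a library:
# the directories in LD_LIBRARY_PATH, then the ldconfig cache.
# Assumes directory names contain no spaces.
find_shared_lib() {
  local name=$1
  local dir ldc path

  for dir in $(printf '%s' "${LD_LIBRARY_PATH:-}" | tr ':' ' '); do
    if [ -e "$dir/$name" ]; then
      echo "$dir/$name"
      return 0
    fi
  done

  # ldconfig is often in /sbin, which may not be on a normal user's PATH.
  ldc=$(command -v ldconfig || echo /sbin/ldconfig)
  if [ -x "$ldc" ]; then
    path=$("$ldc" -p 2>/dev/null | awk -v n="$name" '$1 == n { print $NF; exit }')
    if [ -n "$path" ]; then
      echo "$path"
      return 0
    fi
  fi
  return 1
}

find_shared_lib libcublas.so.9.0 || echo "libcublas.so.9.0 not found"
```

If nothing is found, the CUDA 9.0 toolkit is either missing or its lib64 directory is not on LD_LIBRARY_PATH or registered with ldconfig.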

Thanks