Error using tensorflow w python 3, cudnn, getting dl box to work

ben.bowles · February 28, 2017, 6:55pm

I got tensorflow working with python3 on my ubuntu GPU box and received the following error when I tried to use it:

Loaded runtime CuDNN library: 5005 (compatibility version 5000) but source was compiled with 5105 (compatibility version 5100).

Specifically:

(tensorenv) bbowles@bbowles-MS-7971:~$ python3 keras/examples/mnist_cnn.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
X_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn’t compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn’t compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn’t compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn’t compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn’t compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn’t compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.7845
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.42GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:390] Loaded runtime CuDNN library: 5005 (compatibility version 5000) but source was compiled with 5105 (compatibility version 5100). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
F tensorflow/core/kernels/conv

The command I used to install tensorflow was the following:
python3 -m pip install tensorflow_gpu-1.0.0-cp35-cp35m-linux_x86_64.whl

(I had locally downloaded the tensorflow wheel).

Does anyone have a recommendation on how to proceed? Should I try to compile TensorFlow from source with Bazel, or should I instead try to update cuDNN? (I am nervous about doing that, because it currently works.)

iNLyze · February 28, 2017, 8:42pm

@ben.bowles, I am experiencing the same. I, too, have a GTX 1070 with a previously installed version of hand-compiled TF and cuDNN 5.0.
What the above error means is that the version of TF you are installing was compiled against cuDNN 5.1 (5105), while the cuDNN library loaded at runtime is 5.0 (5005).
I am a little confused about the instructions provided by @jeremy in this thread: the Anaconda3 installer installs Python 3.6, while pip install tensorflow-gpu installs a wheel with cp35 in its name, so Python 3.5?
Anyways, I solved it by downgrading to cuDNN 5.0. I have no idea what adverse effects this might have, but at least it runs.
I suspect that in my case the confusion might be caused by pip/conda being unable to cleanly uninstall my old TF installation, since it was hand-compiled. I don’t know whether that is true, or how to remove it cleanly.
Anyways, my workaround appears to work fine, though I have a bad feeling about a potential version mismatch.
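
For reference, the two numbers in the error message can be decoded with a few lines of Python. This is only a sketch: it assumes the MAJOR * 1000 + MINOR * 100 + PATCHLEVEL encoding used by the cuDNN 5.x headers, and the "compatibility version" helper simply reproduces the rounding visible in the log (5005 to 5000, 5105 to 5100). Whether TensorFlow compares those values in exactly this way is an assumption. The function names are made up for illustration.

```python
def decode_cudnn_version(code):
    """Split a cuDNN version integer such as 5105 into (major, minor, patch).

    Assumes the MAJOR * 1000 + MINOR * 100 + PATCHLEVEL encoding of cuDNN 5.x.
    """
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def compatibility_version(code):
    """Drop the patch level, matching the values printed in the log."""
    return code - code % 100


if __name__ == "__main__":
    loaded = 5005    # "Loaded runtime CuDNN library" in the error
    compiled = 5105  # "source was compiled with" in the error

    print("loaded   :", decode_cudnn_version(loaded))
    print("compiled :", decode_cudnn_version(compiled))
    # Assumption: a mismatch of these two values is what triggers the error.
    print("same compatibility version:",
          compatibility_version(loaded) == compatibility_version(compiled))
```

Read this way, the library loaded at runtime is a 5.0.x release and the wheel was built against a 5.1.x release.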

jeremy · February 28, 2017, 8:44pm

There are py3.6 versions listed here - https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md . Did the default tensorflow-gpu not install the 3.6 version for you?

iNLyze · February 28, 2017, 8:46pm

Wew, this was fast! Well, I tried to explicitly install the cp36 version of TF using the TF_BINARY_URL environment variable provided by the TensorFlow installation instructions. However, I got an error along the lines of “this version is not supported on your system”. That’s why I went for the cp35 version of TF.

iNLyze · February 28, 2017, 8:47pm

Oh, I forgot to mention, I created an anaconda environment to leave the rest of the system untouched.

iNLyze · February 28, 2017, 8:51pm

@jeremy, I just double-checked: The github link you posted above is the one I had used to get the TF_BINARY_URL from earlier.
Retracing my steps I got:
```bash
export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.0.0-cp36-cp36m-linux_x86_64.whl
sudo -H pip3 install --upgrade $TF_BINARY_URL
tensorflow_gpu-1.0.0-cp36-cp36m-linux_x86_64.whl is not a supported wheel on this platform.
```
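
One possible cause of the "not a supported wheel" message, which nothing in this thread confirms, is that the pip3 being invoked belongs to a Python 3.5 interpreter, while the wheel is tagged cp36. The sketch below pulls the tags out of a wheel filename (the last three dash-separated fields, following the wheel naming convention) and compares only the Python tag with the running interpreter. The helper names are hypothetical, and the check deliberately ignores the ABI and platform tags that pip also considers.

```python
import sys


def parse_wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    # Layout: name-version[-build]-python-abi-platform
    return tuple(stem.split("-")[-3:])


def interpreter_python_tag(version_info=sys.version_info):
    """Build the CPython tag, e.g. 'cp35' for Python 3.5."""
    return "cp{}{}".format(version_info[0], version_info[1])


def wheel_matches_interpreter(filename, version_info=sys.version_info):
    """Simplified check: compare only the Python tag of the wheel."""
    python_tag, _abi_tag, _platform_tag = parse_wheel_tags(filename)
    return python_tag == interpreter_python_tag(version_info)


if __name__ == "__main__":
    wheel = "tensorflow_gpu-1.0.0-cp36-cp36m-linux_x86_64.whl"
    print("wheel tags     :", parse_wheel_tags(wheel))
    print("interpreter tag:", interpreter_python_tag())
    print("tags match     :", wheel_matches_interpreter(wheel))
```

Running it with the same interpreter that pip3 uses (for example via python3 -m pip, as in the first post) shows whether the Python tag is the part that differs.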

ben.bowles · February 28, 2017, 9:15pm

Thanks @iNLyze

You managed to solve it simply by replacing your cuDNN files with those for version 5.0? That sounds easy enough! If I recall, it’s just a matter of putting those files, which I don’t really understand, in the right place in the CUDA folder.
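
Before swapping any files, it may help to check which cuDNN version the installed header claims. The following is a rough sketch: it assumes the header lives at /usr/local/cuda/include/cudnn.h (a common location, but not one confirmed in this thread) and that it defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as the cuDNN 5.x headers do. The function name is invented for the example.

```python
import re


def cudnn_version_from_header(header_text):
    """Extract (major, minor, patch) from the text of cudnn.h.

    Returns None if any of the three macros is missing.
    """
    values = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"^\s*#\s*define\s+" + macro + r"\s+(\d+)",
                          header_text, re.MULTILINE)
        if match is None:
            return None
        values.append(int(match.group(1)))
    return tuple(values)


if __name__ == "__main__":
    # Assumed path; adjust to wherever the CUDA toolkit is installed.
    header_path = "/usr/local/cuda/include/cudnn.h"
    try:
        with open(header_path) as handle:
            print(header_path, "->", cudnn_version_from_header(handle.read()))
    except OSError:
        print("could not read", header_path)
```

Note that the header only describes the files installed next to it. The libcudnn.so.5 that TensorFlow actually loads at runtime could come from a different directory, so the error message itself remains the more reliable indicator.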

I also didn’t see any mention of Python 3.6. It seems like 3.5 is the newest version TensorFlow supports right now?

ben.bowles · February 28, 2017, 9:19pm

But also, the instructions recommended using the virtualenv strategy. This way you have a bit more control over the python version and other things, so I did that with python v3.5
https://www.tensorflow.org/versions/r0.10/get_started/os_setup#optional_install_cuda_gpus_on_linux
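
To see which interpreter an environment is really using, something like the sketch below can be run inside it. It relies only on the standard sys module. The helper name is made up, and the check covers virtualenv and venv only; a conda environment is not detected this way, so for conda the printed executable path is the thing to look at.

```python
import sys


def in_virtual_environment():
    """Best-effort check for a virtualenv or venv (not conda)."""
    # Older virtualenv releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv make sys.base_prefix differ from sys.prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("python version     :", "{}.{}".format(*sys.version_info[:2]))
    print("executable         :", sys.executable)
    print("virtual environment:", in_virtual_environment())
```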

jeremy · February 28, 2017, 10:40pm

I haven’t tried in an anaconda environment - sounds like that might be confusing things somehow?

iNLyze · February 28, 2017, 10:43pm

Could be. If I have time I’ll try outside the virtualenv tomorrow.