Trouble with tensorflow-gpu==1.4.0 working with nvidia drivers v.384


(rkj) #1

I have installed CUDA 8.0 successfully and have my GPU enabled and in persistence mode. I then installed tensorflow-gpu 1.4.0. However, when I try to import tensorflow in the Python shell, I get the following error:

Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libnvidia-fatbinaryloader.so.387.26: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 72, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libnvidia-fatbinaryloader.so.387.26: cannot open shared object file: No such file or directory

Has anyone encountered this issue, and perhaps knows why it's looking for NVIDIA driver version 387 when I have 384 installed? I looked on Wikipedia: I have a GTX 980M, which seems to be compatible with CUDA 8. Not sure what I could be missing.
Thank you so much in advance :slight_smile:
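
One way to make the mismatch explicit is to pull the version suffix out of the missing library name and compare it with the installed driver branch. The sketch below does only that string comparison; the helper names `required_driver_version` and `same_driver_branch` are made up for illustration, and the installed version ("384") is taken from the post rather than queried from the system. The suffix 387.26 suggests that some library on the loader path was built for the 387 driver branch, but the snippet does not identify which one.

```python
import re

# The final line of the traceback above, copied verbatim.
ERROR = ("ImportError: libnvidia-fatbinaryloader.so.387.26: "
         "cannot open shared object file: No such file or directory")


def required_driver_version(error_message):
    """Return the version suffix of the missing libnvidia-* library, or None."""
    match = re.search(r"libnvidia-[\w-]+\.so\.(\d+(?:\.\d+)*)", error_message)
    return match.group(1) if match else None


def same_driver_branch(required, installed):
    """Compare only the major driver number, e.g. '387' against '384'."""
    return required.split(".")[0] == installed.split(".")[0]


required = required_driver_version(ERROR)
print(required)                             # 387.26
print(same_driver_branch(required, "384"))  # False
```

If the two branches differ, as they do here, the user-space NVIDIA libraries and the installed driver most likely come from different packages.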


(Martin) #2

Is 387 not compatible with your system? Perhaps upgrading to it would fix the issue.


(rkj) #3

I installed 387; however, I am unable to install CUDA 8.0. I downloaded the .deb and got the same error I got before:

Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 cuda : Depends: cuda-8-0 (>= 8.0.61) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

When I install CUDA via sudo apt-get, I get NVIDIA driver 384. I am following the instructions:


(Martin) #4

https://askubuntu.com/questions/598607/package-dependency-problem-while-installing-cuda-on-ubuntu-14-04 It's probably the same for you, except with cuda-8-0.


(Martin) #5

There is a comment on the top answer: "I have removed nvidia-opencl-icd-* and now I was able to install cuda by aptitude. Thanks guys!" Maybe you should try that too… after trying what the answer says, of course.


(rkj) #6

Thank you so much.
I ended up removing all the drivers, CUDA 8, and tensorflow 1.4.0.

I then installed NVIDIA driver 396 along with CUDA 9.0 from the run file, and followed these instructions for cuDNN 7.1:

2.3.1. Installing from a Tar File

    Navigate to your <cudnnpath> directory containing the cuDNN Tar file.
    Unzip the cuDNN package.

    $ tar -xzvf cudnn-9.0-linux-x64-v7.tgz

    Copy the following files into the CUDA Toolkit directory.

    $ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
    $ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
    $ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

I think I didn’t have the headers in the right places. You can find those instructions here:
https://docs.nvidia.com/deeplearning/sdk/cudnn-install/
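
Since misplaced headers were the suspected cause, a quick check that the copy steps above landed where expected can save a reinstall. This is a minimal sketch, assuming the default /usr/local/cuda prefix used in the commands; `check_cudnn_install` is a hypothetical helper, not part of any NVIDIA tool. It only checks that the files exist and are readable, not that their versions match.

```python
import glob
import os


def check_cudnn_install(cuda_root="/usr/local/cuda"):
    """Report whether cudnn.h and libcudnn* exist and are readable under cuda_root."""
    header = os.path.join(cuda_root, "include", "cudnn.h")
    libs = sorted(glob.glob(os.path.join(cuda_root, "lib64", "libcudnn*")))
    return {
        "header_present": os.path.isfile(header),
        "header_readable": os.access(header, os.R_OK),
        "libraries": libs,
        # False when no library was found at all.
        "libraries_readable": bool(libs) and all(os.access(p, os.R_OK) for p in libs),
    }


if __name__ == "__main__":
    for key, value in check_cudnn_install().items():
        print(key, "->", value)
```

If `header_present` is False or `libraries` is empty, the files were probably copied to a different prefix than the one the toolkit uses.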

I was then able to install tensorflow-gpu 1.8.0 and validated that it sees the GPU:

2018-05-14 14:16:06.652316: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-05-14 14:16:06.842360: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-14 14:16:06.842881: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: GeForce GTX 980M major: 5 minor: 2 memoryClockRate(GHz): 1.1265
pciBusID: 0000:01:00.0
totalMemory: 7.94GiB freeMemory: 6.85GiB
2018-05-14 14:16:06.842901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-14 14:16:07.065356: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-14 14:16:07.065397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-05-14 14:16:07.065417: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-05-14 14:16:07.065628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6608 MB memory) -> physical GPU (device: 0, name: GeForce GTX 980M, pci bus id: 0000:01:00.0, compute capability: 5.2)
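
For anyone who wants to reproduce a check like this, here is a minimal sketch that asks TensorFlow which GPU devices it can see. The post does not say which command produced the log above, so this is an assumption rather than the exact method used. It relies on `device_lib.list_local_devices()` from `tensorflow.python.client`, and `list_gpu_devices` is a hypothetical wrapper name. The import is guarded so that a broken install, like the one at the top of the thread, prints the error instead of a long traceback.

```python
def list_gpu_devices():
    """Return the names of GPUs TensorFlow can see, or None if the import fails."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError as exc:
        # A driver/CUDA mismatch typically surfaces here as an ImportError.
        print("TensorFlow failed to import:", exc)
        return None
    devices = device_lib.list_local_devices()
    return [d.name for d in devices if d.device_type == "GPU"]


if __name__ == "__main__":
    print(list_gpu_devices())
```

An empty list means TensorFlow imported but found no usable GPU; None means the import itself failed.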

(rkj) #7

I can also see my GPU working with the network I am training :smiley:

Mon May 14 14:38:07 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.24                 Driver Version: 396.24                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980M    On   | 00000000:01:00.0 Off |                  N/A |
| N/A   76C    P0    73W /  N/A |   2110MiB /  8129MiB |     66%      Default |
+-------------------------------+----------------------+----------------------+
                                                                              
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2602      G   /usr/lib/xorg/Xorg                            86MiB |
|    0      2948      G   /usr/lib/xorg/Xorg                           794MiB |
|    0      3455      G   compiz                                       329MiB |
|    0      5882      G   ...-token=8DF1686BFB206F46CC87A76781B0CEE9   115MiB |
|    0      6768      C   /usr/bin/python3                             744MiB |
+-----------------------------------------------------------------------------+