Setup problems: Azure

I believe there’s a need for a separate topic similar to the AWS one.

In my case, after a fresh installation on an Azure NC6 using the current version of the setup script from here, I get the following NVIDIA driver error when running import theano in a Jupyter notebook session:

Exception: The nvidia driver version installed with this OS does not give good results for reduction.Installing the nvidia driver available on the same download page as the cuda package will fix the problem: http://developer.nvidia.com/cuda-downloads

I suspect this is caused by an update to one of the required components.
FYI, my nvidia-smi output looks okay and shows driver version 384.81:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000E50E:00:00.0 Off |                    0 |
| N/A   55C    P0    56W / 149W |     93MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2477      C   /home/lysuhin/anaconda2/bin/python            82MiB |
+-----------------------------------------------------------------------------+
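When comparing setups in this thread, the driver version can be read from nvidia-smi directly instead of off the table above. This is a minimal sketch: driver_series is a hypothetical helper name, and the query only covers the first GPU.

```shell
#!/usr/bin/env bash
# Sketch: print the installed NVIDIA driver version and its series.

# Major part of a driver version string, e.g. "384.81" -> "384".
driver_series() {
  printf '%s\n' "${1%%.*}"
}

# Ask nvidia-smi for the driver version only (first GPU), skipping the table.
if command -v nvidia-smi >/dev/null 2>&1; then
  version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  echo "driver $version (series $(driver_series "$version"))"
fi
```

On the VM above this should report the 384 series; after the downgrade described below it would report 375.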

Thanks in advance for any help.

I solved this one myself.
Apparently, the culprit is the 384.xx NVIDIA driver that the setup script installs along with CUDA 9.0.
I got import theano working by installing the legacy 375.88 driver, then CUDA 8.0 and cuDNN 5.1 (the build for CUDA 8.0). Important notes can be found here and here.
Don’t forget to completely remove any previous versions of the NVIDIA drivers and CUDA first!
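The clean-up step could be scripted roughly as follows. This is a sketch, not the procedure from the linked notes. It assumes the driver and CUDA were installed from apt packages (as the setup script does). purge_cuda_and_drivers and DRY_RUN are hypothetical names. The 'nvidia*' pattern may also match unrelated packages, so review the list apt shows before confirming. cuDNN files copied in by hand are not removed by apt.

```shell
#!/usr/bin/env bash
# Sketch: remove apt-installed CUDA and NVIDIA driver packages before
# installing older versions. Runfile installs need their own uninstallers.

# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

purge_cuda_and_drivers() {
  # Globs are quoted so apt, not the shell, matches them against package names.
  # No -y here: apt asks for confirmation so the package list can be reviewed.
  run sudo apt-get --purge remove 'cuda*' 'nvidia*'
  run sudo apt-get autoremove
}
```

Running DRY_RUN=1 purge_cuda_and_drivers first shows the commands without touching the system. Running purge_cuda_and_drivers without it executes them.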

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.88                 Driver Version: 375.88                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 4EFB:00:00.0     Off |                    0 |
| N/A   38C    P0    55W / 149W |      0MiB / 11439MiB |     70%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+


I ran into the same problem. I followed the instructions in the two links you provided, ran into various issues, and after a considerable amount of time still couldn’t get my VM into a good state. I then created a new VM from scratch, followed the steps in “Unable to uninstall cuda 9.0 completely and install 8.0 instead”, and it worked like a charm. Only one line of the Azure setup script needs to change: sudo apt-get -y install cuda becomes sudo apt-get -y install cuda-8-0. After the fix, nvidia-smi still reports 384.81 for me, so I think it’s the CUDA version that matters.
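The one-line change can also be applied with sed instead of editing the script by hand. This is a sketch that assumes a local copy of the setup script. The default filename install-gpu-azure.sh and the function name pin_cuda_8 are placeholders, not necessarily the real names. It relies on GNU sed's -i, as shipped with Ubuntu.

```shell
#!/usr/bin/env bash
# Sketch: pin the setup script to CUDA 8.0 by rewriting its install line.
# The default filename is a placeholder; pass the real path as $1.

pin_cuda_8() {
  local script="${1:-install-gpu-azure.sh}"
  # Rewrite only lines that are exactly "sudo apt-get -y install cuda"
  # (indentation preserved); lines such as "... install cuda-drivers" are
  # left alone, and running it twice changes nothing further.
  sed -i 's/^\([[:space:]]*\)sudo apt-get -y install cuda$/\1sudo apt-get -y install cuda-8-0/' "$script"
}
```

After pin_cuda_8 path/to/script, the script can be run on a fresh VM as usual.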


Thanks for sharing!


I opened a PR on GitHub based on your answer. Thanks for sharing!

To follow up on azuser’s answer, all you need to run is:

$ sudo apt-get remove 'cuda*'
$ sudo apt-get -y install cuda-8-0
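To confirm the switch took effect, the toolkit release can be read from nvcc. This sketch assumes nvcc is at /usr/local/cuda/bin/nvcc, the usual symlinked location for the apt packages, but check your install. Theano may also pick up a different nvcc from PATH or its own configuration. cuda_release is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: confirm which CUDA toolkit release is active after the reinstall.

# Extract the release number from `nvcc --version` output read on stdin,
# e.g. "Cuda compilation tools, release 8.0, V8.0.61" -> "8.0".
cuda_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Assumed location; override with NVCC=/path/to/nvcc if yours differs.
NVCC="${NVCC:-/usr/local/cuda/bin/nvcc}"

if [ -x "$NVCC" ]; then
  release="$("$NVCC" --version | cuda_release)"
  if [ "$release" = "8.0" ]; then
    echo "CUDA toolkit $release is active"
  else
    echo "unexpected CUDA toolkit release: ${release:-unknown}" >&2
  fi
fi
```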