Show_install(0) CUDA Issues

I was testing out the new show_install command and I noticed something weird in my output:

It looks like I have different CUDA versions for torch and nvcc. The other thing I noticed is that torch.cuda.is_available() returns False for me. I’m wondering if I need to upgrade my CUDA version — or has anybody else seen this, where they have torch CUDA 9.2 and nvcc CUDA 9.1, and is that an issue?

show_install(0)
platform   : Linux-4.15.0-36-generic-x86_64-with-debian-stretch-sid
distro     : Ubuntu 16.04 Xenial Xerus
python     : 3.6.6
fastai     : 1.0.6.dev0
torch      : 1.0.0.dev20181007
nvidia dr. : 390.67
torch cuda : Not available
torch cuda : 9.2.148
nvcc  cuda : 9.1.85
torch gpus 

Edit: Here is a good write-up from stas in the Fastai v1 install issues thread.

Basically it doesn’t matter, since pytorch has its own version of CUDA bundled inside it.

1 Like

I am going to remove the output of nvcc in show_install, since the system’s CUDA version doesn’t have to match torch’s CUDA version anymore — it’s just confusing.

The problem is the nvidia driver; for some reason torch can’t detect it. Most people had success by reinstalling it, in particular making sure they don’t have more than one driver installed. That usually means doing a full uninstall, and then installing the latest stable version that works for your card. The link you included goes into more detail.

You don’t even need to have normal CUDA installed on your system unless you need it for other software (i.e. you only need pytorch with its pre-packaged CUDA library).
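If you want to see both versions side by side yourself, here is a small sketch (not part of show_install; the parsing helper is hypothetical) that extracts the version from `nvcc --version` output and reads the CUDA version pytorch was built against:

```python
import re

def parse_nvcc_version(nvcc_output):
    """Extract the version from `nvcc --version` output, e.g.
    'Cuda compilation tools, release 9.1, V9.1.85' -> '9.1.85'."""
    m = re.search(r"release [\d.]+, V([\d.]+)", nvcc_output)
    return m.group(1) if m else None

def torch_cuda_version():
    """CUDA version bundled with pytorch, or None if torch isn't importable."""
    try:
        import torch
        return torch.version.cuda  # e.g. "9.2.148"
    except ImportError:
        return None
```

Since pytorch ships its own CUDA runtime, a mismatch between the two values is expected and harmless.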

2 Likes

So would that actually be impacting being able to use my GPU? I hadn’t even noticed this was an issue. I guess I haven’t done a ton of training since I installed v1 though either.

You shouldn’t be able to use torch w/ GPU if you get torch.cuda.is_available() == False. You can always run:

watch -n 1 nvidia-smi

and see whether the gpu is used when you run some notebook.

If it works despite reporting not available, then something is mixed up in your environment. For example, you’re running the reporting script from a different environment than your jupyter (e.g. a shell without the matching env activated, or none activated at all).

which means that reporting the conda env in show_install should be helpful too.
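A minimal sketch of that kind of environment report (not the actual show_install code) could look like:

```python
import os
import sys

def env_report():
    """Gather the environment facts that explain 'works in the shell, fails in jupyter'."""
    return {
        # set by `conda activate`; None means no env was activated (or conda isn't used)
        "conda env": os.environ.get("CONDA_DEFAULT_ENV"),
        # the interpreter actually running this code - compare it across shell and notebook
        "python": sys.executable,
        "sys.path": list(sys.path),
    }

for key, value in env_report().items():
    print(f"{key:9}: {value}")
```

Running it both from the shell and from a notebook cell and comparing the `python` lines is a quick way to catch a mismatched env.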

1 Like

That makes sense. I think that show_install is going to be super useful. I’ll probably be putting it in my notebooks just as a verification. One thought: if you wanted a more detailed dump of the environment, you could do a conda list and output all the packages the user has installed.

Yeah, I’m not using my GPU. Thanks for helping me identify that. show install - 1 Kevin - 0

Glad it was helpful.

I am not sure we want conda list by default, since its output is huge. Let’s see if it becomes needed; then we will add it via an optional flag, like we do now with the show_nvidia_smi option.

But I added python’s location and sys.path; I think those will be very useful when users report that the notebook can’t import fastai.

Give the updated version a try after git pull, also to check that I didn’t leave any holes in the updated code. Thanks.

1 Like

That looks great. Here is what it currently looks like for me:

=== Software === 
python version : 3.6.6
fastai version : 1.0.6.dev0
torch version  : 1.0.0.dev20181015
nvidia driver  : 390.87
torch cuda ver : 9.2.148
torch cuda is  : Not available

=== Hardware === 
No GPUs        

=== Environment === 
platform       : Linux-4.15.0-36-generic-x86_64-with-debian-stretch-sid
distro         : Ubuntu 16.04 Xenial Xerus
conda env      : fastai
python         : /home/kbird/anaconda3/envs/fastai/bin/python
sys.path       : 
/home/kbird/anaconda3/envs/fastai/lib/python36.zip
/home/kbird/anaconda3/envs/fastai/lib/python3.6
/home/kbird/anaconda3/envs/fastai/lib/python3.6/lib-dynload
/home/kbird/anaconda3/envs/fastai/lib/python3.6/site-packages
/home/kbird/anaconda3/envs/fastai/lib/python3.6/site-packages/IPython/extensions
/home/kbird/.ipython

I’m pretty sure the problem is that I’m using the 390 driver and I need to upgrade to 396. That’s what I’m trying at least.

Edit:

And here it is running how it is supposed to:

=== Software === 
python version : 3.6.6
fastai version : 1.0.6.dev0
torch version  : 1.0.0.dev20181015
nvidia driver  : 396.24
torch cuda ver : 9.2.148
torch cuda is  : available

=== Hardware === 
torch gpus     : 2
  gpu0         : 11175MB | GeForce GTX 1080 Ti
  gpu1         : 11178MB | GeForce GTX 1080 Ti

=== Environment === 
platform       : Linux-4.15.0-36-generic-x86_64-with-debian-stretch-sid
distro         : Ubuntu 16.04 Xenial Xerus
conda env      : fastai
python         : /home/kbird/anaconda3/envs/fastai/bin/python
sys.path       : 
/home/kbird/anaconda3/envs/fastai/lib/python36.zip
/home/kbird/anaconda3/envs/fastai/lib/python3.6
/home/kbird/anaconda3/envs/fastai/lib/python3.6/lib-dynload
/home/kbird/anaconda3/envs/fastai/lib/python3.6/site-packages
/home/kbird/anaconda3/envs/fastai/lib/python3.6/site-packages/IPython/extensions
/home/kbird/.ipython
1 Like

This is good. I committed yet another iteration to show how many GPUs nvidia sees and how many, if any, torch sees — otherwise ‘no gpus’ is a misleading statement.

So your edit report is from the fixed system, correct? got gpu? great!
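The “gpus nvidia sees vs. gpus torch sees” comparison could be sketched roughly like this (the counting helper and the sample `nvidia-smi -L` output are illustrative, not the actual implementation):

```python
def count_smi_gpus(smi_l_output):
    """Count GPUs in `nvidia-smi -L` output (one 'GPU N: ...' line per device)."""
    return sum(1 for line in smi_l_output.splitlines() if line.startswith("GPU "))

# illustrative `nvidia-smi -L` output for a 2-GPU box
sample = (
    "GPU 0: GeForce GTX 1080 Ti (UUID: GPU-xxxxxxxx)\n"
    "GPU 1: GeForce GTX 1080 Ti (UUID: GPU-yyyyyyyy)\n"
)
nvidia_count = count_smi_gpus(sample)

# torch's count can legitimately differ, e.g. when the driver is too old
try:
    import torch
    torch_count = torch.cuda.device_count() if torch.cuda.is_available() else 0
except ImportError:
    torch_count = 0

print(f"nvidia gpus     : {nvidia_count}")
print(f"torch available : {torch_count}")
```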

Here is the updated output. One thing: does it make sense to have it as ```text, so that it doesn’t add any coloring?

=== Software === 
python version  : 3.6.6
fastai version  : 1.0.6.dev0
torch version   : 1.0.0.dev20181015
nvidia driver   : 396.24
torch cuda ver  : 9.2.148
torch cuda is   : available

=== Hardware === 
nvidia gpus     : 2
torch available : 2
  - gpu0        : 11175MB | GeForce GTX 1080 Ti
  - gpu1        : 11178MB | GeForce GTX 1080 Ti

=== Environment === 
platform        : Linux-4.15.0-36-generic-x86_64-with-debian-stretch-sid
distro          : Ubuntu 16.04 Xenial Xerus
conda env       : fastai
python          : /home/kbird/anaconda3/envs/fastai/bin/python
sys.path        : 
/home/kbird/anaconda3/envs/fastai/lib/python36.zip
/home/kbird/anaconda3/envs/fastai/lib/python3.6
/home/kbird/anaconda3/envs/fastai/lib/python3.6/lib-dynload
/home/kbird/anaconda3/envs/fastai/lib/python3.6/site-packages
/home/kbird/anaconda3/envs/fastai/lib/python3.6/site-packages/IPython/extensions
/home/kbird/.ipython

Mon Oct 15 23:51:06 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.24                 Driver Version: 396.24                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   55C    P0    66W / 275W |    837MiB / 11175MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 25%   24C    P8     8W / 250W |     12MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     16151      G   /usr/lib/xorg/Xorg                           328MiB |
|    0     16659      G   compiz                                       152MiB |
|    0     17474      G   /usr/lib/firefox/firefox                       2MiB |
|    0     17493      G   /usr/lib/firefox/firefox                       2MiB |
|    0     21783      G   ...-token=F5BA0D607E0EB0796BF0EC178D3EFD3F   337MiB |
+-----------------------------------------------------------------------------+

vs with text

=== Software === 
python version  : 3.6.6
fastai version  : 1.0.6.dev0
torch version   : 1.0.0.dev20181015
nvidia driver   : 396.24
torch cuda ver  : 9.2.148
torch cuda is   : available

=== Hardware === 
nvidia gpus     : 2
torch available : 2
  - gpu0        : 11175MB | GeForce GTX 1080 Ti
  - gpu1        : 11178MB | GeForce GTX 1080 Ti

=== Environment === 
platform        : Linux-4.15.0-36-generic-x86_64-with-debian-stretch-sid
distro          : Ubuntu 16.04 Xenial Xerus
conda env       : fastai
python          : /home/kbird/anaconda3/envs/fastai/bin/python
sys.path        : 
/home/kbird/anaconda3/envs/fastai/lib/python36.zip
/home/kbird/anaconda3/envs/fastai/lib/python3.6
/home/kbird/anaconda3/envs/fastai/lib/python3.6/lib-dynload
/home/kbird/anaconda3/envs/fastai/lib/python3.6/site-packages
/home/kbird/anaconda3/envs/fastai/lib/python3.6/site-packages/IPython/extensions
/home/kbird/.ipython

Mon Oct 15 23:51:06 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.24                 Driver Version: 396.24                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   55C    P0    66W / 275W |    837MiB / 11175MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 25%   24C    P8     8W / 250W |     12MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     16151      G   /usr/lib/xorg/Xorg                           328MiB |
|    0     16659      G   compiz                                       152MiB |
|    0     17474      G   /usr/lib/firefox/firefox                       2MiB |
|    0     17493      G   /usr/lib/firefox/firefox                       2MiB |
|    0     21783      G   ...-token=F5BA0D607E0EB0796BF0EC178D3EFD3F   337MiB |
+-----------------------------------------------------------------------------+

The ``` markers are there to help users properly format the output when they paste it in the forums, as some paste without formatting it. In theory I could indent the output 4 spaces, but if the preceding text uses nested lists I think it might not format properly, and if some long lines hard-wrap during copy-n-paste it’ll mess up the markdown too. So the fences feel like a safe, simple approach. Of course, you don’t have to paste the formatting as suggested and can re-format it any way you want. But I’m open to suggestions if you think not using them is better for the general case.

edit: oh sorry, I didn’t see the coloring as it was hidden by the scroll bar, I can see what you mean now.

I see what you mean. So yes, ``` it is:

=== Software === 
python version  : 3.6.6
fastai version  : 1.0.6.dev0
torch version   : 1.0.0.dev20181013
nvidia driver   : 396.44
torch cuda ver  : 9.2.148
torch cuda is   : available

=== Hardware === 
nvidia gpus     : 1
torch available : 1
  - gpu0        : 8119MB | GeForce GTX 1070 Ti

=== Environment ===
platform        : Linux-4.15.0-36-generic-x86_64-with-debian-buster-sid
distro          : Ubuntu 18.04 Bionic Beaver
conda env       : pytorch-dev
python          : /home/stas/anaconda3/envs/pytorch-dev/bin/python
sys.path        :
/home/stas/anaconda3/envs/pytorch-dev/lib/python3.6/site-packages/_pdbpp_path_hack
/home/stas/anaconda3/envs/pytorch-dev/lib/python36.zip
/home/stas/anaconda3/envs/pytorch-dev/lib/python3.6
/home/stas/anaconda3/envs/pytorch-dev/lib/python3.6/lib-dynload
/home/stas/.local/lib/python3.6/site-packages
/home/stas/anaconda3/envs/pytorch-dev/lib/python3.6/site-packages
/mnt/disc1/fast.ai-1/br/fastai/master
/home/stas/anaconda3/envs/pytorch-dev/lib/python3.6/site-packages/IPython/extensions

Mon Oct 15 22:07:16 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 107...  Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   37C    P8     6W / 180W |    495MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     29285      C   ...s/anaconda3/envs/pytorch-dev/bin/python   485MiB |
+-----------------------------------------------------------------------------+

I like having the formatting. I would just maybe add ```text to the opening fence, since it isn’t really code and that formats it better — but just having that information in general is awesome.

yes, sorry, I only saw what you meant when I pasted it myself :wink: I added ```text. Thank you for a great suggestion, @KevinB. Let’s hope people will copy-n-paste it too :wink:

1 Like

Probably better to use:

nvidia-smi dmon
3 Likes

I didn’t know of that one, thank you. Excellent for watching memory consumption!

But you can’t see the processes there. So I suppose both are useful.

I made a summary of this discussion here: http://docs-dev.fast.ai/troubleshoot#am-i-using-my-gpus

4 Likes

This might be useful too.

6 Likes

This is great, Suvash! Keep those suggestions coming, I will be compiling them together at http://docs-dev.fast.ai/

I started a new document https://docs-dev.fast.ai/gpu.html to collect gpu-related tips, so if you have other suggestions please send them my way. Thanks.

3 Likes

Adding the utilization metrics (utilization.gpu, utilization.memory) is also a good idea:

nvidia-smi --query-gpu=timestamp,pstate,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv -l 5

and you can see all the queryable fields with

nvidia-smi --help-query-gpu

More information is available here:
https://nvidia.custhelp.com/app/answers/detail/a_id/3751/~/useful-nvidia-smi-queries
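If you want to consume that `-l 5` stream programmatically, the csv format is easy to parse. Here is a sketch with an illustrative (made-up) sample row in the shape that query prints:

```python
import csv
import io

def parse_gpu_csv(csv_text):
    """Turn `nvidia-smi --query-gpu=... --format=csv` output into a list of
    dicts, one per GPU/sample, keyed by the header fields."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = [h.strip() for h in rows[0]]
    return [dict(zip(header, (v.strip() for v in row))) for row in rows[1:] if row]

# illustrative sample output (values are made up, not real measurements)
sample = """\
timestamp, utilization.gpu [%], utilization.memory [%], memory.used [MiB]
2018/10/15 23:51:06.000, 1 %, 0 %, 837 MiB
"""
for row in parse_gpu_csv(sample):
    print(row["timestamp"], row["utilization.gpu [%]"], row["memory.used [MiB]"])
```

Adding `,nounits` to `--format=csv` makes the values plain numbers if you’d rather skip stripping the units.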

1 Like