Unofficial Setup thread (Local, AWS)

Hi everyone!

This advice is for anyone having trouble getting the Nvidia driver working. If the provided steps work for you, look no further.

Over the years, I have personally had the greatest success with the “run” file installation method, using direct downloads from Nvidia. This applies to both the display driver and CUDA.

The link for 396.24 driver run file is this:
https://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/396.24/NVIDIA-Linux-x86_64-396.24.run&lang=us&type=TITAN

After downloading you can do:

chmod +x NVIDIA-Linux-x86_64-396.24.run
sudo ./NVIDIA-Linux-x86_64-396.24.run

If you don’t find success with any other method, you might want to give that a try. I had to do this because the “Additional Drivers” GUI only showed 390 (which I already had), but not 396. I did not try apt-get install nvidia-396 because, again, I have found the “run” method more reliable.

For CUDA you don’t need to do anything, since the PyTorch install takes care of it for you. The only reason to install CUDA separately would be if you want it for something else, like Tensorflow.
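One quick way to confirm that the conda/pip PyTorch install really did bring its own CUDA runtime is to ask torch directly. This is a sketch; the helper name is mine, and it deliberately survives a missing torch install instead of crashing:

```python
# Hypothetical helper: report whether the bundled-CUDA PyTorch build can
# see a GPU, without crashing when torch itself is not installed.
def torch_cuda_summary():
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False, "device_count": 0}
    return {
        "torch_installed": True,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }

if __name__ == "__main__":
    print(torch_cuda_summary())
```

If `cuda_available` comes back False with the driver installed, the driver/CUDA pairing is the first thing to check (see the version table further down the thread).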


Ubuntu in a VM would be too slow to be of any use anyway.

You could make a backup of Windows and then try a dual-boot installation; check the above wiki for instructions.

If not, go ahead with a cloud solution.
Salamander.ai is a great option; @ashtonsix (the creator of Salamander) has created a thread and actively hangs out in the forums to help anyone facing issues.

Thanks for the recommendation. I’m sort of leaning towards trying a dual boot option after all, using this course as an excuse. :smiley:


And now ‘conda install tensorflow-gpu’ will take care of CUDA and cuDNN, so that headache can be avoided as well.

-edit

Sorry, that was confusing. For purposes of this class (and this thread), Pytorch has already handled CUDA and nothing needs to be done. keijik mentioned installing CUDA for Tensorflow, and I was responding that the conda install of Tensorflow now takes care of that in the same way that pytorch has for some time.

Pytorch install already handles that for you. Tensorflow uses separate libraries.


I’ve added a little note in the wiki to avoid any confusion.

@init_27 I have followed your steps and got till here(please refer to the attached image). Am I good to go? What else do I have to do?

Also, when I run the command below, should I get 0 or >0?

python -c 'import torch; print(torch.cuda.device_count())'

Looking forward to your response, thanks!
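To make the expected answer to that question concrete: 0 means PyTorch cannot see any GPU, and anything >= 1 means GPU training will work. A tiny sketch (the helper name and wording are mine):

```python
# Hypothetical helper: turn torch.cuda.device_count() into a plain verdict.
# A count of 0 means PyTorch cannot see any GPU (usually a driver/CUDA
# mismatch); anything >= 1 means you are good to go for GPU training.
def interpret_device_count(count):
    if count <= 0:
        return "no GPU visible - check the Nvidia driver before proceeding"
    return f"{count} GPU(s) visible - good to go"

print(interpret_device_count(0))
print(interpret_device_count(1))
```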

It’s not OK to go ahead with torch GPU count = 0 and cuda available = False. I had the same issue. It seems you have to install nvidia-driver 396.

This helped me:
sudo add-apt-repository ppa:graphics-drivers/ppa - this adds the repository for Nvidia drivers
sudo apt install nvidia-396 - this installs the needed packages

and reboot
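After the reboot you can confirm the driver actually loaded by looking at the “Driver Version: …” field that nvidia-smi prints in its header. This check is my own sketch (the function names are assumptions), built around parsing that field:

```python
# Hypothetical check: after rebooting, parse `nvidia-smi` output to
# confirm the expected driver version is loaded.
import re
import subprocess

def parse_driver_version(smi_output):
    """Pull the 'Driver Version: NNN.NN' field out of nvidia-smi's header."""
    match = re.search(r"Driver Version:\s*([\d.]+)", smi_output)
    return match.group(1) if match else None

def installed_driver_version():
    try:
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
    except FileNotFoundError:
        return None  # nvidia-smi (and thus the driver) is not installed
    return parse_driver_version(out)

# Example against a captured header line:
sample = "NVIDIA-SMI 396.54  Driver Version: 396.54"
print(parse_driver_version(sample))  # -> 396.54
```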


Now?

Also, these two commands returned True:

torch.cuda.is_available()
torch.backends.cudnn.enabled

Now the warning message says that the GPU is too old and PyTorch does not support it. I don’t know if your card will work with torch v1. You should probably try a tutorial from the PyTorch site to make sure whether it works or not.
https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#cuda-tensors
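The linked tutorial’s CUDA-tensors section boils down to a device-fallback pattern: use "cuda" when a working GPU is visible, otherwise fall back to CPU. A sketch of that pattern (the helper name is mine, and it also tolerates torch being absent):

```python
# Sketch of the device-fallback pattern from the CUDA-tensors tutorial:
# pick "cuda" when a working GPU is visible, otherwise fall back to "cpu".
def pick_device():
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

# With torch installed, the tutorial-style usage would be:
#   device = torch.device(pick_device())
#   x = torch.ones(2, 2, device=device)
print(pick_device())
```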

@hasib_zunair A GT 750M might not take you very far, even if you can compile from source.

The better alternative is to use a cloud service. Please check the salamander thread by @ashtonsix

I had tried using my old GT 740M but I’d recommend simply using a cloud service instead of going through the pain of setting it up.

:frowning:

Okay, will look into it. Thank you for the heads up.

Can I install conda with Python 3.6 and not 3.7? Will the other packages work properly?

I’ve successfully installed the Nvidia 410 driver on Ubuntu 16.04:
NVIDIA-SMI 410.48 Driver Version: 410.48

It seems to work properly:

import torch; print(torch.cuda.device_count());

1

import fastai; print(fastai.__version__)

1.0.6.dev0

python -c 'import torch; print(torch.cuda.device_count()); '

For the above command I am getting count = 0. Should I proceed with the installation? Conda has successfully installed cuda92, and I have my drivers up and running, which are nvidia-384.130.
Is it OK to use the nvidia-384 drivers? I have an Nvidia 1050.

Hello Hasib,

If I run Python 3.7, I run into the following error. You should use 3.6 instead.

if cuda: a = to_gpu(a, async=True)
^
SyntaxError: invalid syntax

For more information see this thread.
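For context on why that line is a SyntaxError: `async` became a reserved keyword in Python 3.7, so it can no longer be used as an argument name. You can verify this with the stdlib `keyword` module; the `non_blocking` rename shown in the comment is how newer PyTorch spells the same option on `Tensor.cuda()` (the `to_gpu` call itself is fastai's helper):

```python
# Why the snippet breaks on 3.7: `async` was promoted to a reserved
# keyword in Python 3.7, so `f(async=True)` is no longer valid syntax.
import keyword

print(keyword.iskeyword("async"))  # True on Python 3.7+

# On newer PyTorch the tensor method argument was renamed, so the
# equivalent call (sketch, assuming `a` is a torch tensor) is:
#   a = a.cuda(non_blocking=True)   # instead of a.cuda(async=True)
```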

Hi,

In general, I personally prefer to use NEWEST_VERSION - 1 of system libraries or drivers, as it quite often happens that full support for the newest version is not yet available in other libraries, or is in beta, unless you specifically know you need a new feature included in the newest version. I have just created an environment
on Ubuntu 16.04 LTS
Cuda=9.0
Anaconda 3.6
Nvidia-driver 384
Torch = 0.4.1

And tried to run dog_VS_cats from fast.ai 1.0.6, and it seems to be working without any problems :slight_smile:

Cheers

Michal


I was able to install pytorch-nightly with the Nvidia 384.130 drivers. I had CUDA 9.0 preinstalled on my system, so I just ran conda install -c pytorch pytorch-nightly and that resolved the error regarding torch.cuda.device_count(). I found out that specific Nvidia drivers support a specific CUDA version; this table, which I found on Stack Overflow, was helpful:

CUDA 10.0: 410.48
CUDA  9.2: 396.xx
CUDA  9.1: 390.xx (update)
CUDA  9.0: 384.xx
CUDA  8.0: 375.xx (GA2)
CUDA  8.0: 367.4x
CUDA  7.5: 352.xx
CUDA  7.0: 346.xx
CUDA  6.5: 340.xx
CUDA  6.0: 331.xx
CUDA  5.5: 319.xx
CUDA  5.0: 304.xx
CUDA  4.2: 295.41
CUDA  4.1: 285.05.33
CUDA  4.0: 270.41.19
CUDA  3.2: 260.19.26
CUDA  3.1: 256.40
CUDA  3.0: 195.36.15
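The recent rows of that table can double as a quick compatibility check: given your installed driver version, is it at least the minimum the table lists for a CUDA version? This is my own sketch (the helper name and the “>= minimum driver” reading of the table are assumptions; only a few rows are copied):

```python
# A few rows of the table above as a lookup: CUDA version -> minimum
# driver version. The helper name and interpretation are assumptions.
MIN_DRIVER_FOR_CUDA = {
    "10.0": 410.48,
    "9.2": 396.0,
    "9.1": 390.0,
    "9.0": 384.0,
}

def driver_supports_cuda(driver_version, cuda_version):
    """True if the installed driver is at least the table's minimum."""
    minimum = MIN_DRIVER_FOR_CUDA.get(cuda_version)
    if minimum is None:
        return False  # CUDA version not in the (partial) table
    return float(driver_version) >= minimum

print(driver_supports_cuda("384.130", "9.0"))   # the poster's setup works
print(driver_supports_cuda("384.130", "9.2"))   # CUDA 9.2 would need 396.xx
```

This matches the experience above: nvidia-384 works with CUDA 9.0, but the conda cuda92 packages want a 396.xx driver.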

I ran the commands in this thread with Python 3.7 and had no issues. But I had this async problem while using Tensorflow; I found that it is not directly compatible, so I made a separate virtual environment for it.