Unofficial Setup thread (Local, AWS)

keijik · October 13, 2018, 4:57pm

Hi everyone!

This advice is for anyone who has trouble getting the Nvidia driver working. If the provided steps work for you, look no further.

Over the years, I have personally found the greatest success with using the “run” file installation method with direct downloads from Nvidia. This applies to both the display driver and CUDA.

The link for 396.24 driver run file is this:
https://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/396.24/NVIDIA-Linux-x86_64-396.24.run&lang=us&type=TITAN

After downloading you can do:

chmod +x NVIDIA-Linux-x86_64-396.24.run
sudo ./NVIDIA-Linux-x86_64-396.24.run

If you don’t find success with any other method, you might want to give that a try. I found I had to do this because the “Additional Drivers” GUI only showed 390 (which I already had), but not 396. I did not try apt-get install nvidia-396 because, again, I have found the “run” method more reliable.

For CUDA you don’t need to do anything, since the pytorch install takes care of it for you. The only reason to install CUDA yourself would be if you want it for something else, like Tensorflow.

init_27 · October 14, 2018, 3:52am

Ubuntu in a VM would be too slow and of no use anyway.

You could make a backup of Windows and then try a dual boot installation; check the above wiki for instructions.

If not, go ahead with a cloud solution.
Salamander.ai is a great option. @ashtonsix (creator of Salamander) has created a thread and actively hangs out in the forums to help anyone facing issues.

_venkat · October 14, 2018, 9:27am

Thanks for the recommendation. I’m sort of leaning towards trying a dual boot option after all, using this course as an excuse.

Ralph · October 14, 2018, 1:12pm

And now ‘conda install tensorflow-gpu’ will take care of CUDA and cuDNN so that headache can be avoided as well.

-edit

Sorry, that was confusing. For purposes of this class (and this thread), Pytorch has already handled CUDA and nothing needs to be done. keijik mentioned installing CUDA for Tensorflow, and I was responding that the conda install of Tensorflow now takes care of that in the same way that pytorch has for some time.

jeremy · October 14, 2018, 1:33pm

Pytorch install already handles that for you. Tensorflow uses separate libraries.

init_27 · October 14, 2018, 1:36pm

I’ve added a little note in the wiki to avoid any confusions.

hasib_zunair · October 14, 2018, 4:15pm

@init_27 I have followed your steps and got this far (please refer to the attached image). Am I good to go? What else do I have to do?

Also, when I run the command below, should I get 0 or >0?

python -c 'import torch; print(torch.cuda.device_count());'

Looking forward to your response, thanks!

ademyanchuk · October 14, 2018, 4:36pm

It’s not OK to proceed with torch GPU count = 0 and CUDA available = False. I had the same issue. It seems you have to install Nvidia driver 396.

This helped me:
sudo add-apt-repository ppa:graphics-drivers/ppa   # adds the repository for Nvidia drivers
sudo apt install nvidia-396   # installs the needed packages

and reboot
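
After the reboot, the check from earlier in the thread can be wrapped in a small script. This is only a minimal sketch: the function name cuda_status is made up for illustration, and it simply reports what torch.cuda says (or notes that torch is not installed at all).

```python
import importlib.util


def cuda_status():
    """Return a small report on whether PyTorch can see a GPU.

    cuda_status is a hypothetical helper name, not part of PyTorch or fastai.
    """
    # Avoid an ImportError if PyTorch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False, "device_count": 0}

    import torch

    return {
        "torch_installed": True,
        "cuda_available": bool(torch.cuda.is_available()),
        "device_count": int(torch.cuda.device_count()),
    }


if __name__ == "__main__":
    report = cuda_status()
    print(report)
    if report["device_count"] == 0:
        print("No usable GPU found; check the Nvidia driver before continuing.")
```

A device count of 1 or more together with cuda_available = True is what you want to see before going further.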

hasib_zunair · October 14, 2018, 5:04pm

Now?

Also these two commands returned true

torch.cuda.is_available()
torch.backends.cudnn.enabled

ademyanchuk · October 15, 2018, 2:48am

Now the warning message says that the GPU is too old and Pytorch does not support it. I don’t know if your card will work with torch v1. You should probably try a tutorial from the pytorch site to check whether it works or not.
https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#cuda-tensors

init_27 · October 15, 2018, 3:17am

@hasib_zunair The GT 750M might not take you very far, even if you can compile from source.

The better alternative is to use a cloud service. Please check the salamander thread by @ashtonsix

I had tried using my old GT 740M but I’d recommend simply using a cloud service instead of going through the pain of setting it up.

hasib_zunair · October 15, 2018, 6:27am

hasib_zunair · October 15, 2018, 6:28am

okay, will look into it, thank you for the heads up.

hasib_zunair · October 15, 2018, 9:27am

Can I install conda with Python 3.6 and not 3.7? Will the other packages work properly?

ste · October 15, 2018, 10:08am

I’ve successfully installed the Nvidia 410 driver on Ubuntu 16.04:
NVIDIA-SMI 410.48 Driver Version: 410.48

It seems to work properly:

import torch; print(torch.cuda.device_count());

1

import fastai; print(fastai.__version__)

1.0.6.dev0
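
If you would rather read the driver version from a script than from the NVIDIA-SMI header, something like the following should work. It is a rough sketch, assuming nvidia-smi is on your PATH; the helper name driver_version is made up for illustration, and it returns None when the tool is missing or fails.

```python
import shutil
import subprocess


def driver_version():
    """Return the driver version reported by nvidia-smi, or None if unavailable.

    driver_version is a hypothetical helper name used only for this example.
    """
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # nvidia-smi exists but could not talk to the driver, or timed out.
        return None
    lines = result.stdout.strip().splitlines()
    # One line is printed per GPU; the driver version is the same for all.
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    print("Driver version:", driver_version())
```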

keyurparalkar · October 15, 2018, 3:26pm

python -c 'import torch; print(torch.cuda.device_count()); '

For the above command I am getting count = 0. Should I proceed with the installation? Conda has successfully installed cuda92, and my drivers (nvidia-384.130) are up and running.
Is it OK to use the nvidia-384.130 drivers? I have an Nvidia 1050.

martijnd · October 15, 2018, 7:25pm

Hello Hasib,

If I run Python 3.7 I run into the following error. You should use 3.6 instead.

if cuda: a = to_gpu(a, async=True)
^
SyntaxError: invalid syntax

For more information see this thread.
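
The reason for the error is that async became a reserved keyword in Python 3.7, so it can no longer be used as an argument name. Here is a minimal sketch that checks this on whatever interpreter you are running. The call to to_gpu is only parsed, never executed, and non_blocking (the name Pytorch 0.4 and later use for this argument on .cuda()) is shown only as an example of a spelling that still parses.

```python
import keyword
import sys

# The expression from the traceback above; it is only compiled, never run.
SNIPPET = "to_gpu(a, async=True)"


def compiles(source):
    """Return True if `source` parses as a Python expression on this interpreter."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print("Python version:", sys.version_info[:2])
    print("async is a keyword:", keyword.iskeyword("async"))
    print("async=True parses:", compiles(SNIPPET))
    print("non_blocking=True parses:", compiles("to_gpu(a, non_blocking=True)"))
```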

Michal_w · October 15, 2018, 10:36pm

Hi,

In general I prefer to use NEWEST_VERSION - 1 of system libraries or drivers, because full support for the newest version is quite often missing or still in beta in other libraries. The exception is when you know you specifically need a new feature included in the newest version. I have just created an environment with:
Ubuntu 16.04 LTS
Cuda=9.0
Anaconda 3.6
Nvidia-driver 384
Torch = 0.4.1

I tried to run dog_VS_cats from fast.ai 1.0.6 and it seems to be working without any problems.

Cheers

Michal

keyurparalkar · October 16, 2018, 4:57am

I was able to install pytorch-nightly with the Nvidia 384.130 drivers. I had CUDA 9.0 preinstalled on my system, so I just ran conda install -c pytorch pytorch-nightly, which resolved the error regarding torch.cuda.device_count(). I found out that each Nvidia driver version supports specific CUDA versions. This table, which I found on Stack Overflow, was helpful:

CUDA 10.0: 410.48
CUDA  9.2: 396.xx
CUDA  9.1: 390.xx (update)
CUDA  9.0: 384.xx
CUDA  8.0: 375.xx (GA2)
CUDA  8.0: 367.4x
CUDA  7.5: 352.xx
CUDA  7.0: 346.xx
CUDA  6.5: 340.xx
CUDA  6.0: 331.xx
CUDA  5.5: 319.xx
CUDA  5.0: 304.xx
CUDA  4.2: 295.41
CUDA  4.1: 285.05.33
CUDA  4.0: 270.41.19
CUDA  3.2: 260.19.26
CUDA  3.1: 256.40
CUDA  3.0: 195.36.15
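
The table can also be turned into a quick lookup. This is only a sketch under a few assumptions: each row is treated as the minimum driver for that CUDA release, "xx" is read as 0, "367.4x" is read as 367.40, and only the most recent rows are included. The names MIN_DRIVER, parse_driver and newest_cuda_for are made up for illustration.

```python
# Rows copied from the table above, newest first. Treating each entry as the
# minimum driver for that CUDA release is an assumption of this sketch, as is
# reading "xx" as 0 and "367.4x" as 367.40.
MIN_DRIVER = [
    ("10.0", (410, 48)),
    ("9.2", (396, 0)),
    ("9.1", (390, 0)),
    ("9.0", (384, 0)),
    ("8.0", (367, 40)),
    ("7.5", (352, 0)),
]


def parse_driver(version):
    """Turn a string such as '384.130' into a comparable tuple (384, 130)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def newest_cuda_for(driver_version):
    """Return the newest CUDA release in MIN_DRIVER that the driver satisfies."""
    driver = parse_driver(driver_version)
    for cuda, minimum in MIN_DRIVER:
        if driver >= minimum:
            return cuda
    return None  # driver is older than every row kept in this sketch


if __name__ == "__main__":
    for v in ("384.130", "396.24", "410.48"):
        print(v, "->", newest_cuda_for(v))
```

Read this way, a 384.130 driver pairs with CUDA 9.0 rather than 9.2, which is consistent with the device count of 0 reported with cuda92 earlier in the thread.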

hasib_zunair · October 16, 2018, 5:52am

I ran the commands in this thread with Python 3.7 and had no issues. However, I hit this async problem while using Tensorflow and found that it is not directly compatible, so I made a separate virtual environment for it.