Unofficial Setup thread (Local, AWS)

I have a local rig. I started with a GTX 1080 about a year and a half ago and recently upgraded to a GTX 1080 Ti. I decided to stick with Ubuntu 16.04; the rest (NVIDIA drivers, CUDA, etc.) was the latest.
My observation: troubleshooting by googling is becoming harder and harder. You can google your Python question and find a decent answer, but that won't work for your CUDA/Ubuntu problem of the day. I mean, I coded compilers and protocols in the cubicles of Bell Labs and still have trouble parsing the answers on Ask Ubuntu and the like.


My bad, I misunderstood “nvidia-smi” as a return value.


I was following the instructions in the OP on a fresh Ubuntu 18.04 LTS install, and all was good until:

python -c 'import torch; print(torch.cuda.device_count()); '

which returned 0. I also tried

python -c 'import torch; print(torch.cuda.is_available()); '

which returned False. I tried a bunch of things to no avail, until I saw the post by “saltybald” in this thread:

I took his advice and installed the 396 nvidia drivers with sudo apt install nvidia-driver-396 (having previously installed the 390 drivers, as per the instructions), did a restart and suddenly things started working.
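After a driver install like this, it can help to confirm that nvidia-smi now reports the version you expect. A minimal stdlib-only sketch (the helper name is mine, and the header format is the one quoted in a later post in this thread; adjust the pattern if your driver prints something different):

```python
import re

def parse_driver_version(smi_header):
    """Pull the driver version out of the first line of `nvidia-smi` output."""
    m = re.search(r"Driver Version:\s*([\d.]+)", smi_header)
    return m.group(1) if m else None

# Example header line, as reported elsewhere in this thread:
print(parse_driver_version("NVIDIA-SMI 410.48  Driver Version: 410.48"))  # → 410.48
```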


The only thing I’m worried about is installing Linux on my laptop. I’d been a Linux user for about 10 years (with a dual-booted Windows XP running as a backup, but it was rarely used.) That was before the UEFI/secure boot thing came about; my setup was simple enough that I didn’t have to worry about breaking anything or having driver issues. Now, however, I have a laptop for which I shelled out a lot. I’m not even sure if I can send this model (XPS 9560) for repairs where I live if something were to break.

When I bought it, I had it all planned out: wait for the first LTS point release to show up, read reviews, and install sometime in mid-July if all was good. By the time June rolled around, I'd become too scared (and possibly a bit lazy) to tamper with the laptop. :confused: I have Ubuntu running in a VM, but that won't be any good.

Hi everyone!

This advice is for anyone who has trouble getting the NVIDIA driver working. If the provided steps work for you, look no further.

Over the years, I have personally had the greatest success with the “run” file installation method, using direct downloads from NVIDIA. This applies to both the display driver and CUDA.

The link for 396.24 driver run file is this:
https://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/396.24/NVIDIA-Linux-x86_64-396.24.run&lang=us&type=TITAN

After downloading you can do:

chmod +x NVIDIA-Linux-x86_64-396.24.run
sudo ./NVIDIA-Linux-x86_64-396.24.run

If you don't find success with any other method, you might want to give that a try. I had to do this because the “Additional Drivers” GUI only showed 390 (which I already had), not 396. I did not try apt-get install nvidia-396 because, again, I have found the “run” method more reliable.

For CUDA you don't need to do anything, since the PyTorch install takes care of it for you. The only reason to install CUDA separately would be if you want it for other things, like TensorFlow.
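You can ask torch itself which CUDA runtime the conda build bundles. A small sketch (the function name is mine) that degrades gracefully when torch is not installed:

```python
def bundled_cuda_version():
    """Return the CUDA runtime version PyTorch was built with, or None."""
    try:
        import torch
    except ImportError:
        return None  # torch not installed on this machine
    # torch.version.cuda is e.g. "9.2" for the conda cuda92 build,
    # and None for CPU-only builds.
    return torch.version.cuda

print(bundled_cuda_version())
```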


Ubuntu in a VM would be too slow and of no use anyway.

You could make a backup of Windows and then try a dual-boot installation; check the above wiki for instructions.

If not, go ahead with a cloud solution.
Salamander.ai is a great option; @ashtonsix (creator of Salamander) has created a thread and actively hangs out on the forums to help anyone facing issues.

Thanks for the recommendation. I’m sort of leaning towards trying a dual boot option after all, using this course as an excuse. :smiley:


And now ‘conda install tensorflow-gpu’ will take care of CUDA and cuDNN so that headache can be avoided as well.

-edit

Sorry, that was confusing. For the purposes of this class (and this thread), PyTorch has already handled CUDA and nothing needs to be done. keijik mentioned installing CUDA for TensorFlow, and I was responding that the conda install of TensorFlow now takes care of that in the same way PyTorch has for some time.

Pytorch install already handles that for you. Tensorflow uses separate libraries.


I've added a little note in the wiki to avoid any confusion.

@init_27 I have followed your steps and got this far (please refer to the attached image). Am I good to go? What else do I have to do?

Also, when I run the command below, should I get 0 or >0?

python -c 'import torch; print(torch.cuda.device_count()); '

Looking forward to your response, thanks!

It's not OK to proceed with torch GPU count = 0 and CUDA available = False. I had the same issue. It seems you have to install the nvidia-396 driver.

This helped me:
sudo add-apt-repository ppa:graphics-drivers/ppa - here you add the repository for NVIDIA drivers
sudo apt install nvidia-396 - here you install the needed package

and reboot
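The underlying issue: each CUDA release requires a minimum driver branch, which is why a 390 driver gives device_count() == 0 against the cuda92 PyTorch build. A rough sketch of that check (the branch numbers are my approximation of NVIDIA's compatibility table; consult the CUDA release notes for the exact minimums):

```python
# Approximate minimum driver branch per CUDA release (assumption; see
# NVIDIA's CUDA release notes for the authoritative table).
MIN_DRIVER_BRANCH = {"9.0": 384, "9.2": 396, "10.0": 410}

def driver_supports(driver_version, cuda_version):
    """Check whether a driver like '396.24' is new enough for a CUDA release."""
    branch = int(driver_version.split(".")[0])
    return branch >= MIN_DRIVER_BRANCH[cuda_version]

print(driver_supports("390.48", "9.2"))   # → False: why device_count() was 0
print(driver_supports("396.24", "9.2"))   # → True
```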


Now?

Also, these two commands returned True:

torch.cuda.is_available()
torch.backends.cudnn.enabled

Now the warning message says the GPU is too old and PyTorch does not support it. I don't know if your card will work with torch v1. You should probably try a tutorial from the PyTorch site to make sure whether it works or not.
https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#cuda-tensors

@hasib_zunair A GT 750M might not take you very far, even if you can compile from source.

The better alternative is to use a cloud service. Please check the salamander thread by @ashtonsix

I had tried using my old GT 740M but I’d recommend simply using a cloud service instead of going through the pain of setting it up.

:frowning:

okay, will look into it, thank you for the heads up.

Can I install conda with Python 3.6 instead of 3.7? Will the other packages work properly?

I've successfully installed the NVIDIA 410 driver on Ubuntu 16.04:
NVIDIA-SMI 410.48 Driver Version: 410.48

It seems to work properly:

import torch; print(torch.cuda.device_count());

1

import fastai; print(fastai.__version__)

1.0.6.dev0

python -c 'import torch; print(torch.cuda.device_count()); '

For the above command I am getting count = 0. Should I proceed with the installation? Conda has successfully installed cuda92, and my drivers are up and running (nvidia-384.130).
Is it OK to use the nvidia-384 drivers? I have an NVIDIA 1050.