This advice is for anyone who has trouble getting the Nvidia driver working. If the provided steps work for you, look no further.
Over the years, I have personally found the greatest success with the “run” file installation method, using direct downloads from Nvidia. This applies to both the display driver and CUDA.
If you don’t find success with any other method, you might want to give that a try. I had to do this because the “Additional Drivers” GUI only showed 390 (which I already had), but not 396. I did not try apt-get install nvidia-396 because, again, I have found the “run” method more reliable.
For CUDA you don’t need to do anything, because the pytorch install takes care of it for you. The only reason to install CUDA separately would be if you want it for something else, like Tensorflow.
Ubuntu in a VM would be too slow and of no use anyway.
You could make a backup of Windows and then try a dual-boot installation; check the above wiki for instructions.
If not, go ahead with a cloud solution.
Salamander.ai is a great option. @ashtonsix (creator of Salamander) has created a thread and actively hangs out in the forums to help anyone facing issues.
And now ‘conda install tensorflow-gpu’ will take care of CUDA and cuDNN so that headache can be avoided as well.
-edit
Sorry, that was confusing. For purposes of this class (and this thread), Pytorch has already handled CUDA and nothing needs to be done. keijik mentioned installing CUDA for Tensorflow, and I was responding that the conda install of Tensorflow now takes care of that in the same way that pytorch has for some time.
It’s not OK to proceed with torch gpu count = 0 and cuda available = False. I had the same issue. It seems you have to install nvidia-driver 396.
This helped me:
sudo add-apt-repository ppa:graphics-drivers/ppa - this adds the repository for the Nvidia drivers
sudo apt install nvidia-396 - this installs the needed packages
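To see whether the driver install actually fixed things, you can check the same two values from Python. This is only a minimal sketch: it assumes PyTorch is installed in the active environment, and cuda_status is a made-up helper name, not part of PyTorch. It returns (False, 0) if torch cannot be imported.

```python
def cuda_status():
    """Return (cuda_available, gpu_count) as reported by PyTorch.

    cuda_status is a hypothetical helper name used for illustration.
    If torch is not installed, it returns (False, 0) instead of raising.
    """
    try:
        import torch
    except ImportError:
        return False, 0
    return bool(torch.cuda.is_available()), int(torch.cuda.device_count())


if __name__ == "__main__":
    available, count = cuda_status()
    print("cuda available =", available)
    print("gpu count =", count)
```

If this still prints cuda available = False and gpu count = 0 after a reboot, the driver is probably still the issue.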
For the above command I am getting count = 0. Should I proceed with the installation? Conda has successfully installed cuda92, and my drivers, which are nvidia-384.130, are up and running.
Is it OK to use the nvidia-384.120 drivers? I have an Nvidia 1050.
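If you are not sure which driver version is actually loaded, one way to read it is through nvidia-smi. A rough sketch, assuming nvidia-smi is on your PATH (it ships with the driver); installed_driver_version is a made-up helper name, and it returns None when nvidia-smi is missing or fails.

```python
import shutil
import subprocess


def installed_driver_version():
    """Return the driver version string from nvidia-smi, or None.

    installed_driver_version is a hypothetical helper name.
    """
    if shutil.which("nvidia-smi") is None:
        return None  # driver tools not installed or not on PATH
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return None  # nvidia-smi could not talk to the driver
    lines = result.stdout.strip().splitlines()
    # One line is printed per GPU; they all share the same driver.
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    print("driver version:", installed_driver_version())
```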
In general, I prefer to use NEWEST_VERSION - 1 of system libraries and drivers, because other libraries often do not fully support the newest version yet, or support it only in beta. The exception is when you know you need a new feature included in the newest version. I have just created an environment
on Ubuntu 16.04 LTS
Cuda=9.0
Anaconda 3.6
Nvidia-driver 384
Torch = 0.4.1
And I tried to run dog_VS_cats from fast.ai 1.0.6, and it seems to be working without any problems.
I was able to install pytorch-nightly with the Nvidia 384.130 drivers. I had cuda 9.0 preinstalled on my system, so I just ran conda install -c pytorch pytorch-nightly, and that resolved the error regarding torch.cuda.device_count(). I found out that each Nvidia driver version supports a specific cuda version. This table, which I found on Stack Overflow, was helpful:
CUDA 10.0: 410.48
CUDA 9.2: 396.xx
CUDA 9.1: 390.xx (update)
CUDA 9.0: 384.xx
CUDA 8.0: 375.xx (GA2)
CUDA 8.0: 367.4x
CUDA 7.5: 352.xx
CUDA 7.0: 346.xx
CUDA 6.5: 340.xx
CUDA 6.0: 331.xx
CUDA 5.5: 319.xx
CUDA 5.0: 304.xx
CUDA 4.2: 295.41
CUDA 4.1: 285.05.33
CUDA 4.0: 270.41.19
CUDA 3.2: 260.19.26
CUDA 3.1: 256.40
CUDA 3.0: 195.36.15
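The table can be turned into a quick check. This is just a sketch: it copies only the CUDA 9.x and 10.0 rows, and it assumes each row is the minimum driver for that CUDA version (newer drivers also working). MIN_DRIVER and driver_supports are made-up names for illustration.

```python
# Rows copied from the table above (CUDA 9.0 through 10.0 only).
# Assumption: each entry is the MINIMUM driver for that CUDA version.
MIN_DRIVER = {
    "10.0": "410.48",
    "9.2": "396",
    "9.1": "390",
    "9.0": "384",
}


def parse_version(version):
    """Turn '384.130' into (384, 130) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_supports(cuda_version, driver_version):
    """Return True if driver_version meets the table's minimum for cuda_version."""
    minimum = MIN_DRIVER[cuda_version]
    return parse_version(driver_version) >= parse_version(minimum)


if __name__ == "__main__":
    print(driver_supports("9.0", "384.130"))   # True
    print(driver_supports("9.2", "384.130"))   # False
    print(driver_supports("10.0", "410.48"))   # True
```

With the 384.130 driver mentioned in this thread, the check passes for CUDA 9.0 and fails for CUDA 9.2. That fits the reports above of count = 0 with cuda92 on the 384 driver.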
I ran the commands in this thread with Python 3.7 and had no issues. However, I did hit the async problem while using Tensorflow. I found that it is not directly compatible with Python 3.7, so I made a separate virtual environment for it.