Fastai v2 and GPU not utilized

I had a challenge getting the v2 code to use the GPU. No matter how I tried to install v2, the notebook I was using would take forever running learn.lr_find(). I run an Ubuntu 18.04 server with an RTX 2080 GPU. I’ve been using this system for over a year and had previously used a GTX 1050.

nvidia-smi showed that the GPU wasn’t being utilized. No memory usage and no Volatile GPU Usage.
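
The same symptom is visible from inside the notebook (a minimal sketch, not from the original run; when CUDA is unusable, plain tensors quietly stay on the CPU):

import torch

# If the driver/CUDA setup is broken, is_available() returns False and new
# tensors land on the CPU, which is why nvidia-smi shows no memory usage.
print(torch.cuda.is_available())
print(torch.randn(3, 3).device)   # prints "cpu" unless moved to CUDA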

The breakthrough clue, for me, was running fastai_dev/dev/examples/camvid.ipynb. The call
torch.cuda.set_device(0)
failed right out of the box with an exception identifying an out-of-date driver problem.

Going back to nvidia-smi showed:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+

Fastai v1 was happy with driver 410, but the PyTorch build used by v2 didn’t like that version.
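
A quick way to see the mismatch (a sketch, assuming a standard PyTorch install) is to compare the CUDA runtime the installed torch build was compiled against with the driver version nvidia-smi reports; CUDA 10.1 needs roughly driver 418 or newer on Linux, while 10.0 is satisfied by 410:

import torch

# The CUDA runtime this PyTorch build expects, e.g. "10.1"; the installed
# NVIDIA driver must be new enough for it, hence the failure on driver 410.
print(torch.__version__)
print(torch.version.cuda)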

I tried several techniques to upgrade the driver, unsuccessfully.

The successful approach was from https://www.mvps.net/docs/install-nvidia-drivers-ubuntu-18-04-lts-bionic-beaver-linux/:

sudo apt-get purge nvidia*
sudo add-apt-repository ppa:graphics-drivers
sudo apt-get update
sudo apt-get install nvidia-430
sudo reboot

After the reboot, nvidia-smi showed:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+

torch.cuda.set_device(0)
was slow to return, but it did succeed.
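
With the new driver in place, a quick allocation test (again a sketch, not from the original post) confirms that work actually lands on the GPU and that nvidia-smi now reports memory in use:

import torch

# Allocate a tensor directly on the GPU and check that memory shows up.
torch.cuda.set_device(0)
x = torch.randn(2048, 2048, device="cuda")
print(torch.cuda.get_device_name(0))
print(f"{torch.cuda.memory_allocated(0) / 1e6:.1f} MB allocated")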


I had a similar problem, and simply created a utility function to remind me before training:

# export
import torch

def is_cuda_available():
    "Assert that PyTorch can see a CUDA device before any training starts."
    assert torch.cuda.is_available(), (
        "PyTorch could not find CUDA; you may have been suspended, "
        "which breaks CUDA support in the environment."
    )
    print("PyTorch found CUDA.")
    return True
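
Hypothetical usage at the top of a notebook, before any expensive call such as learn.lr_find() or learn.fit_one_cycle():

is_cuda_available()   # raises AssertionError if no GPU is visible
# ...then build the DataLoaders/Learner and train as usual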

I realize the original post was from last year, and drivers improve all the time, but I still like to run this before any training to avoid wasting time.

Currently on 435, which seems fine:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+