torch.cuda.is_available() = false

I’ve been using Crestle for the past few weeks and it had been working well until this past weekend, when the GPU stopped working (maybe a coincidence, but I also updated to the latest fastAI over the weekend).

Symptoms…

torch.backends.cudnn.enabled = true
torch.cuda.is_available() = false
nvidia-smi --query-gpu=timestamp,pstate,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv -l 1 
     --> shows no GPU utilization
watch -n 1 nvidia-smi
     --> says... "No running processes found"
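The same checks can be collected from inside the notebook in one go. This is a minimal sketch, assuming PyTorch is importable in the active environment; `collect_gpu_diagnostics` is a hypothetical helper name, not part of PyTorch or fastai. If `torch.version.cuda` comes back as `None`, the installed PyTorch is a CPU-only build, which would explain `is_available()` returning false regardless of the driver.

```python
import shutil


def collect_gpu_diagnostics():
    """Gather the basic facts needed to debug a missing GPU (hypothetical helper)."""
    info = {"nvidia_smi_on_path": shutil.which("nvidia-smi") is not None}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment; nothing more to report.
        info["torch_installed"] = False
        return info
    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    # CUDA version PyTorch was built against; None for CPU-only builds.
    info["torch_cuda_build"] = torch.version.cuda
    info["cudnn_enabled"] = torch.backends.cudnn.enabled
    info["cuda_available"] = torch.cuda.is_available()
    info["gpu_count"] = torch.cuda.device_count() if info["cuda_available"] else 0
    return info


if __name__ == "__main__":
    for key, value in collect_gpu_diagnostics().items():
        print(f"{key}: {value}")
```

Posting this output along with the `nvidia-smi` results usually makes a version mismatch easier to spot.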

It takes a few hours for cats & dogs from lesson #1 to run when it should take about 3 minutes.

Crestle says I’m using an “n1-highmem-8/nvidia-tesla-k80”

I’ve restarted Jupyter. I’ve also hunted around the forum. Others have had similar issues, but none on Crestle, and none of the suggestions so far have worked.

Any ideas? Other than that, fastAI has been awesome so far.

I had the same error while using GCP. After the update it would only run on the CPU. After trying many things, I solved it by uninstalling all the CUDA toolkit versions and sticking with cuda92. I think it is caused by a version compatibility mismatch.

  • Check the GPU driver version
  • Check which CUDA versions are installed and keep only one of them (if the driver version is 396.xx then cuda92, if it’s 410.x then cuda100). I recommend cuda92 since that is what solved it for me, but you can try driver 410 with cuda100.
    Verify with conda list cuda
  • conda install pytorch torchvision cudatoolkit={9.2 for cuda92, 10.0 for cuda100} -c pytorch
  • conda install -c pytorch -c fastai fastai pytorch torchvision {cuda92 or cuda100}

@PrajwalPrashanth sweet, you solved my issue! Can’t thank you enough.

It turned out there was a mismatch between cuda92 and cudatoolkit, which you helped identify. Somehow I had the toolkit for cuda100 but had cuda92 installed (see below). Thanks again.

Name                    Version                   Build  Channel
cuda92                    1.0                           0    pytorch
cudatoolkit               10.0.130                      0
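For anyone wanting to check for this kind of inconsistency automatically, here is a rough sketch that scans `conda list` output for it. It assumes the `cudaNN` package name encodes the CUDA version (cuda92 for 9.2, cuda100 for 10.0), as the thread suggests; `find_cuda_mismatch` is a hypothetical helper, not a conda or PyTorch function.

```python
import re


def find_cuda_mismatch(conda_list_output):
    """Return (expected, installed) toolkit versions if they disagree, else None."""
    meta_version = None     # e.g. "9.2", derived from the "cuda92" package name
    toolkit_version = None  # e.g. "10.0", from "cudatoolkit 10.0.130"
    for line in conda_list_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        name, version = fields[0], fields[1]
        # Assumption: the last digit of cudaNN is the minor version.
        match = re.fullmatch(r"cuda(\d+)(\d)", name)
        if match:
            meta_version = f"{match.group(1)}.{match.group(2)}"
        elif name == "cudatoolkit":
            # Keep only major.minor so "10.0.130" compares as "10.0".
            toolkit_version = ".".join(version.split(".")[:2])
    if meta_version and toolkit_version and meta_version != toolkit_version:
        return meta_version, toolkit_version
    return None


sample = """Name                    Version                   Build  Channel
cuda92                    1.0                           0    pytorch
cudatoolkit               10.0.130                      0
"""
print(find_cuda_mismatch(sample))
```

On the listing above it reports that cuda92 expects toolkit 9.2 while 10.0 is installed. To run it on a live environment, the output of `conda list cuda` can be passed in as a string.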

Just to add to this, I also had the wrong version of cuda for my GPU driver.

I found this Stack Overflow post with a list of drivers and associated cuda versions: https://stackoverflow.com/questions/30820513/what-is-the-correct-version-of-cuda-for-my-nvidia-driver/30820690#30820690
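As a rough illustration of that lookup, the sketch below checks a driver version against the minimum Linux driver versions NVIDIA lists for the two CUDA releases discussed in this thread (396.26 for CUDA 9.2, 410.48 for CUDA 10.0). The table is deliberately limited to those two entries, and `supported_cuda_versions` is a hypothetical helper name.

```python
# Minimum Linux driver versions from NVIDIA's CUDA release notes,
# limited to the two CUDA versions discussed in this thread.
MIN_LINUX_DRIVER = {
    "9.2": (396, 26),
    "10.0": (410, 48),
}


def parse_version(text):
    """Turn '396.44' into (396, 44) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def supported_cuda_versions(driver_version):
    """List the CUDA versions from the table that the given driver can run."""
    driver = parse_version(driver_version)
    return [cuda for cuda, minimum in MIN_LINUX_DRIVER.items() if driver >= minimum]


print(supported_cuda_versions("396.44"))
print(supported_cuda_versions("410.79"))
```

Because newer drivers also run older CUDA releases, a 410.x driver passes for both 9.2 and 10.0, while a 396.xx driver only passes for 9.2. The installed driver version can be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.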

@dan_eiref Can you elaborate a little on how you went about solving this?
I am currently stuck here. cuda92 is supposed to be forward compatible.
Could you tell me whether you went ahead with cuda92 or cuda100?

Hi @barnacl - Sorry, I don’t remember the exact details of what I did, other than I followed @PrajwalPrashanth’s suggestion and found an inconsistency in versions so I resolved that inconsistency. Good luck!

Thanks. I managed to get it running on Paperspace. Not sure what the issue was.