CUDA lib not working after suspend on Ubuntu 16.04

Hi,

By any chance, has anyone else spotted and solved a problem with the CUDA library not working after suspending the PC on Ubuntu 16.04?

Every time after suspending the PC I get:

WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not available  (error: Unable to get the number of gpus available: unknown error)

I use CUDA 8.0 with:

cuda-drivers                                                375.51-1 
libcuda1-375                                                375.66-0ubuntu0.16.04.1

I found one suggested solution, which is to enable these modes on the NVIDIA card:

/usr/bin/nvidia-smi -pm ENABLED
/usr/bin/nvidia-smi -c EXCLUSIVE_PROCESS

and put this into /etc/rc.local, but this did not solve the issue. Maybe there is some workaround to restart the drivers without restarting the whole PC, which is annoying.

BTW, at the same time as I get this error message, the nvidia-smi command works perfectly and I can see GPU and display utilization with the processes running there.


I have this problem too. I haven’t found a real solution other than unloading and reloading the nvidia_uvm kernel module.

sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm

Note that to be able to unload the module, any process using it must be terminated first (if you are using a notebook, just restart the kernel).
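A minimal sketch for checking whether something still holds the module before trying rmmod. It reads the use count (third column) from /proc/modules. The function name uvm_refcount and its optional file argument are my own additions for illustration, not part of any NVIDIA tool.

```shell
#!/bin/sh
# Print the use count of nvidia_uvm, or "not loaded" if the module is absent.
# The optional argument (a /proc/modules-style file) is a hypothetical
# convenience for trying the function out; by default it reads /proc/modules.
uvm_refcount() {
    awk '$1 == "nvidia_uvm" { print $3; found = 1 }
         END { if (!found) print "not loaded" }' "${1:-/proc/modules}"
}

# Example:
#   uvm_refcount                    # a count of 0 suggests rmmod should succeed
#   sudo fuser -v /dev/nvidia-uvm   # lists processes that still hold the device
```

If the count is not 0, the fuser line (from the psmisc package) should show which processes need to be stopped first.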


Thx! That is a fantastic workaround. I already posted on the official NVIDIA support forum; maybe they will fix this bug soon.

Just for the record, for other people having the same problem, I created the following file:

-rwxr-xr-x 1 root root 162 Jun 9 21:16 nvidia-reload

File /etc/pm/sleep.d/nvidia-reload:

#!/bin/sh
# Workaround for NVIDIA CUDA not working after suspend
case "$1" in
    resume|thaw)
        sudo rmmod nvidia_uvm
        sudo modprobe nvidia_uvm
        ;;
esac

Now this is called every time the computer resumes from suspend.

If I suspend during the training of my model, using pm-suspend, and wake my server up again hours later, will the training continue?
Or do I need to train again from scratch?

By the way… I had the same issue in Ubuntu 16.04, and the above script (thanks cypress!) and its location had to be modified to

File /lib/systemd/system-sleep/nvidia-reload
-rwxr-xr-x 1 root root 238 Jun 17 06:13 nvidia-reload

#!/bin/sh
# Reload nvidia_uvm after resume; systemd passes "pre" or "post" as $1
case "$1" in
    post)
        sudo rmmod nvidia_uvm
        sudo modprobe nvidia_uvm
        ;;
esac

Perhaps sudo is not needed in a system-wide script?
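
On the sudo question: as far as I know, both pm-utils and systemd run their sleep hooks as root, so sudo should be redundant there. Below is a rough sketch of one hook without sudo that accepts the arguments of both mechanisms (resume/thaw from pm-utils, post from systemd). The helper name should_reload is made up for this example, and I am assuming the hook really does run as root.

```shell
#!/bin/sh
# Sketch of a sleep hook without sudo. Assumption: the hook is run as root.
# pm-utils passes resume/thaw as $1 on wake-up; systemd passes post.

# should_reload is a hypothetical helper: it succeeds only for wake-up events.
should_reload() {
    case "${1:-}" in
        resume|thaw|post) return 0 ;;
        *) return 1 ;;
    esac
}

# Reload the module only when waking up; do nothing before suspend.
if should_reload "${1:-}"; then
    rmmod nvidia_uvm
    modprobe nvidia_uvm
fi
```

The same file could then be placed in either /etc/pm/sleep.d/ or /lib/systemd/system-sleep/, as long as it is executable.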