cypreess
(Kris Dorosz)
June 9, 2017, 6:58am
1
Hi,
has anyone else by any chance spotted and solved the problem of the CUDA library not working after suspending the PC on Ubuntu 16.04?
Every time I resume the PC from suspend, I get:
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not available (error: Unable to get the number of gpus available: unknown error)
I use CUDA 8.0 with:
cuda-drivers 375.51-1
libcuda1-375 375.66-0ubuntu0.16.04.1
One solution I found was to enable persistence mode and exclusive-process compute mode on the NVIDIA GPU:
/usr/bin/nvidia-smi -pm ENABLED
/usr/bin/nvidia-smi -c EXCLUSIVE_PROCESS
and to put these commands into /etc/rc.local,
but that did not solve the issue. Is there perhaps a workaround to restart the drivers without rebooting the whole PC? Rebooting every time is annoying.
By the way, at the same time as I get this error message, the nvidia-smi command
works perfectly: I can see the GPU, its utilization, and the processes running on it.
I have this problem too. I haven't found a real solution other than unloading and reloading the nvidia_uvm kernel module:
sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm
Note that any process using the module must be terminated before it can be unloaded (if you are using a notebook, just restart the kernel).
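A minimal sketch of the same reload wrapped in two POSIX sh functions, under stated assumptions: it reads the module's use count from the third field of /proc/modules and refuses to unload nvidia_uvm while that count is non-zero, on the assumption that a non-zero count means some process (such as a running notebook kernel) still holds the module and rmmod would fail anyway. The names module_use_count and reload_nvidia_uvm are hypothetical helpers, not part of any NVIDIA tool. To find which processes hold the device, fuser -v /dev/nvidia-uvm (from psmisc) may help.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers around "rmmod/modprobe nvidia_uvm".
# Assumption: field 3 of /proc/modules is the module's use count.

# Print the use count of a module (default: nvidia_uvm) from a
# /proc/modules-style file (default: /proc/modules).
# Prints nothing if the module is not listed (i.e. not loaded).
module_use_count() {
    mod=${1:-nvidia_uvm}
    file=${2:-/proc/modules}
    awk -v m="$mod" '$1 == m { print $3 }' "$file"
}

# Reload nvidia_uvm, but refuse while something still uses it.
reload_nvidia_uvm() {
    count=$(module_use_count nvidia_uvm)
    if [ -n "$count" ] && [ "$count" -gt 0 ]; then
        echo "nvidia_uvm is in use (count $count); stop CUDA processes first" >&2
        return 1
    fi
    # Only unload if the module is currently loaded.
    if [ -n "$count" ]; then
        sudo rmmod nvidia_uvm || return 1
    fi
    sudo modprobe nvidia_uvm
}
```

After sourcing the file, running reload_nvidia_uvm performs the reload, or prints a message and returns 1 if the module is still busy.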
cypreess
(Kris Dorosz)
June 9, 2017, 7:10pm
3
Thanks! That is a fantastic workaround. I have already posted on the official NVIDIA support forum; maybe they will fix this bug soon.
cypreess
(Kris Dorosz)
June 9, 2017, 7:21pm
4
Just for the record, for other people having the same problem, I created the following file, /etc/pm/sleep.d/nvidia-reload:
-rwxr-xr-x 1 root root 162 Jun 9 21:16 nvidia-reload
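#!/bin/sh
# Workaround for NVIDIA CUDA not working after suspend.
# pm-utils calls this hook with the sleep event as $1.
case "$1" in
    resume|thaw)
        # resume = waking from suspend, thaw = waking from hibernation
        sudo rmmod nvidia_uvm
        sudo modprobe nvidia_uvm
        ;;
esac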
Now this runs every time the computer resumes from suspend (or from hibernation).
huangyingw
(Ying Huang)
September 4, 2017, 10:16pm
5
If I suspend the machine with pm-suspend while my model is training,
and hours later wake the server up again, will the training continue,
or will I need to retrain?
namascar
(javier Gonzalez)
June 17, 2018, 1:24pm
6
By the way, I had the same issue on Ubuntu 16.04, and both the script above (thanks, cypreess!) and its location had to be modified:
File /lib/systemd/system-sleep/nvidia-reload
-rwxr-xr-x 1 root root 238 Jun 17 06:13 nvidia-reload
#!/bin/sh
# systemd-sleep passes "pre" or "post" as $1
case "$1" in
    post)
        sudo rmmod nvidia_uvm
        sudo modprobe nvidia_uvm
        ;;
esac
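For reference, a sketch of a complete systemd-sleep hook along the same lines, assuming it is saved as /lib/systemd/system-sleep/nvidia-reload and made executable. systemd runs these hooks as root, passing pre or post as the first argument and the sleep type (suspend, hibernate, hybrid-sleep or suspend-then-hibernate) as the second, so sudo is dropped here. The DRY_RUN variable and the run and nvidia_reload_hook helpers are hypothetical additions that only make the hook easy to test; this sketch has not been verified against the Ubuntu 20.04 problem reported below.

```shell
#!/bin/sh
# Sketch of /lib/systemd/system-sleep/nvidia-reload (must be executable).
# systemd-sleep runs hooks as root with:
#   $1 = "pre" (before sleeping) or "post" (after waking)
#   $2 = "suspend", "hibernate", "hybrid-sleep" or "suspend-then-hibernate"
# DRY_RUN is a hypothetical test switch: when it is 1, commands are
# printed instead of executed.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

nvidia_reload_hook() {
    case "$1" in
        post)
            # Waking up: reload the CUDA unified-memory module.
            run rmmod nvidia_uvm
            run modprobe nvidia_uvm
            ;;
        *)
            # "pre" or anything else: nothing to do.
            ;;
    esac
}

nvidia_reload_hook "$@"
```

Running DRY_RUN=1 /lib/systemd/system-sleep/nvidia-reload post suspend prints the two commands without touching the driver, which is a quick way to check the hook before relying on it.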
cherrot
(cherrot)
February 1, 2020, 8:01am
7
Perhaps sudo is not needed in a system-wide script, since the hook already runs as root?
Thanks for the script. It is not working for me on Ubuntu 20.04. Is there any way around this?