I set up this computer for remote use. In some instances, CUDA errors (possibly related to network issues, I can't tell) have left the GPU unusable. Killing the Jupyter kernel doesn't help; only restarting the computer does.
This is the only GPU in the system (a GTX 1070 Ti), so I believe the display is using it, although I am not running an X server or anything similar.
Resetting it with nvidia-smi doesn't seem to help either:
tbatchelli@MLrig:~$ nvidia-smi -r
GPU Reset couldn't run because GPU 00000000:23:00.0 is the primary GPU.
Is there any way to reset the GPU without restarting Linux? Would adding a second (cheaper) GPU for the display help?
tbatchelli@MLrig:~$ nvidia-smi
Wed Mar 28 11:44:09 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.25                 Driver Version: 390.25                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 107...  On   | 00000000:23:00.0 Off |                  N/A |
|  0%   30C    P8     9W / 180W |      1MiB /  8116MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
tbatchelli@MLrig:~$ sudo nvidia-smi -r
GPU Reset couldn't run because GPU 00000000:23:00.0 is the primary GPU.
tbatchelli@MLrig:~$
So I wonder if this is because the monitor is plugged into this video card. I could consider buying a much cheaper card just to drive the display…
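One thing I can check before a reset (just a general diagnostic, not something I've confirmed fixes this) is whether a stale process still holds the NVIDIA device nodes, since a dead Jupyter kernel can leave the driver in a busy state. A minimal sketch, where the PID is only a placeholder:
# list any processes that still have the NVIDIA device nodes open;
# a stale handle can keep the GPU busy even after the kernel is killed
sudo fuser -v /dev/nvidia*
# if anything shows up, kill it explicitly (<PID> is a placeholder)
sudo kill -9 <PID>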
Did it work after adding another video card? I faced the same problem, so I bought a cheap video card for my monitor and plugged the monitor into that, freeing my main GPU. Even after this, sudo nvidia-smi -r gives the same error (GPU Reset couldn't run because GPU is the primary GPU).
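With two cards it might also be worth aiming the reset at a specific device index rather than the default; a sketch, assuming the compute card is index 1:
nvidia-smi -L               # list GPUs with their indices
sudo nvidia-smi -r -i 1     # attempt the reset on the assumed compute card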
I usually reload the module when I get such errors (for me it usually happens after waking the Linux system from sleep/suspend). The command works only if no other processes are using the GPU; if there are any, kill them before invoking it.
alias gpureload="sudo rmmod nvidia_uvm ; sudo modprobe nvidia_uvm"
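If you want to guard against reloading while something is still using the GPU, here is a minimal sketch of the same idea (the gpureload_safe name is just for illustration):
gpureload_safe() {
    # fuser -s exits 0 if any process still has an NVIDIA device node open
    if sudo fuser -s /dev/nvidia*; then
        echo "GPU still in use; kill those processes first" >&2
        return 1
    fi
    sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm
}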
This answer should be at the top of Google. CUDA has a problem every time Ubuntu wakes from sleep, and previously my only solution was restarting. You've saved us time. Thanks.
On my Kubuntu 19 install I managed to get NVIDIA-SMI 440.64 running, but it hangs every 24 hours or so. I usually have to do a combination of sudo /usr/share/sddm/scripts/Xsetup, then killall plasmashell && plasmashell > /dev/null 2>&1 & disown, and finally sudo rmmod nvidia_uvm ; sudo modprobe nvidia_uvm (as mentioned by @sgowda). That does the trick and I don't have to restart; the whole sequence bundled as a script is sketched below.
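Something like this, untested and with the script name made up, just bundles the steps above:
#!/bin/bash
# desktop-recover.sh -- hypothetical bundling of the recovery steps above
sudo /usr/share/sddm/scripts/Xsetup   # re-run the SDDM display setup script
killall plasmashell                   # restart the Plasma shell
plasmashell > /dev/null 2>&1 &
disown
sudo rmmod nvidia_uvm                 # reload the NVIDIA UVM module
sudo modprobe nvidia_uvm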