Reset GPU without restarting Linux?

I set up this computer to use remotely. In some instances, CUDA errors (maybe related to network issues, I can’t tell) left the GPU unusable. Killing the Jupyter kernel didn’t help; only a full restart did.

This is the only GPU in the system (a GTX 1070 Ti), so I believe it’s in use by the display. I am not running X (X Window System) or anything similar.

Resetting the GPU with nvidia-smi doesn’t seem to help either:

tbatchelli@MLrig:~$ nvidia-smi -r
GPU Reset couldn't run because GPU 00000000:23:00.0 is the primary GPU.

Is there any way to reset the GPU without restarting Linux? Would adding a second (cheaper) GPU for the display help?

What’s the underlying OS, Ubuntu? Can you see which processes are using the GPU with nvidia-smi?

Then you can kill them with sudo kill -9 <pid>. More details in this Quora thread: https://www.quora.com/How-do-I-kill-all-the-computer-processes-shown-in-nvidia-smi
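A small sketch of that suggestion: ask nvidia-smi for the PIDs of compute processes, then kill -9 them. It only covers processes that nvidia-smi lists under --query-compute-apps, so it won’t help in the situation below where no processes show up at all. The function names and the NVIDIA_SMI override variable are hypothetical, added only so the logic can be tried without a GPU.

```bash
#!/usr/bin/env bash
# Sketch: find processes nvidia-smi reports as using the GPU for compute,
# then kill them with SIGKILL (the same as running `sudo kill -9 <pid>`).
# NVIDIA_SMI is a hypothetical override, handy for testing without a GPU.

# Print one PID per line, de-duplicated, for compute processes on all GPUs.
gpu_compute_pids() {
  "${NVIDIA_SMI:-nvidia-smi}" --query-compute-apps=pid --format=csv,noheader \
    | tr -d ' ' \
    | grep -E '^[0-9]+$' \
    | sort -nu
  # grep drops anything that is not a bare PID (blank lines, messages).
}

kill_gpu_processes() {
  local pids
  pids=$(gpu_compute_pids)
  if [ -z "$pids" ]; then
    echo "No GPU compute processes found."
    return 0
  fi
  echo "Killing:" $pids
  # Word-splitting the PID list into separate arguments is intended here.
  sudo kill -9 $pids
}
```

Usage: source the file, then run kill_gpu_processes.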

Sorry, yes, it’s Ubuntu 16.04 LTS.

It happens even when no processes are running:

tbatchelli@MLrig:~$ nvidia-smi
Wed Mar 28 11:44:09 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.25                 Driver Version: 390.25                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 107...  On   | 00000000:23:00.0 Off |                  N/A |
|  0%   30C    P8     9W / 180W |      1MiB /  8116MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
tbatchelli@MLrig:~$ sudo nvidia-smi -r
GPU Reset couldn't run because GPU 00000000:23:00.0 is the primary GPU.
tbatchelli@MLrig:~$

So I wonder whether this is because the monitor is connected to this card. I could consider buying a much cheaper card just to drive the screen…

Your Intel CPU should have an integrated GPU, so you should find there’s another video port on your motherboard that would let you use that.

Using the GPU that’s driving your display to do deep learning makes things much slower in my experience.


It’s not Intel :frowning: It’s an AMD Ryzen, which doesn’t have an integrated GPU. But I guess adding another video card should work…

Did it work after adding another video card? I faced the same problem, so I bought a cheap video card for my monitor and plugged the monitor into it, freeing up my main GPU. Even after this, sudo nvidia-smi -r gives the same error (GPU Reset couldn’t run because GPU is the primary GPU).
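One way to see which card the system considers its boot display device is the boot_vga attribute that Linux exposes in sysfs for each PCI display device. The assumption here, not confirmed anywhere in this thread, is that this is the card the NVIDIA driver treats as “primary”. If so, it would be consistent with the error persisting after the monitor is moved. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print the PCI address of each device the firmware marked as the
# boot VGA device (sysfs attribute boot_vga == 1).
# Assumption: this is the card the NVIDIA driver calls the "primary GPU".

find_boot_vga() {
  # Optional argument points at a different sysfs tree (e.g. for tests).
  local root=${1:-/sys/bus/pci/devices}
  local dev
  for dev in "$root"/*; do
    if [ -r "$dev/boot_vga" ] && [ "$(cat "$dev/boot_vga")" = "1" ]; then
      basename "$dev"
    fi
  done
}
```

Note that nvidia-smi prints bus IDs with an 8-digit domain (00000000:23:00.0), while sysfs uses a 4-digit one (0000:23:00.0), so compare the part after the first colon.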

I usually reload the nvidia_uvm kernel module when I get such errors (for me it usually happens after waking the Linux system from sleep/suspend). The command works if no other processes are using the GPU; if any are, kill them before running it.

alias gpureload="sudo rmmod nvidia_uvm ; sudo modprobe nvidia_uvm"
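nvidia_uvm is the NVIDIA Unified Memory kernel module that CUDA relies on, and rmmod refuses to unload it while something still holds it. A slightly more defensive sketch of the alias checks lsmod first and stops with a message instead of failing silently. The function names are hypothetical, and reading lsmod’s “Used by” count as “a process still has the module open” is an assumption.

```bash
#!/usr/bin/env bash
# Sketch of a more defensive version of the gpureload alias above.
# Assumption: a non-zero "Used by" count from lsmod means something
# (e.g. a CUDA process) still holds nvidia_uvm, so unloading would fail.

# Print lsmod's "Used by" count for a module; prints nothing if not loaded.
module_use_count() {
  lsmod | awk -v m="$1" '$1 == m { print $3 }'
}

gpu_reload() {
  local count
  count=$(module_use_count nvidia_uvm)
  if [ -n "$count" ] && [ "$count" != "0" ]; then
    echo "nvidia_uvm is in use (count=$count); stop GPU processes first." >&2
    return 1
  fi
  # Only unload if the module is currently loaded.
  if [ -n "$count" ]; then
    sudo rmmod nvidia_uvm || return 1
  fi
  sudo modprobe nvidia_uvm
}
```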

This answer should be at the top of Google.
CUDA breaks every time Ubuntu wakes up, and until now my only solution was restarting.
You’ve saved us a lot of time. Thanks.
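If the problem reliably appears after resume, one option is to run the reload automatically from a systemd sleep hook. This is a sketch assuming a systemd-based Ubuntu (16.04 and later use systemd): systemd runs every executable in /lib/systemd/system-sleep/ with “pre” before suspend and “post” after resume as its first argument. The file name reload-nvidia-uvm and the function names are hypothetical. The hook runs as root, and the reload still fails if a process is holding nvidia_uvm at resume time.

```bash
#!/bin/sh
# Hypothetical file: /lib/systemd/system-sleep/reload-nvidia-uvm
# systemd-sleep calls each executable here with "$1" = pre (before sleeping)
# or post (after waking), and "$2" = the sleep type (suspend, hibernate, ...).

reload_uvm() {
  rmmod nvidia_uvm 2>/dev/null   # ignore "not currently loaded"
  modprobe nvidia_uvm
}

handle_sleep_event() {
  case "$1" in
    post) reload_uvm ;;   # only act after resume
    *) : ;;               # nothing to do before sleeping
  esac
}

handle_sleep_event "$@"
```

To install it: sudo install -m 755 reload-nvidia-uvm /lib/systemd/system-sleep/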

On my Kubuntu 19 machine I managed to install the 440.64 driver (NVIDIA-SMI 440.64), and it hangs every 24 hours or so. I usually recover with a combination of the following commands (the last line as mentioned by @sgowda). That does the trick, and I don’t have to restart:

sudo /usr/share/sddm/scripts/Xsetup
killall plasmashell && plasmashell > /dev/null 2>&1 & disown
sudo rmmod nvidia_uvm ; sudo modprobe nvidia_uvm
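The recovery sequence from the post above can be bundled into one function, sketched here. The commands and the Xsetup path are copied from the post; the function names and the DRY_RUN switch (which prints the commands instead of running them) are hypothetical additions.

```bash
#!/usr/bin/env bash
# Sketch: the KDE/SDDM recovery sequence from the post, in one function.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

kde_gpu_recover() {
  # Re-run SDDM's X setup script (path as given in the post).
  run sudo /usr/share/sddm/scripts/Xsetup
  # Restart the Plasma shell only if killall succeeded, detached from this shell.
  if run killall plasmashell; then
    if [ -n "${DRY_RUN:-}" ]; then
      echo "plasmashell (detached)"
    else
      plasmashell > /dev/null 2>&1 &
      disown
    fi
  fi
  # Reload the CUDA unified-memory module; modprobe runs even if rmmod fails,
  # matching the `;` in the original command.
  run sudo rmmod nvidia_uvm
  run sudo modprobe nvidia_uvm
}
```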

Thank you so much. You really saved my life :slight_smile: