Unet learner CUDA error

I am getting this error:

RuntimeError: CUDA out of memory. Tried to allocate 2.25 MiB (GPU 0; 7.93 GiB total capacity; 6.87 GiB already allocated; 15.06 MiB free; 7.32 MiB cached).

I have two GPUs, so I know there is still some memory left.

Could it be that I need to comment out this line?
torch.cuda.set_device(3)

If you only have 2 GPUs, chances are that neither of them is #3. Whatever nvidia-smi shows is what you want to use. So yes, comment it out, or change it to one of your cards.
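A quick way to confirm which device indices actually exist before calling set_device (a minimal sketch; it assumes nothing beyond a standard PyTorch install):

```python
import torch

# set_device() only accepts indices below device_count(), which is why
# set_device(3) raises on a machine with fewer than four GPUs.
n = torch.cuda.device_count()
print("visible GPUs:", n)
for i in range(n):
    print(i, torch.cuda.get_device_name(i))

if n > 0:
    torch.cuda.set_device(0)  # pick an index that actually exists
```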

I am confused here. Do I have only one GPU, if you look at this output? And how can I change to another card, because apparently GPU 0 is already out of memory by the time I run learn.find.

Are you using containers?

No. I am not using containers.

That’s weird behaviour… nvidia-smi is only showing one GPU.

Now I might be lost here, but if I get it correctly: I have 15 GB of memory, and by the time the resource monitor reaches 7.5 GB, that is when it throws the CUDA runtime error.
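One possible explanation: a resource monitor may sum memory across both cards (~15 GB), but a single PyTorch process allocates on one device, so GPU 0 tops out at its own ~8 GiB regardless of what the second card has free. A small sketch to see what PyTorch reports per device (standard `torch.cuda` calls only):

```python
import torch

# Print each visible device's name and its own total memory.
# The OOM message's "7.93 GiB total capacity" refers to one device,
# not to the sum shown by a system-wide monitor.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(i, props.name, f"{props.total_memory / 1024**3:.2f} GiB total")
```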

Try to clean the GPU memory that is held by Jupyter and try again:

import gc, torch

your_model = None         # drop the reference so the tensors can be freed
gc.collect()              # collect the now-unreachable Python objects
torch.cuda.empty_cache()  # release cached CUDA blocks back to the driver

Still the same error. I had to reinstall the whole of Ubuntu today to get rid of this CUDA error; at least I thought I had corrupted something yesterday when I changed my desktop environment. Other models work fine, like create_cnn.

Yeah… take a look at the drivers. Also, your GPU may have a bad power supply connection, and it may just shut down.


Thanks, will check it.


Just a flier, but try restarting and running the thing as a .py script. Sometimes Jupyter holds on to the memory from previous cells and floods the GPU.
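One way to do that (a sketch using the standard nbconvert tool; "train.ipynb" is a placeholder for your notebook's name):

```shell
# Export the notebook to a plain script, then run it in a fresh process
# so no GPU memory from earlier cells survives between runs.
jupyter nbconvert --to script train.ipynb
python train.py
```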


I believe his problem comes from the fact that the second GPU has gone missing.