Unet learner CUDA error

I am getting this error:

RuntimeError: CUDA out of memory. Tried to allocate 2.25 MiB (GPU 0; 7.93 GiB total capacity; 6.87 GiB already allocated; 15.06 MiB free; 7.32 MiB cached).

I have two GPUs, so I know there is still some memory left.

Could it be that I need to comment out this line?
torch.cuda.set_device(3)

If you only have 2 GPUs, chances are that neither of them is #3. Whatever nvidia-smi shows is what you want to use. So yes, comment it out, or change it to one of your cards.
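A quick way to confirm which device indices actually exist before calling set_device (a minimal sketch; it assumes nothing beyond a standard PyTorch install):

```python
import torch

# set_device() only accepts indices below device_count(), which is why
# set_device(3) raises on a machine with fewer than four GPUs.
n = torch.cuda.device_count()
print("visible GPUs:", n)
for i in range(n):
    print(i, torch.cuda.get_device_name(i))

if n > 0:
    torch.cuda.set_device(0)  # pick an index that actually exists
```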

I am confused here. Do I have only one GPU, if you look at this output? And how can I change to another card, because apparently GPU 0 is already out of memory by the time I run learn.find.

Are you using containers?

No. I am not using containers.

That’s weird behaviour… nvidia-smi is only showing one GPU.

Now I might be lost here, but if I get it correctly: I have 15 GB of memory, and by the time the resource monitor reaches 7.5 GB, that is when it throws the CUDA runtime error.
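One possible explanation: a resource monitor may sum memory across both cards (~15 GB), but a single PyTorch process allocates on one device, so GPU 0 tops out at its own ~8 GiB regardless of what the second card has free. A small sketch to see what PyTorch reports per device (standard `torch.cuda` calls only):

```python
import torch

# Print each visible device's name and its own total memory.
# The OOM message's "7.93 GiB total capacity" refers to one device,
# not to the sum shown by a system-wide monitor.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(i, props.name, f"{props.total_memory / 1024**3:.2f} GiB total")
```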

Try to clean the GPU memory that is held by Jupyter and try again:

import gc, torch

your_model = None         # drop the reference so the tensors can be freed
gc.collect()              # collect the now-unreachable Python objects
torch.cuda.empty_cache()  # release cached CUDA blocks back to the driver

Still the same error. I had to reinstall the whole of Ubuntu today to get rid of this CUDA error; at least I thought I had corrupted something yesterday when I changed my desktop environment. Other models work fine, like create_cnn.

Yeah… take a look at the drivers. Also, your GPU may have a bad power supply connection, and it may just shut down.


Thanks, will check it.


Just a flier, but try restarting and running the thing as a .py script. Sometimes Jupyter holds on to the memory from previous cells and floods the GPU.
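One way to do that (a sketch using the standard nbconvert tool; "train.ipynb" is a placeholder for your notebook's name):

```shell
# Export the notebook to a plain script, then run it in a fresh process
# so no GPU memory from earlier cells survives between runs.
jupyter nbconvert --to script train.ipynb
python train.py
```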


I believe his problem comes from the fact that the second GPU has gone missing.