CUDA error: Out of memory using the functions learner.lr_find() and learner.fit_one_cycle()

Hi everyone,

I have a problem with my GPU's memory when I try to use the functions lr_find() and fit_one_cycle():

RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 10.75 GiB total capacity; 9.16 GiB already allocated; 79.06 MiB free; 9.79 GiB reserved in total by PyTorch)

I am using 1024 x 1024 images with a GeForce RTX 2080 Ti and fastai 2.3.0. I tried different batch sizes (128, 64, 32, 16, 8, 4), even batch size 1, and I keep having the same problem.

I have already tried these three solutions:

gc.collect()
torch.cuda.empty_cache()

kill -9 PID

I also tried with both CUDA 11 and CUDA 10.2

(I also restart the kernel frequently)

and the result is the same. If anyone has experience with this, please help me.

Regards

1024x1024 is a relatively large image – what model architecture are you using?
Can you check what the memory allocation is right before calling lr_find or fit_one_cycle?
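For example, a quick way to inspect how much memory PyTorch is holding right before those calls (a minimal sketch; it assumes `learn` is your fastai Learner):

```python
import torch

def print_gpu_mem():
    # Memory occupied by live tensors vs. memory reserved by PyTorch's caching allocator
    allocated = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f"allocated: {allocated:.0f} MiB | reserved: {reserved:.0f} MiB")

print_gpu_mem()
learn.lr_find()  # or learn.fit_one_cycle(...)
```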


Hi,

I am using a ResNet152, and the attached image shows the memory allocation.
I am not sure if I answered your question.

[Screenshot from 2021-04-17 showing the GPU memory allocation]

Thank you!

ResNet152 is a huge model, and, as @ali_baba pointed out, 1024x1024 is a very large image size. If you cannot downscale, try a smaller model such as ResNet18. With this, you should be able to get the model to run with a small batch size.
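If it helps, here is a minimal sketch of swapping in a smaller backbone (assuming `dls` is the DataLoaders you already built and you are doing classification):

```python
from fastai.vision.all import *

# Same data, smaller model: ResNet18 instead of ResNet152
learn = cnn_learner(dls, resnet18, metrics=accuracy)
learn.lr_find()
```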

However, a small batch size can bring problems during training, as batch norm will not work as well anymore. In my experience, starting with image sizes such as 1024 x 1024 usually gave worse results than smaller images. But you can try progressive resizing, where you start with, let's say, 224 x 224 px and then slowly increase the image resolution while decreasing the batch size. Jeremy introduced this technique relatively early on in the course, and it has recently been used to achieve new SOTA results on ImageNet.
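A rough sketch of what progressive resizing could look like with a DataBlock (the labelling and transforms here are assumptions; adapt them to however you built your original dataloaders, and `path` is a placeholder for your dataset folder):

```python
from fastai.vision.all import *

def get_dls(size, bs):
    # Rebuild the DataLoaders at a given image size and batch size
    dblock = DataBlock(
        blocks=(ImageBlock, CategoryBlock),
        get_items=get_image_files,
        get_y=parent_label,              # assumes labels come from folder names
        splitter=RandomSplitter(seed=42),
        item_tfms=Resize(size),
        batch_tfms=aug_transforms(),
    )
    return dblock.dataloaders(path, bs=bs)

# Start small, then grow the resolution while shrinking the batch size
learn = cnn_learner(get_dls(224, 64), resnet18, metrics=accuracy)
learn.fit_one_cycle(5)

learn.dls = get_dls(448, 16)
learn.fit_one_cycle(5)

learn.dls = get_dls(896, 4)
learn.fit_one_cycle(5)
```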

If you are still having trouble, mixed precision can also help save some memory.
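In fastai that is a one-liner on the Learner:

```python
# Train with mixed precision (fp16 activations), which roughly halves activation memory
learn = learn.to_fp16()
learn.fit_one_cycle(5)
```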
