CUDA out of memory when training a language model

Ash_reg · May 17, 2023, 4:17am

Hey all, I was working through the notebook for lesson 10 of fastbook, where we train a language model and apply the ULMFiT process.

While running the code that fine-tunes the language model:

learn.fit_one_cycle(1,2e-2)

I get the following error:

OutOfMemoryError: CUDA out of memory. Tried to allocate 1.15 GiB (GPU 0; 5.80 GiB total capacity; 4.05 GiB already allocated; 561.00 MiB free; 4.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

What should I do, and how can I avoid these errors in the future? I would be glad to know of any steps to follow, or of resources I could read to gain a deeper understanding.

VDM · May 17, 2023, 7:52am

This error happens when there is not enough GPU memory (and you only have about 6 GB here). I do not know your code, but the first thing to try is reducing the batch size. Alternatively, switch to Colab, where you may get more GPU memory.
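You can check this diagnosis against the numbers in the error message itself. Below is a minimal sketch; the helper names parse_oom_message and diagnose are hypothetical, the regexes assume the message format shown in the question (PyTorch around 2.0), and the "fragmentation" test is only a rough heuristic for when the max_split_size_mb hint in the message might apply.

```python
import re

# Units PyTorch prints in CUDA OOM messages, converted to MiB.
UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}
AMOUNT = r"([\d.]+) (KiB|MiB|GiB)"

# One regex per field (message format assumed from the error in the question).
FIELDS = {
    "requested": r"Tried to allocate " + AMOUNT,
    "total": AMOUNT + r" total capacity",
    "allocated": AMOUNT + r" already allocated",
    "free": AMOUNT + r" free",
    "reserved": AMOUNT + r" reserved in total by PyTorch",
}


def parse_oom_message(msg):
    """Hypothetical helper: extract the sizes from a CUDA OOM message, in MiB."""
    sizes = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, msg)
        if match is None:
            raise ValueError(f"field {name!r} not found in message")
        value, unit = match.groups()
        sizes[name] = float(value) * UNIT_TO_MIB[unit]
    return sizes


def diagnose(sizes):
    """Heuristic: could the request have fit if memory were not fragmented?"""
    # Memory PyTorch has reserved (cached) but is not currently using.
    cached_unused = sizes["reserved"] - sizes["allocated"]
    if sizes["requested"] <= sizes["free"] + cached_unused:
        # The bytes exist, just not in one usable block: max_split_size_mb may help.
        return "fragmentation"
    # Even free plus cached memory is smaller than the request: use less memory.
    return "insufficient"


message = (
    "CUDA out of memory. Tried to allocate 1.15 GiB (GPU 0; 5.80 GiB total "
    "capacity; 4.05 GiB already allocated; 561.00 MiB free; 4.26 GiB reserved "
    "in total by PyTorch)"
)
sizes = parse_oom_message(message)
print(diagnose(sizes))  # insufficient
```

For your message, the 561 MiB free plus the roughly 215 MiB that PyTorch has reserved but not allocated falls well short of the 1.15 GiB request, so tuning max_split_size_mb alone is unlikely to help; the training step simply needs less memory. About 1 GiB (5.80 GiB total minus 4.26 GiB reserved minus 0.55 GiB free) is also in use outside PyTorch's allocator, presumably the CUDA context plus anything else running on the GPU.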

anirudh15 · May 18, 2023, 4:44am

Here are a few things you can try:

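Reduce the batch size.
Delete objects you no longer need and call gc.collect() so the GPU memory they hold can be reused; torch.cuda.empty_cache() additionally releases PyTorch's cached memory.
If you are using Jupyter, restart the kernel and run the notebook from the top, so that anything clogging up the GPU memory is cleared.
If you are running locally, try moving the workflow to Colab or another platform with access to bigger GPUs.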
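The first two suggestions can be combined. Below is a minimal sketch, assuming you wrap your own training code in a function that takes the batch size; the names fit_with_smaller_batches, is_cuda_oom and train_fn are hypothetical, not fastai or PyTorch APIs. It halves the batch size after each out-of-memory error and runs Python's garbage collector before retrying.

```python
import gc


def is_cuda_oom(err):
    """True if `err` looks like a CUDA out-of-memory error.

    PyTorch's OutOfMemoryError subclasses RuntimeError, and older versions
    raise a plain RuntimeError, so checking the message covers both.
    """
    return isinstance(err, RuntimeError) and "out of memory" in str(err)


def fit_with_smaller_batches(train_fn, bs, min_bs=8, cleanup=None):
    """Hypothetical helper: call train_fn(bs), halving bs after each CUDA OOM.

    Returns the batch size that finally worked.
    """
    while bs >= min_bs:
        try:
            train_fn(bs)
            return bs
        except RuntimeError as err:
            if not is_cuda_oom(err):
                raise  # a genuine bug, not a memory problem
            print(f"Out of memory with bs={bs}")
        # Clean up outside the except block: by now Python has dropped the
        # exception and its traceback, which can keep tensors alive.
        gc.collect()
        if cleanup is not None:
            cleanup()  # e.g. torch.cuda.empty_cache
        bs //= 2
    raise RuntimeError(f"still out of memory at the minimum batch size ({min_bs})")
```

With fastai, train_fn would rebuild the DataLoaders with the given batch size (the bs= argument to .dataloaders(...)) and the learner, then call learn.fit_one_cycle(1, 2e-2), and you could pass cleanup=torch.cuda.empty_cache. Keep the learner and DataLoaders local to train_fn; if a failed attempt's objects are still referenced from outside, the garbage collector cannot free their memory.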