Getting started - Local System

After setting up mamba, fastai, and fastbook on my local system (WSL on Windows) with an NVIDIA GeForce MX450, I run:

from fastai.vision.all import *
path = untar_data(URLs.PETS)/'images'

def is_cat(x): return x[0].isupper()
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)

I get the error:

OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 2.00 GiB total capacity; 1.61 GiB already allocated; 0 bytes free; 1.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

In the Task Manager Performance tab in Windows I can see that I have GPU 0 and GPU 1. GPU 1 is the NVIDIA GeForce MX450 and shows 10 GB of memory, yet the error says it cannot allocate even 26 MiB.

Please help me solve the issue.

Are you sure the MX450 has 10 GB? Your error says 2 GiB, which matches the spec I found; the Task Manager figure most likely includes shared system memory on top of the card's dedicated 2 GB.
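You can double-check what PyTorch actually sees from Python, e.g. with this small plain-PyTorch sketch:

import torch

# print the name and dedicated memory of each CUDA device PyTorch can see
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(i, props.name, f"{props.total_memory / 1024**3:.1f} GiB")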

Try reducing the batch size, e.g. test setting bs=10.
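With the snippet you posted, that just means passing bs to the factory method. A minimal sketch reusing your code (10 is only a starting value to test):

# same pet-classifier setup as above, but with a smaller batch size
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224), bs=10)

learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)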

You are right!
Do you recommend using Google Colab?

I tried with batch size = 10 and it seems to be working now. How do you find the best batch size?

Try larger sizes until it crashes, then go back to the last working bs.
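If you want to script that trial-and-error, something like this rough sketch works (it reuses the names from your snippet above; the candidate sizes are arbitrary, and in practice you would probably train only a batch or two per attempt rather than a full fine_tune):

import gc
import torch

def fits(bs):
    "Return True if one fine-tuning epoch fits in GPU memory at this batch size."
    try:
        dls = ImageDataLoaders.from_name_func(
            path, get_image_files(path), valid_pct=0.2, seed=42,
            label_func=is_cat, item_tfms=Resize(224), bs=bs)
        learn = vision_learner(dls, resnet34, metrics=error_rate)
        learn.fine_tune(1)
        return True
    except RuntimeError as e:
        if "out of memory" not in str(e):
            raise
        return False

best_bs = None
for bs in (8, 16, 32, 64):        # candidate sizes, smallest first
    ok = fits(bs)
    gc.collect()                   # drop the previous learner and dataloaders
    torch.cuda.empty_cache()       # release cached GPU memory before the next attempt
    if not ok:
        break
    best_bs = bs
print("last working batch size:", best_bs)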


Hi @Archaeologist,

I am on chapter 4, "Getting started with NLP for absolute beginners", and as you suggested last time I tried smaller batch sizes, going down to bs=7, but I am still facing the error:

CUDA out of memory. Tried to allocate 376.00 MiB (GPU 0; 2.00 GiB total capacity; 1.45 GiB already allocated; 0 bytes free; 1.51 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The error doesn't seem to change.
What do you suggest?

PS: After each trial, I closed my environment and restarted everything.

It is never going to be much fun trying to run models on a 2 GB card; you are going to spend most of your time running into out-of-memory issues. Lowering the batch size until it fits is one option, but a free service like Colab or Kaggle will give you more memory, larger batch sizes, and most likely faster training than you can achieve on your 2 GB card, with a lot less pain.
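If you do keep experimenting locally, the max_split_size_mb hint from your error message is also worth a try. Roughly like this (a sketch; 128 is just a guess, and it has to be set before anything touches the GPU):

import os
# must run before the first CUDA allocation, i.e. before any model or tensor is moved to the GPU
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"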


Sure, thanks for the quick response. Switching to Colab for now :slight_smile:
