I’m trying to train a text classifier on an 8GB Nvidia RTX card, and I can’t unfreeze all the layers and retrain without running out of memory.
I’ve looked at the OOM debugger and tried using gpu_mem_restore_ctx to clean up any potential memory leaks, but I’m still getting errors.
To clarify, I am using a language model encoder that I trained with bs=32 (it trained fine without any errors). Using that encoder, when I set my classifier to bptt=20 I can unfreeze and train until about layer 3 (watching nvidia-smi, each step I unfreeze takes more and more memory), but after I call unfreeze() on all layers I get an OOM error.
Are these parameters still too much for an 8GB card to handle? I have some pseudo code below to illustrate when the issue occurs.
    learn.freeze_to(-1)
    learn.fit_one_cycle(**kwargs)     # Trains fine
    learn.unfreeze()
    learn.fit_one_cycle(**kwargs)     # Out of memory on first epoch

    learn.freeze_to(-1)
    learn.fit_one_cycle(1, **kwargs)  # Trains fine
    learn.freeze_to(-2)
    learn.fit_one_cycle(1, **kwargs)  # Trains with more CUDA memory usage
    learn.freeze_to(-3)
    learn.fit_one_cycle(1, **kwargs)  # Trains with more and more CUDA memory usage
    learn.unfreeze()
    learn.fit_one_cycle(3, **kwargs)  # Out of memory on 1st epoch
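For what it’s worth, here is a rough back-of-the-envelope sketch of why each unfreeze step would be expected to cost more memory. It assumes fp32 weights and an Adam-style optimizer that keeps two extra tensors per trainable parameter. The function name and the layer-group sizes are made up for illustration and are not measured from my model. It also ignores the activations saved for the backward pass, which grow as more layers become trainable and may well be the larger share.

```python
# Rough estimate of weight + gradient + optimizer-state memory as layer
# groups are unfrozen. All sizes below are hypothetical, for illustration only.

BYTES_PER_FP32 = 4


def training_state_bytes(group_params, n_trainable_groups, optimizer_states=2):
    """Estimate bytes held for weights, gradients and optimizer state.

    group_params: parameter counts per layer group, ordered input -> output.
    n_trainable_groups: number of groups unfrozen, counted from the output end.
    optimizer_states: extra per-parameter tensors kept by the optimizer
        (assumed to be 2, as for Adam-style optimizers).
    """
    if not 0 <= n_trainable_groups <= len(group_params):
        raise ValueError("n_trainable_groups must be between 0 and len(group_params)")

    total = sum(group_params)
    first_trainable = len(group_params) - n_trainable_groups
    trainable = sum(group_params[first_trainable:])

    weights = total * BYTES_PER_FP32          # every weight lives on the GPU
    grads = trainable * BYTES_PER_FP32        # one gradient per trainable weight
    opt_state = trainable * optimizer_states * BYTES_PER_FP32
    return weights + grads + opt_state


# Hypothetical layer groups: embedding, three recurrent layers, classifier head.
groups = [24_000_000, 5_000_000, 5_000_000, 5_000_000, 100_000]

for n in range(1, len(groups) + 1):
    mb = training_state_bytes(groups, n) / 1024 ** 2
    print(f"{n} group(s) unfrozen: ~{mb:,.0f} MB before activations")
```

With numbers like these, the last step (unfreezing the first, largest group) is the biggest single jump, which would match what I see in nvidia-smi, although I have not verified that this is the actual cause.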
I think I’m mostly confused as to why I can train my LM with bs=32 on around 30K records, but when training this classifier on around 1,500 records I can’t get past unfreezing all layers.
Any thoughts would be appreciated.
As a side note, I have fastai installed in editable mode and have the latest changes pulled.