How to reclaim GPU memory in v2?

In v1 we have methods like learn.destroy and learn.purge to clean up GPU memory.

What should we be doing in v2 to release memory associated with a learner so we don’t have to restart our notebook?

Looking at the CamVid notebook it looks like learn.destroy() does exist :slight_smile:

https://github.com/fastai/fastai2/blob/master/nbs/course/lesson3-camvid.ipynb

1 Like

using 0.0.7 and at least the TextLearner doesn’t have a destroy method.

AttributeError: 'TextLearner' object has no attribute 'destroy'

I haven’t had time to work on those functions for now, so they are not implemented yet. They both had some drawbacks and bad side-effects in v1 so I want to take the time to do it properly in v2, probably at some point next month.

8 Likes

Not sure where this is at for v2., but yah, I noticed weird behavior in v.1. Seems like it would clear a bit of the GPU memory but never all of it … and after awhile, I would have to restart the notebook cuz it wouldn’t clear enough for me to continue.

Anyhow, looking forward to this eventually in v2. Is it something with Jupyter notebooks that makes this difficult?

Was this added yet? Still couldn’t find it in the codebase?

No, not yet

Should I look at the old code to get an idea how to do this or as stated above it is quite hacky and not ideal to use?

You can try but it was very ad hoc and not working well.

For me, here’s what I do:

learn = 0
gc.collect()
1 Like

learn = None
gc.collect()
torch.cuda.empty_cache()

This worked for me.

3 Likes

Does this really work, i.e. the GPU memory is cleared in nvidia-smi? I tried this, but my GPU memory usage did not go down.

On this recurring question of reclaiming CUDA GPU memory… I’ll give it a shot here.

Short Answer: The PyTorch team chooses not to provide an API to reset the the cuda context. Yes, it has been asked many times, and still is rejected: https://github.com/pytorch/pytorch/issues/20837

Their rationale is understandable:

“If you want to swing the big axe, pull the rug from under all tensors say in a live notebook session, be my guest do it at your own risk, you’ve been warned. And sorry, not from the PyTorch api.”. They have other technical reasons for not supporting this as well, such as complication to multiprocessing features. – don’t shoot the messenger here, if you don’t like this reality.

So “What is to be done?”

Slightly Long Answer

The gentle way when using PyTorch and/or fastai/fastai2 works to certain extent:

del learn
gc.collect()
torch.cuda.empty_cache()

But if the application has live objects (DataLoader/DataBlock/DataSet in fastai, for instance) in the notebook that hold on the some Torch tensors, sorry, at least the 600MB Torch CUDA context plus those tensors will not be freed.

The big axe — install other CUDA driver that does have this big reset switch: Numba, PyCUDA etc.

The following will work, but will throw the entire loaded torch library + PyTorch tensors + fastai DataLoader etc in chaos, accessing them will cause undefined behavior/error:

import numba.cuda as cu
dev = cu.select_device(0) # or which ever GPU ID
dev.reset()

then the whole “context” of the calling process will vanish from nvidia-smi output. Now you’re faced with a broken loaded torch module, who doesn’t know its context was destroyed by someone else (numba).

The 3rd way, restart the notebook kernel, and reload torch/fastai2 etc…

5 Likes

Inserting gc.collect() is a suggestion on pytorch issue 16417 (linked), but if someone has a minute please provide advice on precisely how I can apply this fix to the notebook camvid.ipynb from github fastai / fastai / nbs / examples ?

I have already tried to reduce batch size and tried to apply a silly low-resoution transform to see if I can get it to run (on line below the size is ridiculously small instead of the normal (360,480)
batch_tfms=[*aug_transforms(size=(36,48)), Normalize.from_stats(*imagenet_stats)])
)

I’m trying to allocate “20MB” and having CUDA out of memory error when I should have at least 2GB free on a 6GB VRAM GPU. Similar to the error on the pytorch issue.

the out of memory error occurs on the line
learn.lr_find()

(note: I am able to run the much smaller CAMVID_TINY example from https://github.com/fastai/fastbook/blob/master/01_intro.ipynb with no issue, just the camvid example notebook at the top with the larger dataset is throwing the error)