How to reclaim GPU memory in v2?

wgpubs · January 29, 2020, 7:56pm

In v1 we have methods like learn.destroy and learn.purge to clean up GPU memory.

What should we be doing in v2 to release memory associated with a learner so we don’t have to restart our notebook?

muellerzr · January 29, 2020, 7:59pm

Looking at the CamVid notebook it looks like learn.destroy() does exist

https://github.com/fastai/fastai2/blob/master/nbs/course/lesson3-camvid.ipynb

wgpubs · January 29, 2020, 8:05pm

using 0.0.7 and at least the TextLearner doesn’t have a destroy method.

AttributeError: 'TextLearner' object has no attribute 'destroy'

sgugger · January 30, 2020, 12:48am

I haven’t had time to work on those functions for now, so they are not implemented yet. They both had some drawbacks and bad side-effects in v1 so I want to take the time to do it properly in v2, probably at some point next month.

wgpubs · February 20, 2020, 11:07pm

Not sure where this is at for v2., but yah, I noticed weird behavior in v.1. Seems like it would clear a bit of the GPU memory but never all of it … and after awhile, I would have to restart the notebook cuz it wouldn’t clear enough for me to continue.

Anyhow, looking forward to this eventually in v2. Is it something with Jupyter notebooks that makes this difficult?

jspence · March 23, 2020, 10:05am

Was this added yet? Still couldn’t find it in the codebase?

muellerzr · March 23, 2020, 3:04pm

No, not yet

jspence · March 23, 2020, 6:26pm

Should I look at the old code to get an idea how to do this or as stated above it is quite hacky and not ideal to use?

sgugger · March 23, 2020, 6:47pm

You can try but it was very ad hoc and not working well.

init_27 · March 23, 2020, 9:01pm

For me, here’s what I do:

learn = 0
gc.collect()

noel · March 23, 2020, 10:17pm

learn = None
gc.collect()
torch.cuda.empty_cache()

This worked for me.

ai_padawan · July 24, 2020, 7:21am

Does this really work, i.e. the GPU memory is cleared in nvidia-smi? I tried this, but my GPU memory usage did not go down.

philchu · July 24, 2020, 11:28am

On this recurring question of reclaiming CUDA GPU memory… I’ll give it a shot here.

Short Answer: The PyTorch team chooses not to provide an API to reset the the cuda context. Yes, it has been asked many times, and still is rejected: [feature request] a function to empty all the cuda buffer · Issue #20837 · pytorch/pytorch · GitHub

Their rationale is understandable:

“If you want to swing the big axe, pull the rug from under all tensors say in a live notebook session, be my guest do it at your own risk, you’ve been warned. And sorry, not from the PyTorch api.”. They have other technical reasons for not supporting this as well, such as complication to multiprocessing features. – don’t shoot the messenger here, if you don’t like this reality.

So “What is to be done?”

Slightly Long Answer

The gentle way when using PyTorch and/or `fastai/fastai2` works to certain extent:

del learn
gc.collect()
torch.cuda.empty_cache()

But if the application has live objects (DataLoader/DataBlock/DataSet in fastai, for instance) in the notebook that hold on the some Torch tensors, sorry, at least the 600MB Torch CUDA context plus those tensors will not be freed.

The big axe — install other CUDA driver that does have this big reset switch: `Numba`, `PyCUDA` etc.

The following will work, but will throw the entire loaded torch library + PyTorch tensors + fastai DataLoader etc in chaos, accessing them will cause undefined behavior/error:

import numba.cuda as cu
dev = cu.select_device(0) # or which ever GPU ID
dev.reset()

then the whole “context” of the calling process will vanish from nvidia-smi output. Now you’re faced with a broken loaded torch module, who doesn’t know its context was destroyed by someone else (numba).

The 3rd way, restart the notebook kernel, and reload `torch`/`fastai2` etc…

moon · September 21, 2020, 4:18pm

Inserting gc.collect() is a suggestion on pytorch issue 16417 (linked), but if someone has a minute please provide advice on precisely how I can apply this fix to the notebook camvid.ipynb from github fastai / fastai / nbs / examples ?

I have already tried to reduce batch size and tried to apply a silly low-resoution transform to see if I can get it to run (on line below the size is ridiculously small instead of the normal (360,480)
batch_tfms=[*aug_transforms(size=(36,48)), Normalize.from_stats(*imagenet_stats)])
)

I’m trying to allocate “20MB” and having CUDA out of memory error when I should have at least 2GB free on a 6GB VRAM GPU. Similar to the error on the pytorch issue.

the out of memory error occurs on the line
learn.lr_find()

(note: I am able to run the much smaller CAMVID_TINY example from https://github.com/fastai/fastbook/blob/master/01_intro.ipynb with no issue, just the camvid example notebook at the top with the larger dataset is throwing the error)

How to reclaim GPU memory in v2?

On this recurring question of reclaiming CUDA GPU memory… I’ll give it a shot here.

The gentle way when using PyTorch and/or fastai/fastai2 works to certain extent:

The big axe — install other CUDA driver that does have this big reset switch: Numba, PyCUDA etc.

The 3rd way, restart the notebook kernel, and reload torch/fastai2 etc…

The gentle way when using PyTorch and/or `fastai/fastai2` works to certain extent:

The big axe — install other CUDA driver that does have this big reset switch: `Numba`, `PyCUDA` etc.

The 3rd way, restart the notebook kernel, and reload `torch`/`fastai2` etc…