Discovered some promising info regarding GPU memory leaks following a “CUDA out of memory” error. Still working through it myself, but that could take some time and I didn’t want to lose the links:
[EDIT] These didn’t work out as hoped.
- Evaluating `1/0` to force a new exception, releasing the resources held by the previous frame, did not work.
- Setting `os.environ['FASTAI_TB_CLEAR_FRAMES'] = "1"` at the top of the notebook did not work.
- The “Custom Solutions” using `@gpu_mem_restore` and `with gpu_mem_restore_ctx():` did not work.
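All three of those attempts target the same underlying mechanism: after an exception, the notebook keeps the last traceback alive, and the traceback pins every frame (and every local variable, including large tensors) from the failed call. Below is a minimal CPU-only sketch of that mechanism; `BigTensor` and `probe` are illustrative names I've made up, not fastai or PyTorch API:

```python
import gc
import sys
import weakref

class BigTensor:
    """Stand-in for a tensor holding a large GPU allocation."""

probe = None  # weak reference telling us whether the "tensor" is still alive

def train():
    global probe
    big = BigTensor()           # imagine this holds ~16 GB of GPU memory
    probe = weakref.ref(big)    # watch it without keeping it alive
    raise RuntimeError("CUDA out of memory (simulated)")

try:
    train()
except RuntimeError:
    # Mimic what IPython does after an error: stash the traceback.
    # The traceback references train()'s frame, and the frame references `big`.
    sys.last_traceback = sys.exc_info()[2]

gc.collect()
alive_while_tb_held = probe() is not None   # True: the traceback pins `big`

# Dropping the stored traceback releases the frame, and with it the tensor.
del sys.last_traceback
gc.collect()
alive_after_release = probe() is not None   # False: memory can now be freed
```

This is why the `1/0` trick is supposed to help (a fresh, tiny traceback replaces the one pinning the big frame); the report that it doesn't change the numbers suggests something else is also holding references.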
In all cases, the behaviour is unchanged, and remains as follows…

```python
train('convnext_large_in22k', 224, epochs=1, accum=1, finetune=False)
# CUDA Out Of Memory Error

report_gpu()
# Before GC: GPU:0
#   process 32095 uses 16263.000 MB GPU memory
# Post GC: GPU:0
#   process 32095 uses 4141.000 MB GPU memory
```
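One general workaround worth noting (not one of the attempts above) is to run each training call in a child process: when the child exits, cleanly or after a CUDA OOM, the driver reclaims all of its GPU memory regardless of any lingering tracebacks. A hedged sketch, with the real `train(...)` call stubbed out as a print:

```python
import subprocess
import sys

# Sketch: run the training step in a throwaway child interpreter. All GPU
# allocations belong to the child, so nothing survives its exit.
# The script body is a stub; the real call would be something like
#   train('convnext_large_in22k', 224, epochs=1, accum=1, finetune=False)
child_script = "print('train() would run here')"

result = subprocess.run(
    [sys.executable, "-c", child_script],
    capture_output=True,
    text=True,
)
print("child exit code:", result.returncode)
```

The cost is re-importing everything per run, but it is the one approach that cannot leave 4 GB behind in the parent.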