NB_10_nlp notebook may have a memory leak

NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1 

 0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 23%   25C    P8     8W / 250W |  10559MiB / 11178MiB |      0%      Default 

  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     18141      C   /home/dl/anaconda3/envs/fastai2/bin/python 10547MiB |

After closing NB 10, which cleared the GPU memory, I ran both NB 11 and NB 12:

 0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 23%   21C    P8     8W / 250W |    501MiB / 11178MiB |

In NB 10, after modifying the first line of the SentencePieceTrainer.Train input by removing the {q}'s, we get:

RuntimeError: CUDA out of memory. Tried to allocate 2.29 GiB (GPU 0; 10.92 GiB total capacity; 7.50 GiB already allocated; 619.44 MiB free; 9.79 GiB reserved in total by PyTorch)

This occurs in the section “Fine-Tuning the Language Model”:

learn = language_model_learner(
    dls_lm, AWD_LSTM, drop_mult=0.3,
    metrics=[accuracy, Perplexity()]).to_fp16()

learn.fit_one_cycle(1, 2e-2)

epoch train_loss valid_loss accuracy perplexity time
0 20413316.000000 00:01


RuntimeError Traceback (most recent call last)
in
----> 1 learn.fit_one_cycle(1, 2e-2)

~/fastai-2020/fastai2/fastai2/callback/schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
    110 scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
    111           'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 112 self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
    113
    114 # Cell

~/fastai-2020/fastai2/fastai2/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
    188 try:
    189     self.epoch=epoch; self('begin_epoch')
--> 190     self._do_epoch_train()
    191     self._do_epoch_validate()
    192 except CancelEpochException: self('after_cancel_epoch')

~/fastai-2020/fastai2/fastai2/learner.py in _do_epoch_train(self)
    161 try:
    162     self.dl = self.dls.train; self('begin_train')
--> 163     self.all_batches()
    164 except CancelTrainException: self('after_cancel_train')
    165 finally: self('after_train')

~/fastai-2020/fastai2/fastai2/learner.py in all_batches(self)
    139 def all_batches(self):
    140     self.n_iter = len(self.dl)
--> 141     for o in enumerate(self.dl): self.one_batch(*o)
    142
    143 def one_batch(self, i, b):

~/fastai-2020/fastai2/fastai2/learner.py in one_batch(self, i, b)
    149 self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')
    150 if not self.training: return
--> 151 self.loss.backward(); self('after_backward')
    152 self.opt.step(); self('after_step')
    153 self.opt.zero_grad()

~/anaconda3/envs/fastai2/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    193             products. Defaults to False.
    194     """
--> 195     torch.autograd.backward(self, gradient, retain_graph, create_graph)
    196
    197 def register_hook(self, hook):

~/anaconda3/envs/fastai2/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     97 Variable._execution_engine.run_backward(
     98     tensors, grad_tensors, retain_graph, create_graph,
---> 99     allow_unreachable=True)  # allow_unreachable flag
    100
    101

RuntimeError: CUDA out of memory. Tried to allocate 2.29 GiB (GPU 0; 10.92 GiB total capacity; 7.50 GiB already allocated; 619.44 MiB free; 9.79 GiB reserved in total by PyTorch)
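
The figures in that message can be compared mechanically. Below is a minimal sketch, assuming the message format quoted above; `parse_cuda_oom` is a hypothetical helper written for this post, not a PyTorch function. It converts every figure to GiB and computes reserved minus allocated, that is, memory PyTorch's caching allocator is holding but has not handed out to tensors.

```python
import re

# The message exactly as quoted in this post.
MSG = ("RuntimeError: CUDA out of memory. Tried to allocate 2.29 GiB "
       "(GPU 0; 10.92 GiB total capacity; 7.50 GiB already allocated; "
       "619.44 MiB free; 9.79 GiB reserved in total by PyTorch)")

_TO_GIB = {"KiB": 1 / 1024**2, "MiB": 1 / 1024, "GiB": 1.0}
_NUM = r"([\d.]+) (KiB|MiB|GiB)"

# Hypothetical helper (not part of PyTorch): one pattern per figure.
_PATTERNS = {
    "requested": r"Tried to allocate " + _NUM,
    "total": _NUM + r" total capacity",
    "allocated": _NUM + r" already allocated",
    "free": _NUM + r" free",
    "reserved": _NUM + r" reserved",
}

def parse_cuda_oom(msg):
    """Return the memory figures found in msg, all converted to GiB."""
    out = {}
    for key, pattern in _PATTERNS.items():
        m = re.search(pattern, msg)
        if m:
            out[key] = float(m.group(1)) * _TO_GIB[m.group(2)]
    return out

info = parse_cuda_oom(MSG)
# Memory held by the caching allocator but not backing any tensor.
info["cached_unused"] = info["reserved"] - info["allocated"]
for key, gib in info.items():
    print(f"{key:>14}: {gib:.2f} GiB")
```

Here the cached-but-unused figure comes out at roughly the size of the failed request. That would be consistent with the cache being split into blocks too small to satisfy one large allocation, but this is an interpretation, not something the traceback proves.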

My suspicion is sentencepiece.

This post should be ignored. The memory leak was due to something in or around the to_fp16() call, and perhaps other issues being worked on in the background as the fastai2 team works to complete the release.

Also, I should have read the fastbook chapter more closely.

I was able to move forward by removing that part of the call since, as I understood it, it is not intended for my GPU.

As this post may confuse people, it should be deleted, but I don’t have the permissions.

I now have an issue where my GPU id:0 is not being utilised. I have two GPUs: id:1 is used for the screen but has GPU capabilities and 2GB of memory.
GPU id:0 has around 11GB but has not been utilised since I removed the .to_fp16().

I update my environment almost daily: I git pull fastcore, fastai2, and course-v4, then pip uninstall and reinstall the libraries using pip install -e ".[dev]".

As the printout above shows, the GPU was utilised before; now the notebook works, but it uses the CPU only.

My PyTorch is:

version 1.4.0 py3.7_cuda10.1.243_cudnn7.6.3_0
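
A quick way to see what PyTorch itself can see, as a minimal sketch: `cuda_report` is a hypothetical helper name, and the code assumes it is run in the same environment as the notebook. It only uses standard `torch.cuda` queries and reads the `CUDA_VISIBLE_DEVICES` environment variable, which can hide GPUs from a process.

```python
import os

try:
    import torch  # assumption: PyTorch is installed in the active environment
except ImportError:
    torch = None

def cuda_report():
    """Return a list of lines describing what PyTorch can see (hypothetical helper)."""
    lines = ["CUDA_VISIBLE_DEVICES=" + repr(os.environ.get("CUDA_VISIBLE_DEVICES"))]
    if torch is None:
        lines.append("torch is not importable in this environment")
        return lines
    # torch.version.cuda is None for a CPU-only build of PyTorch.
    lines.append(f"torch {torch.__version__}, built for CUDA {torch.version.cuda}")
    if not torch.cuda.is_available():
        lines.append("torch.cuda.is_available() is False: work will run on the CPU")
        return lines
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        lines.append(f"cuda:{i} {props.name} {props.total_memory / 1024**3:.1f} GiB")
    lines.append(f"current device: cuda:{torch.cuda.current_device()}")
    return lines

for line in cuda_report():
    print(line)
```

If the report shows a CPU-only build or `is_available()` returning False, a reinstall may have replaced the CUDA build of PyTorch; that is a guess, not a confirmed cause. Note also that PyTorch's `cuda:N` indices do not have to match the ids shown by nvidia-smi unless `CUDA_DEVICE_ORDER=PCI_BUS_ID` is set.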

Here is my solution and many thanks to the original contributor to this post.

Post Regarding Local Environment Installs with GPUs

How did you solve it, code-wise? I’ve just run into this problem, and I’m running my Jupyter notebook on Paperspace Gradient, so I know it’s not a local GPU issue. I’ve read multiple posts on how to fix this issue and have not been able to move forward; nothing works.

Any help would be appreciated.

For anyone who runs into this problem: the way I solved it was to halve the bs (batch size) in dls_lm, and it worked without issue after that. The accuracy only dropped by a point, which is not the best outcome, but it allowed me to continue without getting a runtime error.
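
The halving can also be automated. This is a minimal sketch, not fastai code: `fit_with_smaller_batches` and `fake_train` are hypothetical names, and `fake_train` merely simulates the error so the example runs without a GPU. In a real notebook the training function would rebuild dls_lm with the given bs, create the learner, and call fit_one_cycle.

```python
def fit_with_smaller_batches(train_fn, bs, min_bs=1):
    """Call train_fn(bs); after each CUDA out-of-memory error, halve bs and retry.

    Returns (bs_that_worked, result). Any other error, or an OOM once bs
    cannot be halved without going below min_bs, is re-raised.
    """
    while True:
        try:
            return bs, train_fn(bs)
        except RuntimeError as e:
            if "out of memory" not in str(e) or bs // 2 < min_bs:
                raise
            bs //= 2

def fake_train(bs):
    # Stand-in for: rebuild dls_lm with this bs, build the learner, fit.
    # Simulated limit: pretend anything above 48 does not fit in memory.
    if bs > 48:
        raise RuntimeError("CUDA out of memory. (simulated)")
    return f"trained with bs={bs}"

used_bs, result = fit_with_smaller_batches(fake_train, bs=128)
print(used_bs, result)
```

On a real GPU it is probably worth restarting the kernel or freeing the failed learner between attempts, since memory from the failed run may still be held.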

Update: I’m not sure if this is related to the smaller batch size or to using Paperspace, but later in the chapter, in “Fine-Tuning the Classifier”, when I load ‘finetune’ and try to fine-tune it via:

learn.fit_one_cycle(1, 2e-2)

I get the error:

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastprogress/fastprogress.py:74: UserWarning: Your generator is empty.
  warn("Your generator is empty.")

I’ve tried everything to solve both issues: googled, poked around the code. No luck. So if I make my initial batch size too big and try to redo the 10-epoch training, I run into the “CUDA out of memory” problem. But if I halve the batch size, then later on, when I’m fine-tuning the classifier, I run into “Your generator is empty”. Or maybe that is another problem altogether.

Anyone got ideas?
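
One way to narrow this down, offered as a sketch rather than a diagnosis: as far as I know, fastprogress prints “Your generator is empty.” when the iterable it wraps has no batches, so it may help to count the batches a loader would produce before calling fit_one_cycle. The function below is hypothetical arithmetic, not fastai code. It assumes a loader that groups n_items into batches of size bs and optionally drops the last incomplete batch; the item counts in the loop are made-up examples.

```python
def n_batches(n_items, bs, drop_last=True):
    """Number of batches obtained from n_items at batch size bs."""
    if bs <= 0:
        raise ValueError("bs must be positive")
    full, rest = divmod(n_items, bs)
    # An incomplete final batch only counts when it is not dropped.
    return full if drop_last or rest == 0 else full + 1

# Illustrative numbers only (not taken from the notebook).
for n_items, bs in [(25000, 128), (25000, 64), (50, 64), (0, 64)]:
    print(f"n_items={n_items} bs={bs} -> {n_batches(n_items, bs)} batches")
```

Under these assumptions a smaller bs can only increase the number of batches, so if the real count is zero, an empty or nearly empty dataset looks like a more likely cause than the halved batch size. Assuming the usual fastai DataLoaders attributes, len(learn.dls.train) and len(learn.dls.train_ds) should show the real counts.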