NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 23%   25C    P8     8W / 250W |  10559MiB / 11178MiB |      0%      Default |
| GPU   PID   Type   Process name                                 Usage      |
|=============================================================================|
|   0  18141    C    /home/dl/anaconda3/envs/fastai2/bin/python   10547MiB   |
After closing NB 10, which cleared the GPU memory, I ran both NB 11 and NB 12:
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 23%   21C    P8     8W / 250W |    501MiB / 11178MiB |
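For reference, the same memory can usually be released without closing the notebook. A minimal sketch, assuming a PyTorch recent enough to have torch.cuda.memory_reserved(), and using hypothetical variable names (learn, dls_lm) for whatever objects hold the GPU tensors:

import gc
import torch

# Drop the Python references that keep the GPU tensors alive
# (learn / dls_lm are hypothetical names for the notebook's objects)
del learn, dls_lm
gc.collect()              # reclaim the now-unreferenced objects
torch.cuda.empty_cache()  # hand PyTorch's cached blocks back to the driver

# Rough in-notebook equivalent of the nvidia-smi numbers above
print(f"allocated: {torch.cuda.memory_allocated() / 2**20:.0f} MiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 2**20:.0f} MiB")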
In NB 10, after modifying the first line of the SentencePieceTrainer.Train input by removing the {q}'s, we get:
RuntimeError: CUDA out of memory. Tried to allocate 2.29 GiB (GPU 0; 10.92 GiB total capacity; 7.50 GiB already allocated; 619.44 MiB free; 9.79 GiB reserved in total by PyTorch)
This occurs at "Fine tuning the language model":
learn = language_model_learner(
dls_lm, AWD_LSTM, drop_mult=0.3,
metrics=[accuracy, Perplexity()]).to_fp16()
learn.fit_one_cycle(1, 2e-2)
epoch   train_loss        valid_loss   accuracy   perplexity   time
0       20413316.000000                                        00:01
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>
----> 1 learn.fit_one_cycle(1, 2e-2)

~/fastai-2020/fastai2/fastai2/callback/schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
    110     scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
    111               'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 112     self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
    113
    114 # Cell

~/fastai-2020/fastai2/fastai2/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
    188             try:
    189                 self.epoch=epoch; self('begin_epoch')
--> 190                 self._do_epoch_train()
    191                 self._do_epoch_validate()
    192             except CancelEpochException: self('after_cancel_epoch')

~/fastai-2020/fastai2/fastai2/learner.py in _do_epoch_train(self)
    161         try:
    162             self.dl = self.dls.train; self('begin_train')
--> 163             self.all_batches()
    164         except CancelTrainException: self('after_cancel_train')
    165         finally: self('after_train')

~/fastai-2020/fastai2/fastai2/learner.py in all_batches(self)
    139     def all_batches(self):
    140         self.n_iter = len(self.dl)
--> 141         for o in enumerate(self.dl): self.one_batch(*o)
    142
    143     def one_batch(self, i, b):

~/fastai-2020/fastai2/fastai2/learner.py in one_batch(self, i, b)
    149             self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')
    150             if not self.training: return
--> 151             self.loss.backward(); self('after_backward')
    152             self.opt.step(); self('after_step')
    153             self.opt.zero_grad()

~/anaconda3/envs/fastai2/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    193                 products. Defaults to ``False``.
    194         """
--> 195         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    196
    197     def register_hook(self, hook):

~/anaconda3/envs/fastai2/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     97     Variable._execution_engine.run_backward(
     98         tensors, grad_tensors, retain_graph, create_graph,
---> 99         allow_unreachable=True)  # allow_unreachable flag
    100
    101

RuntimeError: CUDA out of memory. Tried to allocate 2.29 GiB (GPU 0; 10.92 GiB total capacity; 7.50 GiB already allocated; 619.44 MiB free; 9.79 GiB reserved in total by PyTorch)
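For what it's worth, the generic workaround for this error is to shrink what one batch holds on the GPU. A hedged sketch, assuming dls_lm is built with the fastai2 DataBlock API roughly as in the course notebook; the bs and seq_len values are illustrative, not the notebook's originals:

from fastai2.text.all import *

# Rebuild the LM DataLoaders with a smaller batch size and shorter
# sequences; both multiply directly into per-batch activation memory.
dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),  # path: dataset root (assumed)
    get_items=get_text_files,
    splitter=RandomSplitter(valid_pct=0.1),
).dataloaders(path, bs=64, seq_len=72)  # illustrative, reduced from e.g. bs=128, seq_len=80

learn = language_model_learner(
    dls_lm, AWD_LSTM, drop_mult=0.3,
    metrics=[accuracy, Perplexity()]).to_fp16()
learn.fit_one_cycle(1, 2e-2)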
Still, my suspicion is sentencepiece. Could that be the cause?
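One way to test that suspicion, since SentencePiece itself trains on the CPU and should not allocate CUDA memory directly (a sketch; the Train arguments are placeholders, not the notebook's actual call):

import torch
import sentencepiece as spm

before = torch.cuda.memory_allocated()
# Placeholder arguments; substitute the notebook's actual Train(...) call
spm.SentencePieceTrainer.Train('--input=texts.txt --model_prefix=tmp --vocab_size=8000')
after = torch.cuda.memory_allocated()
print(f"GPU memory change across SentencePiece training: {(after - before) / 2**20:.0f} MiB")

If the change is zero, the OOM is coming from the backward pass in the fine-tuning step rather than from tokenization.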