This is my hardware:
Paperspace P4000
RAM: 30 GB
CPUs: 8
HD: 210.5 KB / 100 GB
GPU: 8 GB
So, everything is fine-ish.
The language model takes about 45 min per cycle, so it took a night to train.
The classification part is less fortunate.
Every time I train the full model, I get:

```
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1512387374934/work/torch/lib/THC/generic/THCStorage.cu:58
```
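To see what the card is actually doing, I have been checking GPU memory around the fit calls. This is just a minimal sketch: `torch.cuda.memory_allocated()` and `torch.cuda.memory_cached()` only exist in PyTorch builds newer than the 0.3-era one in the traceback, so on a box like mine `watch -n 1 nvidia-smi` in a shell gives the same information.

```python
import torch

# Quick look at GPU memory (assumes PyTorch >= 0.4; on older builds,
# use `watch -n 1 nvidia-smi` from a shell instead).
print(torch.cuda.get_device_name(0))
print("allocated: %.1f MB" % (torch.cuda.memory_allocated() / 1024 ** 2))
print("cached:    %.1f MB" % (torch.cuda.memory_cached() / 1024 ** 2))
```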
When I unfreeze all of the layers (learn.unfreeze()), the learner cannot fit because it runs out of memory. It does not run out of memory if I keep some of the layers frozen (e.g., learn.freeze_to(4) trains fine with bs=48). I have tried batch sizes from 48 down to 4, and it still runs out of memory.
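To sanity-check that the freezing itself behaves the way I think it does, here is a stripped-down sketch of the same idea in plain PyTorch (written against a current PyTorch for brevity; it is a toy nn.Sequential with made-up layer sizes, not the actual fastai learner). Frozen layers never get gradient buffers allocated, which is presumably why freeze_to fits in 8 GB while a full unfreeze does not:

```python
import torch
import torch.nn as nn

# Toy stand-in for the real model; sizes are made up for illustration.
model = nn.Sequential(
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 2),
).cuda()

# Rough analogue of learn.freeze_to(4): freeze everything except the head,
# so no gradient buffers are allocated for the frozen layers.
for p in model.parameters():
    p.requires_grad = False
for p in model[-1].parameters():
    p.requires_grad = True

# Only hand the trainable parameters to the optimizer.
opt = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)

x = torch.randn(48, 1024).cuda()   # bs=48 batch of dummy inputs
loss = model(x).sum()
loss.backward()                    # grads exist only for the unfrozen head
opt.step()
```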
Additionally, I got the error below as well (bs=4); I'm not sure if it's relevant.
```
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/home/paperspace/anaconda3/envs/fastai/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/paperspace/anaconda3/envs/fastai/lib/python3.6/site-packages/tqdm/_tqdm.py", line 144, in run
    for instance in self.tqdm_cls._instances:
  File "/home/paperspace/anaconda3/envs/fastai/lib/python3.6/_weakrefset.py", line 60, in __iter__
    for itemref in self.data:
RuntimeError: Set changed size during iteration
```
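From the stack, this second error looks unrelated to CUDA: tqdm's monitor thread (Thread-4) is iterating its weakset of live progress bars while another thread mutates it, which was a known tqdm bug in releases from around that time. If I understand it correctly, disabling the monitor thread (or upgrading tqdm) works around it:

```python
from tqdm import tqdm

# Workaround for the monitor-thread race in older tqdm releases:
# setting monitor_interval to 0 stops the background thread that
# raises "Set changed size during iteration".
tqdm.monitor_interval = 0
```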
Is anybody else having the same or similar trouble?
What are your setups?