Here’s an experiment I ran on Colab (I’m not sure whether ipyexperiments takes into account how Colab allocates GPU RAM).
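For reference, the experiment block was started roughly like this (a minimal sketch, assuming ipyexperiments’ standard Pytorch backend class; exp is just a placeholder name, not my exact cell):

from ipyexperiments import IPyExperimentsPytorch
# creating the experiment object prints the backend/device banner and the
# current CPU/GPU state, which is the report shown below
exp = IPyExperimentsPytorch()

Here’s the output: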
*** Experiment started with the Pytorch backend
Device: ID 0, Tesla K80 (11.2 GB RAM)
*** Current state:
RAM: Used Free Total Util
CPU: 1.7 GB 10.9 GB 12.7 GB 15.76%
GPU: 327.0 MB 10.9 GB 11.2 GB 2.94%
import pretrain_lm
expm = pretrain_lm.LMHyperParams(dataset_path='/content/data/ar28/',
base_lm_path=None, bidir=True,
qrnn=False, tokenizer='v', max_vocab=32000,
emb_sz=400, nh=1150, nl=3, clip=0.20,
bptt=64, lang='ar', name='Arabic')
learn = expm.train_lm(num_epochs=1, bs=64, drop_mult=0.3, lr=5e-3)
[crashes with a CUDA OOM error; runs successfully with bs=32 (see the torch.cuda cross-check below the report)]
*** Experiment finished in 00:01:17 (elapsed wallclock time)
*** Local variables:
Deleted: expm, pretrain_lm
*** Experiment memory:
RAM: Consumed Reclaimed
CPU: 1.6 GB 0.0 B ( 0.00%)
GPU: 10.1 GB 1.4 GB ( 14.33%)
*** Current state:
RAM: Used Free Total Util
CPU: 3.3 GB 10.4 GB 12.7 GB 32.02%
GPU: 9.0 GB 2.2 GB 11.2 GB 410.81%
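As a cross-check on the numbers ipyexperiments reports, PyTorch’s own counters could be queried right after the training call (a sketch; this was not part of the run above):

import torch

# what the tensors PyTorch currently holds add up to on the GPU, plus the peak
# reached during the cell, in GB (CUDA allocator counters only)
print(f"allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GB")
print(f"peak:      {torch.cuda.max_memory_allocated() / 2**30:.2f} GB")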
The corpus size is around 28M tokens. Is it plausible that 10 GB were really consumed so that the cell couldn’t run? Or is ipyexperiments perhaps not reading Colab’s allocation policies correctly? What would an approximate GPU memory cost for this process be? 10 GB seems like too much to me.
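For a rough sense of scale, the weights alone should be nowhere near 10 GB. A back-of-the-envelope count (a sketch assuming a fastai AWD-LSTM-style stack with the sizes above, i.e. 400 -> 1150 -> 1150 -> 400 with a tied decoder; the real model, especially with bidir=True, may differ):

vocab, emb, nh = 32000, 400, 1150

def lstm_params(n_in, n_out):
    # 4 gates, each with input weights, hidden weights and a bias (rough count)
    return 4 * (n_in * n_out + n_out * n_out + n_out)

emb_params = vocab * emb  # ~12.8M; decoder weights tied to the embedding
rnn_params = lstm_params(emb, nh) + lstm_params(nh, nh) + lstm_params(nh, emb)
total = emb_params + rnn_params
print(f"~{total / 1e6:.0f}M params, ~{total * 4 / 2**30:.2f} GB of fp32 weights")
# -> roughly 33M params, i.e. ~0.12 GB of weights; even with Adam state that is
#    well under 1 GB, so the bulk of the 10 GB would have to be activations,
#    which scale with bs * bptt (and roughly double with bidir=True)

If that reasoning is right, it would also explain why the same setup fits at bs=32.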
Edit: I ran the same test on Kaggle and here are the results (CUDA OOM with the same parameters as above):
*** Experiment started with the Pytorch backend
Device: ID 0, Tesla K80 (11.2 GB RAM)
*** Current state:
RAM: Used Free Total Util
CPU: 1.8 GB 13.2 GB 15.7 GB 13.30%
GPU: 327.0 MB 10.9 GB 11.2 GB 2.94%
[the process hits the same CUDA OOM]
*** Experiment finished in 00:02:10 (elapsed wallclock time)
*** Local variables:
Deleted: expm, pretrain_lm
*** Experiment memory:
RAM: Consumed Reclaimed
CPU: 3.2 GB 0.0 B ( 0.00%)
GPU: 10.1 GB 1.4 GB ( 14.35%)
*** Current state:
RAM: Used Free Total Util
CPU: 4.9 GB 10.0 GB 15.7 GB 49.07%
GPU: 9.0 GB 2.2 GB 11.2 GB 408.40%