This is the third day I have tried to follow the notebook step by step, and I always run out of memory.
I have the latest code and I’m using an NVIDIA GTX 1080 Ti card.
I have tried changing the following:
bptt=30
bs=10
Is there any other variable that I need to modify?
Do you have only one model? The notebook uses both learner and learn, and my impression was that something stays around in memory there. If you are on PyTorch 0.4+, you might also try wrapping evaluation in with torch.no_grad():.
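A minimal sketch of what I mean by evaluating under torch.no_grad(). The nn.Linear model, the random batches, and the evaluate helper are stand-ins I made up for illustration, not the notebook's code; the real language model would go in their place.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the notebook's model.
model = nn.Linear(8, 2)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def evaluate(model, batches):
    """Run forward passes without recording the autograd graph."""
    model.eval()
    outputs = []
    with torch.no_grad():  # no activations are kept for backward
        for x in batches:
            # Move results back to the CPU so they do not pile up on the GPU.
            outputs.append(model(x.to(device)).cpu())
    return outputs

# Made-up data, only to show the call.
batches = [torch.randn(4, 8) for _ in range(3)]
outs = evaluate(model, batches)
```

Without the no_grad block, every forward pass would keep its intermediate activations alive for a backward call that never comes during evaluation.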
In recent PyTorch versions you can call torch.cuda.memory_allocated() to see how much of your GPU memory is actually in use (and not just cached). Ideally this is roughly the memory for the weights. It goes up during the forward pass in training, but should drop again after backward.
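Something like the following could be used to print those numbers between notebook cells. The gpu_mem_mb helper is a name I made up; torch.cuda.memory_reserved() is, as far as I know, the name for the cached amount in newer releases (older ones called it memory_cached), so treat that part as an assumption about your version.

```python
import torch

def gpu_mem_mb():
    """Return (allocated, reserved) GPU memory in MB, or None without CUDA."""
    if not torch.cuda.is_available():
        return None
    mb = 1024 ** 2
    allocated = torch.cuda.memory_allocated() / mb  # memory held by live tensors
    reserved = torch.cuda.memory_reserved() / mb    # includes PyTorch's cache
    return allocated, reserved

stats = gpu_mem_mb()
if stats is None:
    print("No CUDA device visible")
else:
    print("allocated: %.1f MB, reserved: %.1f MB" % stats)
```

Calling it before the forward pass, after the forward pass, and after backward should show whether the allocated figure keeps climbing from one iteration to the next.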
I hope this is useful for you. I never dug all that deep into it; I just reloaded the kernel, loaded the model I had up to then, and continued.
I changed those learn references to learner and the issue persists.
At this point I’m just trying to run the whole notebook in full to get an idea of what I need to do afterwards.
Any suggestions?
I’m running the nvidia-smi command to monitor the GPU status, temperature, and fan speed.
As a last attempt, you could try installing version 0.3 of PyTorch. That is the version the course was built on.
It’s been a while since I ran it myself, but AFAIR it also ran on Colab, so you could try that.