The GPU memory jumped from 350MB to 700MB. Continuing with the tutorial and executing more blocks of code that contained training operations made the memory consumption grow further, until it reached the 2GB maximum and I got a runtime error indicating that there wasn't enough memory.
I know that in this particular case I could avoid the problem by skipping the earlier blocks that contain training operations and executing only the one where I ran out of memory, but how else could this be solved? I tried executing `del learn`, but that doesn't seem to free any memory.
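For completeness, the fuller cleanup sequence I have seen recommended looks like this; `gc.collect()` and `torch.cuda.empty_cache()` are standard stdlib/PyTorch calls, but they only help if nothing else still holds a reference to the model or its tensors:

```python
import gc
import torch

del learn                  # drop the Python reference to the Learner
gc.collect()               # collect anything kept alive only by reference cycles
torch.cuda.empty_cache()   # release cached blocks back to the GPU driver
```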
For me, `learn.destroy()` worked.
Also, I had the `CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 11.00 GiB total capacity; 8.63 GiB already allocated; 14.32 MiB free; 97.56 MiB cached)` issue. I fixed it to work with Jeremy's bs (lesson3-camvid/2019) by adding `.to_fp16()` to the learner. Most probably fragmentation related…
Hi Octav, could you please show the line of code to which you added `.to_fp16()`?
Also, please explain when you would use `learn.destroy()` - after saving the model?
Thanks!
Regarding the `.destroy` action: I did not save the model, since I was not going to use it (given that it did not fit in my GPU's memory). I used `learn.destroy()` and then checked `gpu_mem_get_free_no_cache()`. Nowadays I just restart the whole kernel instead…
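As for where `.to_fp16()` went: it is just appended to the learner construction. A sketch assuming fastai v1 and the lesson3-camvid notebook, where `data`, `metrics` and `wd` are defined in earlier cells:

```python
from fastai.vision import *  # fastai v1

# Same unet_learner call as in the notebook, with mixed precision appended;
# `data`, `metrics` and `wd` are assumed from earlier notebook cells.
learn = unet_learner(data, models.resnet34, metrics=metrics, wd=wd).to_fp16()
```

Half precision roughly halves the memory the activations take, which is why Jeremy's batch size then fits.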
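For reference, the check sequence was the following (both are fastai v1 utilities; `gpu_mem_get_free_no_cache` lives in `fastai.utils.mem`):

```python
from fastai.utils.mem import gpu_mem_get_free_no_cache

learn.destroy()                      # free the Learner's model, data and callbacks
print(gpu_mem_get_free_no_cache())   # free GPU memory in MB, measured after emptying the cache
```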
Thanks! How do you then deal with the tensor/input type mismatch error `RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same`?
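In case it helps later readers: that error usually means the model weights are still in half precision while the batch being fed in is full precision, which typically happens when calling the model directly rather than going through the `Learner`. A sketch of the two usual fixes, assuming fastai v1 (`to_fp32()` is the Learner method that undoes `to_fp16()`; the tensor `x` below is a hypothetical input batch):

```python
# Fix 1: convert the learner back to full precision before manual inference
learn = learn.to_fp32()

# Fix 2: keep fp16 and match the input dtype to the weights instead;
# `x` is a hypothetical CUDA input batch
preds = learn.model(x.half())
```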