How to offload/prefetch feature maps in PyTorch?

I have a 4GB Quadro M1200 GPU, and I have been wondering for a long time how to train large neural nets on it. One idea is to create small batches and treat 3 of them as a single batch.
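The "3 small batches as one batch" idea is usually called gradient accumulation. Here is a minimal sketch of it, assuming a toy `nn.Linear` model and random data (both made up for illustration, not taken from any real project). Gradients from 3 small batches are summed in `.grad` before a single optimizer step:

```python
import torch
from torch import nn

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy model and data, assumed purely for illustration.
model = nn.Linear(10, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

accum_steps = 3  # 3 small batches act as one larger batch
batches = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(6)]

optimizer_steps = 0
optimizer.zero_grad()
for i, (x, y) in enumerate(batches, start=1):
    x, y = x.to(device), y.to(device)
    # Divide by accum_steps so the summed gradient is an average
    # over the 3 small batches rather than 3x too large.
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()  # gradients accumulate in each parameter's .grad
    if i % accum_steps == 0:
        optimizer.step()       # one update per 3 small batches
        optimizer.zero_grad()  # reset before the next group
        optimizer_steps += 1
```

Only one small batch of activations is alive at a time, so peak memory is set by the small batch size. Note that layers such as BatchNorm still see the small batch, so this is not exactly equivalent to one large batch for every model.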

I came across this post: How to Train a Very Large and Deep Model on One GPU? A short summary: the feature maps produced during forward prop are kept in GPU memory, and they take up roughly 50-70% of it. The proposed solution is to move the feature maps to CPU memory during forward prop and move them back to GPU memory when they are needed during back prop.
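For the offloading part, recent PyTorch versions (1.10 and later, as far as I know) ship a context manager, `torch.autograd.graph.save_on_cpu`, that moves tensors saved for backward to CPU memory and copies them back when backward needs them. The sketch below assumes a toy two-layer model and is not the method from the linked post: in particular it does no prefetching, it just copies each tensor back on demand.

```python
import torch
from torch import nn
from torch.autograd.graph import save_on_cpu

torch.manual_seed(0)
use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"

# Toy model and input, assumed purely for illustration.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1)).to(device)
x = torch.randn(8, 32, device=device)

# Reference run: saved tensors stay on the original device.
model(x).sum().backward()
ref_grad = model[0].weight.grad.clone()
model.zero_grad()

# Offloaded run: tensors saved for backward inside this block are
# stored on the CPU and copied back when backward asks for them.
# Pinned memory is only requested when a GPU is actually present.
with save_on_cpu(pin_memory=use_cuda):
    out = model(x).sum()
out.backward()
offload_grad = model[0].weight.grad.clone()
```

The gradients should match the reference run; the cost is the extra host-device copies, which is where the prefetching discussed in the post would come in.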

Is there some way to implement this in PyTorch, or would I have to build it from scratch?

You can ignore the fact that, during back prop, the later layers need their feature maps back sooner than the earlier layers. Implementation is the focus of this post; we can get into optimizations after that.

This post is also useful and has some code samples in PyTorch.

For a single-GPU machine they discuss two methods: one based on batch size and the other based on checkpointing. I will look into the checkpointing part and see what I can do.
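For reference, checkpointing is available in PyTorch as `torch.utils.checkpoint.checkpoint`. A minimal sketch, assuming a toy network split into two hypothetical blocks (`block1`, `block2`): the activations inside a checkpointed block are not stored during forward prop and are recomputed during back prop, trading compute for memory.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy blocks, assumed purely for illustration.
block1 = nn.Sequential(nn.Linear(32, 64), nn.ReLU()).to(device)
block2 = nn.Sequential(nn.Linear(64, 64), nn.ReLU()).to(device)
head = nn.Linear(64, 1).to(device)

def forward(x, use_checkpoint):
    if use_checkpoint:
        # Intermediate activations inside each block are dropped after
        # the forward pass and recomputed when backward reaches the block.
        h = checkpoint(block1, x, use_reentrant=False)
        h = checkpoint(block2, h, use_reentrant=False)
    else:
        h = block2(block1(x))
    return head(h).sum()

x = torch.randn(8, 32, device=device)

# Plain run for comparison.
forward(x, use_checkpoint=False).backward()
plain_grad = block1[0].weight.grad.clone()
for m in (block1, block2, head):
    m.zero_grad()

# Checkpointed run: same gradients, lower activation memory.
forward(x, use_checkpoint=True).backward()
ckpt_grad = block1[0].weight.grad.clone()
```

The `use_reentrant=False` argument assumes a reasonably recent PyTorch release; older versions only offer the reentrant variant, which needs at least one input with `requires_grad=True`.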

Maybe you can use Kaggle kernels or Google Colab instead. Both are free and have fastai v1 ready to use with an 11GB GPU :)

Google Colab is free, but there is a 15GB data limit, so you cannot work with large datasets. Kaggle kernels are a good option since we can also upload our own dataset, but they are still slow.