How can you train your model on large batches when your GPU can’t hold more than a few samples?

Hi there all,

I am trying to train a model, but my GPU can only fit 8 images at a time (bs = 8) at 256x256, and I am getting really bad results. I know we need a bigger batch size in order to get more stable gradients and better results.

Is there a way to make your GPU use less memory so you can increase the batch size?


Reduce your image size. Not sure if there is any other way.

You can use the GradientAccumulation callback to make a bunch of smaller batches act like a bigger one, i.e. you set your batch size to 1 or 2 but use the callback to only update the weights after, say, 20 samples.
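The idea behind that callback can be sketched in plain Python: summing the gradients of several micro-batches before stepping reproduces the gradient of one large batch. This is a minimal illustration, not the fastai implementation; the one-parameter linear model and the helper name `grad_mse` are made up for the example.

```python
# Gradient accumulation sketch: the gradient of a sum-of-squares loss for
# y = w * x, accumulated over micro-batches, equals the full-batch gradient.

def grad_mse(w, xs, ys):
    """Sum of per-sample gradients of (w*x - y)^2 with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
w = 0.5

# "Large batch": one gradient over all 4 samples.
full_grad = grad_mse(w, xs, ys)

# "Accumulated": two micro-batches of 2, gradients summed, one weight update.
acc_grad = grad_mse(w, xs[:2], ys[:2]) + grad_mse(w, xs[2:], ys[2:])

print(abs(full_grad - acc_grad) < 1e-9)  # True: the two are identical
```

Note that batch norm still sees the small micro-batches, so accumulation is not a perfect substitute for a genuinely large batch.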


@FraPochetti the problem with this is that because the gradients don't get to propagate, you end up with long training times before you get any decent results, or the model never generalizes.

Besides, it's obvious you can reduce the batch size…


@muellerzr thanks for the post do you have sample code to do this?


For fastai2, the docs are here:

For fastaiv1 see this discussion: Accumulating Gradients

Though I highly recommend v2 :wink:


@muellerzr thanks a lot!! let me investigate

Yep, makes sense.
But why can't you reduce the image size (allowing you to increase the batch size) to address this issue?

@FraPochetti, because you need high-res images to get better classification in medical diagnostics.
In most cases (not all, for example in GANs) using bigger batches is better. But we are usually limited by our GPU. In this competition we have big images (1400x2100), and if we want to use the original size, even a P100 only allows small batches (~2 images per batch).


Right, thanks for clarifying

How about breaking the large images up into smaller tiles while still retaining the resolution? The new tile size would be dependent on what you are trying to classify, but it may be a possible solution.

@amritv thanks for the suggestion, I have done that. The images are between 25 and 60 MB each (TIFF files), and I have broken them down into 32 patches.

But if the patch is 512x512, I can only use a batch size of 8 with 8 patches; using 128x128 patches I can do 32 patches with a batch size of 16. This gives me 0.83-0.84,

but I need better resolution in order to get better generalization.
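The tiling described above can be sketched like this. The nested-list "image" and the helper name `tile` are illustrative; a real pipeline would slice a numpy array or use `PIL.Image.crop` on the TIFF files directly.

```python
# Split a 2D image (a list of pixel rows) into non-overlapping square patches.
# Resolution is preserved: each patch keeps its original pixels, so the model
# sees full detail while each input stays small enough for a bigger batch.

def tile(image, patch):
    """Return a list of patch x patch sub-images, in row-major order."""
    h, w = len(image), len(image[0])
    return [
        [row[x:x + patch] for row in image[y:y + patch]]
        for y in range(0, h, patch)
        for x in range(0, w, patch)
    ]

# A toy 4x4 "image" split into four 2x2 patches.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = tile(img, 2)
print(len(patches))  # 4
print(patches[0])    # [[0, 1], [4, 5]]
```

One design note: for classification, predictions from the patches of one image usually need to be aggregated (e.g. max or mean over patch logits) back to an image-level label.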


Found a solution.

Thanks to lafoss.

It's simple: don't start training on the entire dataset; instead use a DataBunch.

@vishwa thanks for the post, but can you elaborate a little bit more?