How can you train your model on large batches when your GPU can’t hold more than a few samples?

orangelmx · May 10, 2020, 2:13pm

Hi there all,

I am trying to train a model, but I can only feed 8 images at a time (bs = 8) with 256x256 images. I am getting really bad results, and I know a bigger batch size usually gives more stable gradients and better results.

Is there a way to make the GPU use less memory so that I can increase the batch size?

Thanks!

FraPochetti · May 10, 2020, 4:06pm

Reduce your image size. I’m not sure there is any other way.

muellerzr · May 10, 2020, 4:07pm

You can use the GradientAccumulation callback to make a bunch of smaller batches act like a bigger one. I.e., you set your batch size to 1 or 2, but you use the callback so that the weights only update after, say, 20.

orangelmx · May 10, 2020, 4:15pm

@FraPochetti the problem with this is that, because the gradients don’t get to propagate, you end up with a long training time before you get any decent results, or the model never generalizes.

Besides, it is obvious that you can reduce the batch size…

Thanks.

orangelmx · May 10, 2020, 4:16pm

@muellerzr thanks for the post. Do you have sample code to do this?

Thanks.

muellerzr · May 10, 2020, 4:21pm

For fastai2, the docs are here:

https://dev.fast.ai/callback.training#GradientAccumulation

For fastaiv1 see this discussion: Accumulating Gradients

Though I highly recommend v2
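
To show the idea behind the callback without needing a GPU or fastai installed, here is a minimal pure-Python sketch. It is not fastai's implementation; the toy one-parameter model and the helper names (`grad_sum`, `full_batch_step`, `accumulated_step`) are made up for illustration. It only checks that summing gradients over small micro-batches and stepping once gives the same update as one big batch.

```python
# Toy model: pred = w * x, loss = mean((pred - y)^2) over the batch.
# All names here are hypothetical helpers, not part of fastai or PyTorch.

def grad_sum(w, xs, ys):
    """Sum (not mean) of the per-sample gradients d/dw (w*x - y)^2."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))

def full_batch_step(w, xs, ys, lr):
    """One SGD step using the mean gradient of the whole batch at once."""
    return w - lr * grad_sum(w, xs, ys) / len(xs)

def accumulated_step(w, xs, ys, lr, micro_bs):
    """Same update, but only `micro_bs` samples are processed at a time."""
    acc = 0.0
    for i in range(0, len(xs), micro_bs):
        # Only this slice would need to fit in GPU memory.
        acc += grad_sum(w, xs[i:i + micro_bs], ys[i:i + micro_bs])
    # Step once, after all micro-batches, dividing by the effective batch size.
    return w - lr * acc / len(xs)

xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [3.0 * x for x in xs]  # data generated with a true weight of 3.0
w0 = 0.0

w_full = full_batch_step(w0, xs, ys, lr=0.01)
w_acc = accumulated_step(w0, xs, ys, lr=0.01, micro_bs=2)
print(abs(w_full - w_acc) < 1e-9)
```

One caveat: the match is exact for a plain model like this one, but layers that compute per-batch statistics (BatchNorm, for example) still only see the small micro-batch, so accumulation is not a perfect substitute for a truly large batch there.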

orangelmx · May 10, 2020, 4:31pm

@muellerzr thanks a lot!! let me investigate

FraPochetti · May 10, 2020, 6:44pm

Yep, makes sense.
But why can’t you reduce the image size (which would let you increase the batch size) to address this issue?

orangelmx · May 10, 2020, 8:04pm

@FraPochetti, because you need high-resolution images to get better classification in medical diagnostics.
Also:
In most cases (not all, for example in GANs) using bigger batches is better. But we usually have a limitation on our GPU. In this competition we have big images, 1400 x 2100, and if we want to use the original size, then even a P100 allows only small batches (~2 images per batch).

FraPochetti · May 10, 2020, 8:56pm

Right, thanks for clarifying

amritv · May 10, 2020, 10:57pm

How about breaking up the large images into smaller tiles while still retaining the resolution? The new, smaller size would depend on what you are trying to classify, but it may be a possible solution.
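
A rough sketch of what tiling could look like, in plain Python so it runs anywhere. The function name `tile_image` and the choice to drop incomplete edge tiles are assumptions for illustration; in practice you would probably do this on arrays loaded from the image files, and you might pad or overlap tiles instead.

```python
def tile_image(image, tile):
    """Split a 2D image (a list of rows) into non-overlapping tile x tile patches.

    Returns a list of ((top, left), patch) pairs. Edge regions that do not
    fill a whole tile are dropped (an assumption; padding is another option).
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            # Slice the rows first, then the columns of each row.
            patch = [row[left:left + tile] for row in image[top:top + tile]]
            patches.append(((top, left), patch))
    return patches

# A fake 6 x 8 "image" whose pixel value encodes its position.
image = [[r * 8 + c for c in range(8)] for r in range(6)]
patches = tile_image(image, tile=4)
print(len(patches))
```

Each patch keeps the original pixel resolution, so the model sees full detail while the per-sample memory cost depends only on the tile size.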

orangelmx · May 11, 2020, 12:12am

@amritv thanks for the suggestion. I have done that: the images are TIFF files, between 25 and 60 MB each, and I have broken them down into 32 patches.

But if the patch is 512x512, I can only use a batch size of 8 with 8 patches. Using patches of 128x128, I can do 32 patches with a batch size of 16. This gives me 0.83 - 0.84.

But I need better resolution in order to get better generalization.

orangelmx · May 11, 2020, 6:57pm

Found a solution.

Thanks to lafoss

vishwa · May 12, 2020, 8:15am

It’s simple: don’t start training on the entire dataset; instead, use a DataBunch.

orangelmx · May 12, 2020, 1:50pm

@vishwa thanks for the post, but can you elaborate a little bit more?