Batch size finder from OpenAI implemented using Fastai

Hello there!

Following Jeremy Howard's post on Twitter, I went and tried to implement the paper.

It did not take long to realize that the first time I read the paper I had barely looked at the results. It was not the easiest paper to understand, as the authors do not give many details about the math behind it, nor do they clearly explain their implementation.

I ran into a problem when trying to implement this: they seem to assume a multi-GPU setup to compute the statistics we are looking for.
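For context, the statistics in question are the two unbiased estimators from the OpenAI paper, which recover the true gradient norm and the trace of the gradient covariance from gradient norms measured at two different batch sizes (in the paper, the per-GPU batch and the full batch across GPUs). A minimal sketch, assuming we already have the two squared gradient norms (the function name is mine, not from the paper or fastai):

```python
def gradient_noise_scale(g_small_sq, g_big_sq, b_small, b_big):
    """Estimate the simple noise scale B_noise = tr(Sigma) / |G|^2.

    g_small_sq, g_big_sq: squared L2 norms of gradients computed with
    batch sizes b_small and b_big respectively (b_small < b_big).
    """
    # Unbiased estimate of the true gradient's squared norm |G|^2
    g_sq = (b_big * g_big_sq - b_small * g_small_sq) / (b_big - b_small)
    # Unbiased estimate of the trace of the gradient covariance tr(Sigma)
    trace_sigma = (g_small_sq - g_big_sq) / (1 / b_small - 1 / b_big)
    return trace_sigma / g_sq
```

On a single GPU, the workaround is to get the two norms from gradients accumulated over different numbers of mini-batches rather than from different devices.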

I have tried three different ways to solve this problem but none seems to really work.

I have written a little Colab Notebook to better explain my work, and if anyone is interested I would be glad to try to solve this problem together :smile:


Actually I found the solution on my own, managed to implement the paper, and did a little testing on the Rossmann stores dataset: I found a 4x speedup using a batch size of 512 instead of 64! :smiley:

I wrote a Medium article about this to share the results, and would be glad to hear your feedback :slight_smile:


Well done! I can’t wait to try this out!

Thank you! It's a bit tricky to use, as the curve is quite unstable because of all the approximations involved. My take on this is to increase beta, from 0.99 to 0.999 for example, if the curve is too bumpy.
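To illustrate what beta does here: the curve is smoothed with a debiased exponential moving average, so a larger beta averages over more history and flattens the bumps. A minimal sketch of that kind of smoother (the class name is illustrative, not the actual implementation):

```python
class SmoothedValue:
    """Debiased exponential moving average, as used to smooth noisy
    training curves; beta close to 1 means heavier smoothing."""

    def __init__(self, beta=0.99):
        self.beta, self.n, self.mov_avg = beta, 0, 0.0

    def add(self, val):
        self.n += 1
        self.mov_avg = self.beta * self.mov_avg + (1 - self.beta) * val
        # Debias so early values are not pulled toward zero
        return self.mov_avg / (1 - self.beta ** self.n)
```

The trade-off is that a very high beta makes the curve lag behind the real signal, so the suggested batch size reacts more slowly.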

But please keep me updated :smiley:
I think it works rather well on tabular data at least, but I will have to try it with images and NLP. You might run into CUDA out-of-memory errors in any case if you use too big a batch size ^^


A lot of my work is with tabular data, so I'll keep you updated on how it pans out on my datasets!

This seems great! I just added a link in the unofficial fastai extensions repository :slight_smile:

@DanyWin, could you encapsulate your work in a bs_finder type of callback and put it in a GitHub repository? That would make it much easier to test and reuse.