Speed up batch size discovery hundreds of times (to fit into your GPU RAM)

OK, you know how time-consuming bs discovery is: you run lr_find/fit, and if the bs turns out to be too big you reduce it and run it again, then move on to the next stage and repeat the whole thing again and again, and meanwhile you’re waiting, and waiting, and waiting…

Then recently I discovered that all the required GPU memory gets set up during the first batch of the first epoch (see the tutorial). So I thought: why do I waste hours waiting for things to finish? I also find that the waiting makes me really tired physically. Sure, I can truncate my dataset while tuning things up, but even that is not very efficient if, say, the batch size progressively gets smaller and the image size gets larger; it’s all error-prone, and you have to change things manually and remember to put them back to normal… too much hassle.
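If you want to see the actual peak allocation on your own setup (e.g. to compare a truncated run against a full one later), PyTorch exposes a peak-memory counter. Here is a minimal sketch, assuming a learn object already exists in your notebook:

import torch

torch.cuda.reset_max_memory_allocated()  # reset the peak counter before the run
learn.fit(1)                             # any short fit - the peak is reached early on
print(f"peak GPU memory: {torch.cuda.max_memory_allocated()/2**20:.0f}MB")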

So Sylvain and I came up with a quick method: all you do is add the following code somewhere in your notebook before your first learn object is created:

# this is for bs tune-up - it makes fit calls run really fast, doing only the minimum required to set up the GPU memory
from fastai.callback import Callback

class StopAfterNBatches(Callback):
    "Stop training after n batches of the first epoch."
    def __init__(self, n_batches:int=2):
        self.stop,self.n_batches = False,n_batches-1 # iteration starts from 0

    def on_batch_end(self, iteration, **kwargs):
        # once the n-th batch has run, tell the trainer to stop both the epoch and the training
        if iteration == self.n_batches: return {'stop_epoch': True, 'stop_training': True}

This version works in v1.0.47 and later. It’s part of the library now.

https://docs.fast.ai/callbacks.misc.html#StopAfterNBatches

So load it with:

from fastai.callbacks.misc import *

And then to use it with your learn object:

from fastai.callbacks.misc import *
learn.callbacks.append(StopAfterNBatches(n_batches=2))

Or even better, you can set it globally for all learners in that notebook, without needing to add/remove any code:

defaults.extra_callbacks = [StopAfterNBatches(n_batches=2)]

And here is how to make it easy to turn it on and off globally in your notebook:

# True turns the speedup on, False returns to the normal behavior
tune = True
if tune:
    defaults.extra_callbacks = [StopAfterNBatches(n_batches=2)]
else:
    defaults.extra_callbacks = None

This callback, once set, will affect all subsequent lr_find and fit* calls. They will all finish within seconds and will fail as usual if there isn’t enough RAM, which is when you reduce your bs and other params.
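For example, here is a hypothetical probing loop (get_data and make_learner stand in for whatever builds your DataBunch and learner in your notebook): with the callback enabled globally, each attempt returns in seconds, and a CUDA out-of-memory error surfaces as a RuntimeError you can catch and back off from:

import torch

# hypothetical helpers: get_data(bs) builds your DataBunch, make_learner(data) your learner
for bs in (128, 64, 32, 16):
    try:
        learn = make_learner(get_data(bs))
        learn.fit_one_cycle(1)            # returns in seconds thanks to StopAfterNBatches
        print(f"bs={bs} fits into GPU RAM")
        break
    except RuntimeError as e:             # CUDA OOM is raised as a RuntimeError
        print(f"bs={bs} is too big: {e}")
        torch.cuda.empty_cache()          # free what we can before the next attempt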

Remember, this is only for quick batch size (and other param) tuning: ignore the fact that the training results are meaningless, and expect all the progress bars to show red, since every run gets interrupted.

When you’re done tuning, flip the tune var to False and everything is back to normal. If you later need to re-tune, say because you made major changes, flip tune on again, re-run the notebook, adjust the params, and turn it off again.

I’ve only tried it in a few vision notebooks so far, and the memory consumption is almost identical to a full normal run.

So please give it a test run; I haven’t tried it with any text training, for example.

Some learners will certainly have minor issues. For example, unet_learner currently shows a pretty significant memory consumption fluctuation from run to run (even with a fixed seed), so beware of that: the unet learner is not stable memory-wise at the moment.

Feedback and improvement suggestions are welcome.

If it proves to be a solid technique, we will integrate it into fastai. defaults.extra_callbacks is already in; I am referring to adding the special callback itself, so that you don’t have to copy-n-paste it.

And the next stop is a batch-size and other hyper-param auto-discoverer, which some of you have already started working on.
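To give a rough idea of where that could go (purely a sketch, not an existing fastai API), such an auto-discoverer might reuse the same fast-fit trick and simply halve the bs until the first few batches fit:

import torch

def find_max_bs(make_learner_for_bs, start_bs=256):
    "Hypothetical helper: halve bs until a truncated fit stops running out of GPU RAM."
    bs = start_bs
    while bs >= 1:
        try:
            learn = make_learner_for_bs(bs)  # must rebuild data+learner for this bs
            learn.fit(1)                     # fast, assuming StopAfterNBatches is enabled
            return bs
        except RuntimeError:                 # CUDA OOM
            torch.cuda.empty_cache()
            bs //= 2
    raise RuntimeError("even bs=1 does not fit into GPU RAM")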

If you’d like to discuss the methods of choosing a better bs, please see this thread.


I was just getting back to look at this today! Thanks for the work and progress and let’s see what we can do with this. Very cool.


2 posts were merged into an existing topic: The “BS<=32” paper

Let’s please kindly stay focused: this thread is not about how to choose or what the best choice of bs is. It’s about how to validate blazingly fast that your chosen bs fits into your available GPU RAM, either once you already know what you want, or when you’re in a situation like the fastai lessons, which were written for a larger-capacity card than yours, and you don’t want to waste time waiting for each stage to finish just to discover that your params need to be different. And it’s not just bs: sometimes you have to change your image size and other params too. And we are talking only about fitting into the available GPU RAM.

In particular, I’m looking for feedback after you have tried this method in your specific domain, so that we can decide whether to include it as a feature in the fastai library, i.e. yeah, it’s a time saver, or nay, it doesn’t quite help. And if the latter, please share the details so that perhaps we can improve it to be a ‘yeah’.

Thank you.


Thanks Stas. For those who want to continue the batch size discussion, I’ve added a thread here: The "BS<=32" paper


Thank you, @crayoneater, for doing that. That’s an important discussion too. I moved the related posts into that thread and linked to the thread you created from the first post here.

So far I’ve found this callback to be extremely practical. Any suggestions for a good name before we add it to the fastai library?

Mine obviously :wink:

Yours:

  1. doesn’t imply any shortening of the epochs
  2. N in constructs like FooNBar usually stands for “Foo And Bar”, as in RockNRoll

So I’m thinking perhaps it should mention what it does, rather than how it does it.

Maybe something like QuickRAMFit, FitQuick, SpeedyFit, MinFit?

And please feel free to make your own suggestions. Thank you.


Coming from someone who hasn’t yet contributed, my vote is for QuickRAMFit.


Or how about just RAMFit?


I think this is going to apply to more than just RAM fitting in the future. StopFitNumBatches, StopTrainNumBatches or similar is where my head is at regarding the functionality & naming. Or if you will tolerate a longer name StopFitAfterNumBatches (but that feels too long to me…)


Good thinking, @bfarzin. Thank you, yes, potentially it’s not just for RAM.

Though I don’t like having batches in the name: as you can see from the code, it’s not just about batches, it stops the epoch and the training too, so such a name would be misleading.

That’s why I thought a more generic QuickFit or something similar would be good enough, and it would allow potential new features to be added in the future.

Here are some potential terms that can be borrowed from the domain of design:

  • mockup (but this one is usually associated with testing)
  • prototype
  • wireframes

As a comic fan… I would suggest FlashFit or SpeedyFit =)


How about GPUCheckBs?


OK, it looks like the clearest name indicating what the callback does, without losing the generality of its possible applications, is StopAfterNBatches, suggested by Jeremy. I will update the first post once it’s in git.

Thank you very much everybody for your suggestions!


OK, it’s in the library as of 1.0.47:
it’s tested: https://github.com/fastai/fastai/blob/master/tests/test_callbacks_misc.py
and documented: https://docs.fast.ai/callbacks.misc.html#StopAfterNBatches
The first post has been updated to reflect the changes.
Enjoy.
