Questions about the arguments to lr_find

wgpubs · May 10, 2018, 11:18pm

def lr_find(self, start_lr=1e-5, end_lr=10, wds=None, linear=False, **kwargs)

In reference to lr_find from source, I’m looking for some intuition on the following:

1. What kind of behavior (e.g., blank plots) would prompt us to modify start_lr and/or end_lr?
2. If we are changing start_lr and/or end_lr, what guidance is there for choosing appropriate values?
3. What does linear do?
4. When would we want to set linear=True?
5. When would setting wds be valuable or necessary for getting a good result?

If there are notebooks and/or wiki entries that discuss the above please feel free to just point me in that direction. I’ve been looking through my notes and haven’t really found any satisfying answers to these questions.

jeremy · May 11, 2018, 12:03am

1. Move them closer together for a finer-grained learning rate search (important for linear). Move them further apart if you want to try more extreme LRs.
2. Look at past papers, or just experiment.
3. It adds a fixed amount to the LR each batch, rather than multiplying by a fixed ratio (try plot_lr to see what I mean).
4. To find the exact point where the loss gets worse, if you’re trying to really optimize your LR.
5. It’s probably always a good idea to set weight decay to whatever you’ll fit with, since it’ll impact the LR finder curve.
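
To make the difference between the two schedules concrete, here is a minimal sketch in plain Python. It is not the fastai source: the function names `exponential_schedule` and `linear_schedule` and the `n_batches` parameter are hypothetical, chosen only to illustrate "multiply by a fixed ratio" versus "add a fixed amount" between start_lr and end_lr.

```python
def exponential_schedule(start_lr, end_lr, n_batches):
    """Multiply the LR by the same ratio every batch (the default behaviour described above)."""
    if n_batches < 2:
        raise ValueError("n_batches must be at least 2")
    # Ratio chosen so that the last batch lands exactly on end_lr.
    ratio = (end_lr / start_lr) ** (1 / (n_batches - 1))
    return [start_lr * ratio ** i for i in range(n_batches)]


def linear_schedule(start_lr, end_lr, n_batches):
    """Add the same fixed amount to the LR every batch (the linear=True behaviour described above)."""
    if n_batches < 2:
        raise ValueError("n_batches must be at least 2")
    # Step chosen so that the last batch lands exactly on end_lr.
    step = (end_lr - start_lr) / (n_batches - 1)
    return [start_lr + step * i for i in range(n_batches)]


if __name__ == "__main__":
    # Same endpoints as the lr_find defaults quoted at the top of the thread.
    for exp_lr, lin_lr in zip(exponential_schedule(1e-5, 10, 7),
                              linear_schedule(1e-5, 10, 7)):
        print(f"exponential: {exp_lr:.5g}   linear: {lin_lr:.5g}")
```

With a wide range such as 1e-5 to 10, the linear schedule spends almost every batch at large learning rates, which is presumably why a narrow range matters more when linear=True.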

wgpubs · May 11, 2018, 2:55am

Any specific papers in particular?

And thanks!

jeremy · May 11, 2018, 2:15pm

Any past papers that have looked at datasets and/or architectures similar to what you’re looking at. They’ll tell you what LR they used. That said, it’s not at all common to need to go outside the default LRs in the finder.