Lesson 6 - Official topic

The basic idea is that the point where the loss curve is steepest is the learning rate at which the model will learn the fastest.

2 Likes
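For anyone who wants to see what "steepest" means concretely, here is a rough sketch (not fastai's actual implementation, and the function name is mine) of how you could pick that point from the (learning rate, loss) pairs an LR finder records:

```python
import numpy as np

def suggest_lr(lrs, losses, beta=0.98):
    """Pick the learning rate where the smoothed loss drops fastest.
    lrs, losses: arrays recorded by an LR finder (one entry per mini-batch)."""
    lrs, losses = np.asarray(lrs), np.asarray(losses)

    # Exponentially smooth the raw losses, which are very noisy batch to batch.
    smoothed = np.zeros(len(losses))
    avg = 0.0
    for i, loss in enumerate(losses):
        avg = beta * avg + (1 - beta) * loss
        smoothed[i] = avg / (1 - beta ** (i + 1))  # bias correction

    # Slope of the loss with respect to log(lr); the most negative slope is "steepest".
    slopes = np.gradient(smoothed, np.log(lrs))
    return lrs[np.argmin(slopes)]
```

fastai's own suggestion logic differs in the details, but the idea is the same: look for the most negative slope of the smoothed loss against log(lr), not the minimum of the loss itself.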

Even with a small learning rate, with enough epochs, shouldn’t it eventually find the minimum?

In lr_find, for every learning rate we use a different mini-batch at each step, right? So the mini-batch data is not constant. Won’t that affect the loss in addition to the lr? How do we know it’s actually the lr that is affecting the loss?

Would it make sense to run one learning rate over a bunch of mini-batches (instead of just one) and measure the average and variance, to get an uncertainty estimate? That would probably reduce noise, since I’d guess some mini-batches are easier and some are harder.

The implementation in PyTorch Lightning is heavily inspired by the one in fastai, FYI :wink:

6 Likes

Is it good to keep the same learning rate for the entire training or should we change it during the training?

It will find a minimum. The problem is it could be a local minimum and not the best one.

1 Like

Yep I am aware. I was just pointing out how fastai has set a precedent for other libraries :slight_smile:

1 Like

If the learning rate is too small, the optimizer will get stuck in a poor local minimum and won’t be able to get out. The number of bad local minima grows exponentially as networks get more complex.

6 Likes

#TeamSylvain (min/10) :slight_smile:

3 Likes

You should run it each time you change something major: if you apply more data augmentation, if you change the size of your images, or if you unfreeze part of your model (stay tuned for that part).

4 Likes
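For reference, a typical workflow for that in fastai looks roughly like this (assuming an existing `learn` object; the learning-rate values are placeholders you would read off the plots, not recommendations):

```python
# Stage 1: train only the head while the pretrained body stays frozen.
learn.lr_find()                          # pick a learning rate from the plot
learn.fit_one_cycle(3, lr_max=3e-3)      # placeholder value

# Stage 2: unfreezing changes the loss landscape, so find a new learning rate.
learn.unfreeze()
learn.lr_find()
# Discriminative learning rates: smaller for early layers, larger for later ones.
learn.fit_one_cycle(6, lr_max=slice(1e-6, 1e-4))
```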

Should we do lr_find every time we unfreeze a layer?

1 Like

It’s not just one mini-batch, it’s a different one at each step.
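To make that concrete, here is a stripped-down sketch of what an LR-finder loop does, in plain PyTorch rather than fastai's actual code: every step pulls the next mini-batch from the training loader and multiplies the learning rate, so each recorded loss comes from a different batch.

```python
import copy
import math

import torch

def lr_find(model, loss_fn, train_loader, lr_start=1e-7, lr_end=10,
            num_steps=100, device="cpu"):
    """Record (lr, loss) pairs while exponentially increasing the learning rate."""
    model = copy.deepcopy(model).to(device)          # don't disturb the real model
    opt = torch.optim.SGD(model.parameters(), lr=lr_start)
    gamma = (lr_end / lr_start) ** (1 / num_steps)   # multiplicative lr step

    lrs, losses = [], []
    data_iter = iter(train_loader)
    for step in range(num_steps):
        try:
            xb, yb = next(data_iter)                 # a *different* mini-batch each step
        except StopIteration:
            data_iter = iter(train_loader)
            xb, yb = next(data_iter)
        xb, yb = xb.to(device), yb.to(device)

        loss = loss_fn(model(xb), yb)
        opt.zero_grad()
        loss.backward()
        opt.step()

        lrs.append(opt.param_groups[0]["lr"])
        losses.append(loss.item())
        if math.isnan(loss.item()) or loss.item() > 4 * min(losses):
            break                                    # stop once the loss blows up
        for g in opt.param_groups:                   # increase the lr for the next step
            g["lr"] *= gamma
    return lrs, losses
```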

With too small a learning rate, there is a good chance your model gets stuck in a local minimum instead of generalizing well. It would also be pretty time-consuming and computationally wasteful to train with a very small learning rate.

Image From: https://machinelearningmastery.com/why-training-a-neural-network-is-hard/

3 Likes

Top thing that “Jeremy says to do”: try the easiest things first. It applies very well to Kaggle competitions, and to ML in general. :white_check_mark:

6 Likes

It also gives you a good baseline to compare your model against in practice.

2 Likes

We should definitely change it after training for a couple of epochs, or at least check that it has not changed dramatically from our first guess.
I think Jeremy is getting there, e.g. after freezing/unfreezing the top layers.

1 Like
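For the “should we change it during training” question above: with a schedule the learning rate already varies within a single fit. Here is a minimal sketch using PyTorch’s built-in OneCycleLR (fastai’s fit_one_cycle follows a similar idea); the function wraps your own model, loss_fn, and train_loader:

```python
import torch

def train_one_cycle(model, loss_fn, train_loader, epochs=5, max_lr=1e-3,
                    device="cpu"):
    """Train with a one-cycle schedule: the lr ramps up then anneals back down,
    changing after every mini-batch rather than staying fixed."""
    model = model.to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=max_lr)
    sched = torch.optim.lr_scheduler.OneCycleLR(
        opt, max_lr=max_lr, epochs=epochs, steps_per_epoch=len(train_loader))

    for _ in range(epochs):
        for xb, yb in train_loader:
            xb, yb = xb.to(device), yb.to(device)
            loss = loss_fn(model(xb), yb)
            opt.zero_grad()
            loss.backward()
            opt.step()
            sched.step()          # the learning rate is updated every batch
```

The value you get from lr_find is then used as the peak of the schedule, not as a constant.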

Why would an “ideal” learning rate found with a single mini-batch at the start of training keep being a good learning rate even after several epochs and further loss reductions?

I think Jeremy is going to come back to this (since we haven’t talked about training for several epochs yet), but ask again if he doesn’t.

1 Like

Yeah, I meant within each lr. So:

a) set lr
b) fit next mini batch
c) reset
d) fit another mini batch
e) repeat b-d a certain number of times (10?)
f) measure average and std
g) change lr and restart from a

Just a little bootstrapping at each learning rate so we get an uncertainty estimate for every point, and the plot would look like a shaded region rather than a single line.

2 Likes

Maybe I’ll try it out and see what happens. Sorry, I’m not explaining myself very well.
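Something like this, roughly (plain PyTorch, all names are mine; I record the loss on the same batch after the single update so the effect of the lr actually shows up in the numbers):

```python
import copy

import numpy as np
import torch

def bootstrapped_lr_find(model, loss_fn, train_loader, lrs=None,
                         batches_per_lr=10, device="cpu"):
    """Steps a)-g) above: for each candidate lr, repeatedly reset the model,
    take one step on a fresh mini-batch, and record the mean/std of the losses."""
    if lrs is None:
        lrs = np.geomspace(1e-7, 1e-1, num=30)
    model = model.to(device)
    init_state = copy.deepcopy(model.state_dict())       # the common starting point

    means, stds = [], []
    data_iter = iter(train_loader)
    for lr in lrs:                                        # a) set lr, g) next lr
        losses = []
        for _ in range(batches_per_lr):                   # e) repeat b)-d)
            model.load_state_dict(copy.deepcopy(init_state))   # c) reset
            opt = torch.optim.SGD(model.parameters(), lr=float(lr))
            try:
                xb, yb = next(data_iter)                  # b)/d) a fresh mini-batch
            except StopIteration:
                data_iter = iter(train_loader)
                xb, yb = next(data_iter)
            xb, yb = xb.to(device), yb.to(device)

            loss = loss_fn(model(xb), yb)
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():                         # loss after the single update
                losses.append(loss_fn(model(xb), yb).item())
        means.append(np.mean(losses))                     # f) average and std
        stds.append(np.std(losses))
    return np.asarray(lrs), np.asarray(means), np.asarray(stds)
```

Plotting the means against log10(lrs) with a fill_between band of means ± stds would give the shaded-region version of the usual LR-finder curve.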