Learning rate

(Prabin Nepal) #1

Can anyone explain how to pick the learning rate from the graph, and the intuition behind it?


(Abhinav Verma) #2

Do you mean the LR finder? If so, the LR finder is basically a short training run of about one epoch (or slightly more) in which the LR is increased from a lower bound to an upper bound with every mini-batch. The main purpose is to find a good learning rate that helps the model converge faster. The run stops once the sweep is complete or the loss starts to diverge. All the losses are then plotted against the LR. Jeremy’s usual logic for choosing the LR is to find the point where the loss stops its steady decrease and starts increasing, i.e. the LR has become too large for stable training, and divide that LR by 10. That LR is then passed to fit one cycle, where other things like cyclical momentum are used. Sylvain Gugger, another of fast.ai’s fellows, has written a very good blog post on this.
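To make the sweep concrete, here is a minimal sketch in plain Python. It is not fast.ai's implementation: the toy linear model, the function name `lr_range_test`, the smoothing factor, and the "stop at 4x the best loss" threshold are all assumptions chosen for illustration.

```python
import math
import random

def lr_range_test(lr_min=1e-5, lr_max=10.0, num_steps=100, batch_size=64, seed=0):
    """Sweep the LR exponentially while training a toy 1-D linear model with SGD."""
    rng = random.Random(seed)
    # Toy data (made up for this sketch): y = 3x + 1 plus a little noise.
    xs = [rng.uniform(-1, 1) for _ in range(512)]
    data = [(x, 3.0 * x + 1.0 + rng.gauss(0, 0.1)) for x in xs]

    w, b = 0.0, 0.0
    lrs, losses = [], []
    best = float("inf")
    smooth, beta = 0.0, 0.9  # state for the exponentially smoothed loss

    for step in range(num_steps):
        # Exponential schedule: lr_min at the first step, lr_max at the last.
        lr = lr_min * (lr_max / lr_min) ** (step / (num_steps - 1))
        batch = rng.sample(data, batch_size)

        # Mean squared error and its gradients on this mini-batch.
        errs = [(w * x + b - y, x) for x, y in batch]
        loss = sum(e * e for e, _ in errs) / batch_size
        grad_w = sum(2 * e * x for e, x in errs) / batch_size
        grad_b = sum(2 * e for e, _ in errs) / batch_size

        # Smooth the noisy per-batch loss (with bias correction) before recording it.
        smooth = beta * smooth + (1 - beta) * loss
        smoothed = smooth / (1 - beta ** (step + 1))
        lrs.append(lr)
        losses.append(smoothed)
        best = min(best, smoothed)

        # Assumed stopping rule: give up once the loss has clearly diverged.
        if not math.isfinite(smoothed) or smoothed > 4 * best:
            break

        # Plain SGD update using the current LR.
        w -= lr * grad_w
        b -= lr * grad_b

    return lrs, losses

lrs, losses = lr_range_test()
```

Plotting `losses` against `lrs` on a log-scaled x-axis gives the kind of graph the question is about: roughly flat at tiny LRs, then falling, then rising sharply once the LR is too large.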

Hope this helps


(Gabriel Fior) #3

I also went through the blog post from Sylvain regarding learning rate, as mentioned by @averma.
One of its recommendations is:

[…] Looking at this graph, what is the best learning rate to choose? Not the one corresponding to the minimum.
Why? Well the learning rate that corresponds to the minimum value is already a bit too high, since we are at the edge between improving and getting all over the place. We want to go one order of magnitude before, a value that’s still aggressive (so that we train quickly) but still on the safe side from an explosion.

Meanwhile, Jeremy recommends in lesson 3 (DL 1 course 2019) that we pick the learning rate where the gradient is steepest, which, as I understand it, does not match the suggestion in the blog post, which is to find the minimum and take a value 10x smaller.
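As a rough way to compare the two rules, here is a small sketch that applies both to the same recorded curve. The helper names `lr_min_over_10` and `lr_steepest_descent` and the synthetic `toy_loss` curve are hypothetical, made up only to illustrate the idea. They are not taken from the blog post or the lesson.

```python
import math

def lr_min_over_10(lrs, losses):
    """Blog-post rule: take the LR at the minimum loss and divide it by 10."""
    i = min(range(len(losses)), key=losses.__getitem__)
    return lrs[i] / 10

def lr_steepest_descent(lrs, losses):
    """Lesson rule: take the LR where the loss falls fastest per decade of LR."""
    # Central-difference slope of the loss with respect to log10(lr).
    slopes = [
        (losses[i + 1] - losses[i - 1])
        / (math.log10(lrs[i + 1]) - math.log10(lrs[i - 1]))
        for i in range(1, len(lrs) - 1)
    ]
    i = min(range(len(slopes)), key=slopes.__getitem__) + 1  # +1: slopes start at index 1
    return lrs[i]

def toy_loss(lr):
    """Synthetic LR-finder curve: flat, then a drop, then a blow-up (illustrative only)."""
    x = math.log10(lr)
    drop = 1.0 / (1.0 + math.exp(4 * (x + 3.5)))  # smooth drop centred near lr = 10**-3.5
    blow_up = math.exp(6 * (x + 1.5))             # grows quickly at large LRs
    return 0.5 + 2.0 * drop + blow_up

lrs = [10 ** (-6 + 5 * k / 100) for k in range(101)]  # 1e-6 up to 1e-1, log-spaced
losses = [toy_loss(lr) for lr in lrs]

print("minimum / 10  :", lr_min_over_10(lrs, losses))
print("steepest slope:", lr_steepest_descent(lrs, losses))
```

On this made-up curve the two rules land within the same order of magnitude of each other, which suggests they are often two ways of pointing at the same region of the plot (the steep downhill part before the minimum). A real curve is noisier, so smoothing the losses first would likely be needed before trusting the slope.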

Can anyone comment on those 2 approaches? I am a little confused regarding which one to pick for the optimal learning rate.