Am I reading this lr_find() plot correctly?

I’m running the learning rate finder on a Kaggle dataset (structured data, regression) and I’m getting the plot shown in the screenshot below.

If I zoom into the plot, it seems that somewhere around 2e-5 is the best learning rate, but I never really see the point where the loss starts to become unstable; it just stays roughly flat as the learning rate increases.

Am I reading this correctly? Should I be doing something differently (e.g. increasing or decreasing the batch size)?
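For reference, here is roughly my setup (fastai v0.7 columnar API, as in the lessons; `df`, `y`, `val_idxs`, `cat_vars` and `emb_szs` stand in for my actual data):

```python
from fastai.column_data import ColumnarModelData

# placeholder objects: df (processed DataFrame), y (targets),
# val_idxs (validation indices), cat_vars (categorical columns),
# emb_szs (embedding sizes per categorical variable)
md = ColumnarModelData.from_data_frame(PATH, val_idxs, df, y,
                                       cat_flds=cat_vars, bs=128)
learn = md.get_learner(emb_szs, len(df.columns) - len(cat_vars),
                       0.04, 1, [1000, 500], [0.001, 0.01])

learn.lr_find()      # sweep learning rates over a few iterations
learn.sched.plot()   # loss vs. learning rate, the plot in the screenshot
```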

Thanks!

Check out the paper:
https://arxiv.org/abs/1803.09820v2
Also try lr_find(linear = True); it will give you more information.
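A minimal sketch of that call, reusing your existing learner (v0.7 signature):

```python
# linear=True sweeps the learning rate on a linear instead of an
# exponential schedule, which spreads out the high-LR end of the plot
learn.lr_find(start_lr=1e-5, end_lr=10, linear=True)
learn.sched.plot()
```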


Reading the papers is always the best thing, but a bit of elaboration wouldn’t go amiss in this case.


Thanks for the tip!

Unfortunately, adding linear = True doesn’t help that much; if I use the suggested learning rate, it looks like I get stuck in a local minimum:


I played around with the batch size, but that didn’t seem to have much of an effect (with regard to seeing what I would expect after watching the lectures).

I also tried increasing the complexity of the model by adding another layer with more activations, going from [1000, 500] to [2000, 1000, 500], but the finder plot looks the same.
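Concretely, the only change was the layer spec in get_learner (the dropout values here are placeholders):

```python
# same call as before, just a deeper layer spec with matching dropouts
learn = md.get_learner(emb_szs, len(df.columns) - len(cat_vars),
                       0.04, 1, [2000, 1000, 500], [0.001, 0.01, 0.01])
learn.lr_find()
learn.sched.plot()
```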

I read through the paper, but I still feel like I’m missing the intuition for why the loss never really blows up (or drops) at the higher learning rates.


I had some extra time, so I read through the paper again more closely. Do you think my situation qualifies for the 1cycle method described in the paper?

(I’m going to test it out anyway, just wondering if I’m understanding when to use it correctly)

Try greater values for end_lr.
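For example, assuming the v0.7 signature where end_lr defaults to 10 (100 is just an arbitrary larger ceiling):

```python
# extend the sweep well past the default so the plot has room
# to show where the loss finally diverges
learn.lr_find(start_lr=1e-5, end_lr=100, linear=True)
learn.sched.plot()
```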


The local-minimum problem can be addressed with the 1cycle policy (use_clr) or with cosine annealing and cycle_mult.
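Rough sketches of both, assuming fastai v0.7 fit semantics (the learning rates and cycle counts here are placeholder values):

```python
# 1cycle-style schedule: use_clr=(div, pct) roughly means start at
# peak_lr / div and spend the last pct percent of the cycle annealing down
learn.fit(2e-2, 1, cycle_len=10, use_clr=(20, 10))

# cosine annealing with restarts: cycle_mult=2 doubles each cycle's
# length, so 3 cycles of cycle_len=1 run for 1 + 2 + 4 = 7 epochs
learn.fit(2e-2, 3, cycle_len=1, cycle_mult=2)
```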
