I’m trying to understand how to choose `max_lr`. I understand we can pass a `slice` to vary the learning rate across layer groups, but how do we choose the values?
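To make the `slice` part concrete, here is a minimal sketch of how a single rate or a `slice` could be expanded into one rate per layer group. It assumes behaviour similar to fastai v1 (geometric spacing between `start` and `stop`, and `stop/10` for the earlier groups when only `stop` is given). `spread_lrs` is a hypothetical helper written for illustration, not a fastai function.

```python
def spread_lrs(lr, n_groups):
    """Hypothetical helper: expand a float or slice into one learning rate per layer group.

    Assumes fastai-v1-like semantics; not the library's actual code.
    """
    if isinstance(lr, slice):
        if n_groups == 1:
            # Only one group: use the upper bound
            return [lr.stop]
        if lr.start is not None:
            # slice(start, stop): geometric spacing from the earliest layers (start)
            # to the head (stop)
            ratio = (lr.stop / lr.start) ** (1 / (n_groups - 1))
            return [lr.start * ratio**i for i in range(n_groups)]
        # slice(stop): earlier groups get stop/10, the last group gets stop
        return [lr.stop / 10] * (n_groups - 1) + [lr.stop]
    # Plain float: the same rate for every group
    return [lr] * n_groups


print(spread_lrs(slice(1e-5, 1e-3), 3))  # roughly [1e-05, 1e-04, 1e-03]
print(spread_lrs(slice(1e-3), 3))        # roughly [1e-04, 1e-04, 1e-03]
```

Under this assumption, the value chosen from the loss curve mostly determines `stop` (the rate for the head), and `start` sets how much more gently the earlier layers are trained.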
Let’s take an example.
With a learning-rate vs. loss curve like the one above, should `max_lr` be to the right or to the left of the minimum? Is it the "max" in the sense of being the largest value used? Or is it the "max" in the sense of being the smallest value we should try as we decay the rate lower?
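One way to reason about the "max" question is to write out a one-cycle schedule. The sketch below is an illustrative version, assuming fastai-v1-style defaults (warm up for `pct_start=0.3` of the steps from `max_lr/div_factor` to `max_lr`, then cosine-anneal down to `max_lr/final_div`, with `div_factor=25` and `final_div=div_factor*1e4`). Treat these values and the function names as assumptions, not the library's exact implementation.

```python
import math


def cos_anneal(start, end, pct):
    """Cosine interpolation: returns start at pct=0 and end at pct=1."""
    return end + (start - end) / 2 * (1 + math.cos(math.pi * pct))


def one_cycle_lr(step, total_steps, max_lr, div_factor=25.0, pct_start=0.3, final_div=None):
    """Learning rate at `step` of a one-cycle schedule (illustrative sketch, assumed defaults)."""
    if final_div is None:
        final_div = div_factor * 1e4
    warmup_steps = int(total_steps * pct_start)
    if step < warmup_steps:
        # Phase 1: warm up from max_lr / div_factor to max_lr
        return cos_anneal(max_lr / div_factor, max_lr, step / warmup_steps)
    # Phase 2: anneal from max_lr down to max_lr / final_div
    decay_steps = total_steps - warmup_steps
    return cos_anneal(max_lr, max_lr / final_div, (step - warmup_steps) / decay_steps)


lrs = [one_cycle_lr(s, 100, max_lr=1e-3) for s in range(100)]
print(lrs[0], max(lrs), lrs[-1])
```

In this sketch every rate in the schedule is at or below `max_lr`. Training starts well below it, peaks at it after the warm-up, and ends far below it, so `max_lr` is the largest value the optimizer actually sees.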
I’d assume the rate for the curve above should be about 8e-3, just to the right of the minimum. Is this correct?