I’ve read and experimented quite a lot with the lr_find() method, and I’m now trying to write it from scratch in Keras.
There’s one thing that escapes me. We are basically doing this:
- for each minibatch, use an exponentially increasing LR (sweeping from a low value up to a high value)
- save the loss for that minibatch, store the tuple (LR, loss)
- plot all the saved tuples computed by going through all minibatches in one training epoch.
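To make the three steps above concrete, here’s a minimal, framework-free sketch of the sweep in NumPy, using plain SGD on a toy linear least-squares model. All names here (lr_find, min_lr, max_lr, num_steps) are my own, not Keras/fastai API; in Keras proper you’d do the same thing inside a custom Callback that bumps the optimizer’s LR at each on_train_batch_end:

```python
import numpy as np

def lr_find(X, y, min_lr=1e-5, max_lr=10.0, num_steps=100, batch_size=32, seed=0):
    """Sweep the LR exponentially over one pass and record (lr, loss) tuples.

    Trains a linear model w under mean-squared error with plain SGD,
    one minibatch per LR value, exactly as in the bullet list above.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    # Per-step multiplier so the LR goes from min_lr to max_lr in num_steps steps.
    mult = (max_lr / min_lr) ** (1.0 / (num_steps - 1))
    lr, history = min_lr, []
    for _ in range(num_steps):
        idx = rng.integers(0, n, batch_size)          # sample a minibatch
        xb, yb = X[idx], y[idx]
        err = xb @ w - yb
        loss = float(np.mean(err ** 2))
        history.append((lr, loss))                    # store the (LR, loss) tuple
        if not np.isfinite(loss):                     # diverged: stop the sweep
            break
        w -= lr * (2.0 / batch_size) * (xb.T @ err)   # SGD step at the current LR
        lr *= mult                                    # exponential LR increase
    return history
```

Plotting the recorded losses against log10 of the LRs then gives the usual lr_find curve: flat at tiny LRs, a descending region, and a blow-up once the LR exceeds what the loss surface tolerates; the common heuristic is to pick an LR somewhat below the point where the loss starts rising.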
Now, I’ve tried shuffling the training set before calling lr_find, and the results are basically the same; but if I shuffle the LRs instead (i.e., rather than increasing the LR exponentially, I pick a random one for each minibatch), the results are much worse (almost “random”).
Is learn.sched.plot() basically just reflecting the loss function’s topology? Does it really convey a strong signal about the “right” LR to start with?
I hope my question is clear!