Behavior of lr_find - Shuffling LRs

Hey there,

I’ve read about and experimented quite a lot with the lr_find() method, and I’m now trying to write it from scratch in Keras.
There’s one thing that escapes me. We are basically doing this (a Keras sketch of what I mean follows the list):

  • for each minibatch, use an exponentially increasing LR (starting from a low value to a high value)
  • save the loss for that minibatch, store the tuple (LR, loss)
  • plot all the saved tuples computed by going through all minibatches in one training epoch.
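Concretely, here is a minimal sketch of what I mean as a Keras callback. The class name LRFinder and its parameters are my own illustration for this thread, not fastai’s (or Keras’s) API:

```python
from keras import backend as K
from keras.callbacks import Callback

class LRFinder(Callback):
    """Exponentially increase the LR every minibatch and record (lr, loss)."""

    def __init__(self, min_lr=1e-7, max_lr=1.0, num_steps=100):
        super(LRFinder, self).__init__()
        self.min_lr = min_lr
        # Per-batch multiplier so that min_lr * factor**num_steps == max_lr.
        self.factor = (max_lr / min_lr) ** (1.0 / num_steps)
        self.lrs, self.losses = [], []

    def on_train_begin(self, logs=None):
        K.set_value(self.model.optimizer.lr, self.min_lr)

    def on_batch_end(self, batch, logs=None):
        # Save the (LR, loss) tuple for this minibatch.
        lr = float(K.get_value(self.model.optimizer.lr))
        self.lrs.append(lr)
        self.losses.append(logs.get('loss'))
        # Exponentially increase the LR for the next minibatch.
        K.set_value(self.model.optimizer.lr, lr * self.factor)

# Usage: run exactly one epoch, then plot loss against LR on a log axis:
# finder = LRFinder(num_steps=steps_per_epoch)
# model.fit(x_train, y_train, epochs=1, callbacks=[finder])
# plt.semilogx(finder.lrs, finder.losses)
```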

Now, I’ve tried shuffling the training set before calling lr_find, and the results are basically the same. But if I shuffle the LRs instead (i.e., instead of increasing the LR exponentially, I use a random one for each minibatch), the results are a lot worse (almost “random”).
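For clarity, the “shuffled LRs” variant just permutes the same schedule (min_lr, max_lr, and num_steps are the hypothetical names from the sketch above):

```python
import numpy as np

min_lr, max_lr, num_steps = 1e-7, 1.0, 100
factor = (max_lr / min_lr) ** (1.0 / num_steps)

# Same set of LR values the exponential sweep would visit...
lrs = min_lr * factor ** np.arange(num_steps)
# ...but applied in random order, one per minibatch.
np.random.shuffle(lrs)
```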

Does learn.sched.plot() basically just reflect the loss function’s topology? Does it really convey a strong message about the “right” LR to start with?

I hope my question is clear!


Shuffling the learning rates doesn’t make any sense to me. What the paper describes, and fastai implements, is gradually increasing them.

It’s more of a theoretical question than one about the implementation. I understand what fastai implements (and it does so flawlessly), but yesterday it made sense to me that this process should work even when shuffling the learning rates and using a random one for each minibatch. This morning, I’m honestly not so sure about this anymore!