Hello, fast.ai community! This is my first post!
I have been reading about CLR, the One-Cycle policy and the LR Range test (the one implemented by lr_find()
) by @Leslie. I understand what the CLR and what the One-Cycle policy but I struggle to understand the intuition behind the LR range test method (and plots).
My understanding is the following (please let me know if it is wrong and if so, where):
During one or a few epochs we train a given network updating (linearly or exponentially) the learning rate, we calculate the associated loss, we backpropagate the loss, update parameters and keep on reducing the LR until the loss starts diverging abruptly. Additionally, as we go, we take an exponentially weighted average of the loss and perform bias correction by dividing by 1- \beta^t where t is the iteration at which the loss is calculated and beta is the coefficient in the weighted average.
What I struggle to understand is how this is an accurate (and unbiased) estimate of the effect of the learning rate on the entirety of the loss ‘landscape’. For example, given that the first iteration starts somewhere random, that would certainly affect the loss calculated at that iteration (wouldn’t it?). Additionally, what if at some point the optimizer gets stuck in a minimum (by chance/randomness), wouldn’t the loss during those iterations also be biased? And finally, are the results we get from an LR Range test batch-dependent? The only intuition I have is that all these problems are taken care of by the exponentially weighted average but if that is the case I would like a deeper explanation of it. Any help is appreciated.
I have read the explanations by @sgugger in his personal blog, and while they have helped they still do not solve my lack of intuition.