It seems lr.find would have to do a full forward and backward pass to compute the gradients and test the effect of different learning rates on the loss?
I understand its usefulness and how to use its results, but I don't get what it's doing that's different from doing single-epoch passes in a loop while incrementing the lr to find the optimum. It seems like a chicken-and-egg situation?
What is important to notice here are the save and load lines.
When you run lr_find, the stopping condition is `self.stop_div and (math.isnan(loss) or loss > self.best*4)`, i.e. it stops when your loss is NaN or more than 4x your best loss so far. So basically, you check for the moment when your learning rate is too high and your network weights start to diverge.
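The stopping test above can be sketched as a small standalone function (a simplification of the actual callback logic, with `stop_div` assumed true; the name `should_stop` is mine, not fastai's):

```python
import math

def should_stop(loss: float, best_loss: float) -> bool:
    # Stop the LR sweep once the loss diverges: either it became NaN
    # or it grew past 4x the best loss seen so far.
    return math.isnan(loss) or loss > best_loss * 4

# Example: with a best loss of 0.5, a loss of 2.5 (> 4 * 0.5) signals divergence.
print(should_stop(2.5, 0.5))           # True
print(should_stop(0.6, 0.5))           # False
print(should_stop(float("nan"), 0.5))  # True
```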
The problem is that, once this happens, it is almost impossible for the network to repair the damage done to the weights and return to converging behavior, which is why we have the save/load in the lr_finder.
So you certainly do not want to run a learning rate test while you are training, and that is why the two need to be separate.
Oh yeah, I'm probably running an older version. Then look at the LRFinder callback: you will see the save in `on_train_begin` and the load in `on_train_end`, so the idea is still the same.
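The save/restore idea can be sketched as a minimal callback. This is not fastai's actual implementation; it assumes a PyTorch-style model exposing `state_dict()` / `load_state_dict()`, and the class name is illustrative:

```python
import copy

class LRFinderSketch:
    """Sketch of the save/load behavior of an LR-finder callback:
    snapshot the weights before the LR sweep, restore them after,
    so the (deliberately diverged) weights never leak into real training."""

    def __init__(self, model):
        self.model = model
        self._saved = None

    def on_train_begin(self):
        # Save a deep copy of the weights before the sweep mangles them.
        self._saved = copy.deepcopy(self.model.state_dict())

    def on_train_end(self):
        # Restore the pre-sweep weights once the sweep is done.
        self.model.load_state_dict(self._saved)
```

In other words, from the rest of the training code's point of view, running the LR finder is a no-op on the weights.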