Why is running lr_find not the first thing we do?

In lessons 1 and 2, lr_find is only run after fit_one_cycle.
Is there a specific reason for this, or is it just for convenience's sake?
It would seem logical to run lr_find as the very first step.

I believe the reason is that you have just attached a new classifier head whose weights are random or zeroed.
So you first need to let the head train on the data for an epoch or so, long enough for its weights to start taking on some structure. Once it has reasonable working weights, lr_find can begin to estimate which learning rate would be best for the next stage of training.
If you ran lr_find as the very first thing, you would be asking for the learning rate of completely random weights, which would not really be useful.
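
To make that a bit more concrete, here is a minimal sketch of a learning-rate range test on a toy one-weight model. This is not fastai's lr_find implementation. The names lr_range_test and mse_and_grad, the toy data, and the "stop at 4x the best loss" rule are all assumptions I made up for illustration. The only point is that the loss-vs-learning-rate curve is measured from whatever weights the model has at that moment, so the curve depends on where you start.

```python
def mse_and_grad(w, xs, ys):
    """Mean squared error of the model y = w * x, and its gradient w.r.t. w."""
    n = len(xs)
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
    return loss, grad


def lr_range_test(w, xs, ys, start_lr=1e-4, end_lr=10.0, num_steps=50):
    """Take one gradient step per learning rate, growing the rate exponentially.

    Returns a list of (lr, loss) pairs, where loss is measured just before
    the step taken at that lr. Stops early once the loss exceeds 4x the
    best loss seen so far (an arbitrary cutoff chosen for this sketch).
    """
    factor = (end_lr / start_lr) ** (1 / (num_steps - 1))
    lr, best, history = start_lr, float("inf"), []
    for _ in range(num_steps):
        loss, grad = mse_and_grad(w, xs, ys)
        history.append((lr, loss))
        best = min(best, loss)
        if loss > 4 * best:
            break  # the loss has blown up, so stop the sweep
        w -= lr * grad
        lr *= factor
    return history


xs = [0.5, 1.0, 1.5, 2.0]
ys = [3.0 * x for x in xs]  # toy data generated with a true weight of 3.0

# Same data, two different starting weights: far from the answer vs. partly trained.
untrained = lr_range_test(w=0.0, xs=xs, ys=ys)
warmed_up = lr_range_test(w=2.5, xs=xs, ys=ys)
print("first (lr, loss) from w=0.0:", untrained[0])
print("first (lr, loss) from w=2.5:", warmed_up[0])
```

The two sweeps start from very different losses, so the curves you would read a learning rate from are different too. In a real network the difference is presumably not just one of scale, which is the argument for letting the new head train a little before running the finder.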


Makes sense, thanks!