I suspect I’m not genuinely grasping how the LR finder is supposed to be used.
If I’m not making mistakes, even with an unfrozen model, the LR vs. loss plot it produces reflects only the final block that fastai adds on top of the ImageNet model.
If so, how can the LR finder be helpful when one has to set the LRs for all the other layer groups?
I would say to test it out yourself. It would be a good learning experience to freeze all layers except the one you want to run lr_find on, and then do it yourself. It would require a bit of PyTorch-y work and some rewriting of the library code, but it could give you a good idea of how the code works.
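The “pytorch-y work” above can be sketched without any framework at all. Here is a minimal, self-contained LR range test (in the spirit of lr_find, not fastai’s actual implementation) on a toy one-parameter regression, just to show the mechanics: sweep the LR exponentially, record the loss, and watch it blow up past some point.

```python
import numpy as np

# Toy problem: fit w in y ≈ w * x by gradient descent on the MSE loss.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 * x + rng.normal(scale=0.1, size=200)

def lr_range_test(lrs, n_steps=5):
    """For each candidate LR, take a few gradient steps from a fresh
    init and record the resulting loss; divergence shows up as the
    loss blowing up at the high end of the sweep."""
    losses = []
    for lr in lrs:
        w = 0.0
        for _ in range(n_steps):
            grad = 2 * np.mean((w * x - y) * x)  # d(MSE)/dw
            w -= lr * grad
        losses.append(np.mean((w * x - y) ** 2))
    return losses

lrs = np.logspace(-4, 1, 30)   # exponential sweep, like the finder
losses = lr_range_test(lrs)
best = min(losses)
# Pick the largest LR whose loss is still close to the best one:
good_lr = max(lr for lr, l in zip(lrs, losses) if l < 2 * best)
```

Plotting `lrs` against `losses` on a log axis gives the familiar lr_find-style curve: flat at tiny LRs, a dip where learning happens, then an explosion once the steps overshoot.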
All in all though, the main thing we want is the highest learning rate at which the model doesn’t diverge, and that is determined by looking at the later layers, since they are the least related to our specific problem (how will identifying cat faces help me recognize numbers?). lr_find gives us an “upper bound” on the learning rate for the final layers. We know the earlier layers will need a lower learning rate, but we still want to allow it to be fairly high.
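That intuition is what discriminative learning rates express: the upper bound from the finder goes to the final group, and earlier groups get progressively smaller LRs. Here is a small sketch (again not fastai’s actual implementation) of spreading one upper-bound LR geometrically across layer groups, roughly what passing something like `slice(lr_max/10, lr_max)` to fit expresses:

```python
# Hypothetical helper: earliest layer group gets the smallest LR,
# the final group gets lr_max itself, geometrically spaced between.
def discriminative_lrs(lr_max, n_groups, ratio=10.0):
    if n_groups == 1:
        return [lr_max]
    lr_min = lr_max / ratio
    step = (lr_max / lr_min) ** (1.0 / (n_groups - 1))
    return [lr_min * step ** i for i in range(n_groups)]

print(discriminative_lrs(1e-2, 3))  # lowest LR first, lr_max last
```

With three layer groups and an upper bound of 1e-2 this yields 1e-3 for the first group, ~3.2e-3 for the middle, and 1e-2 for the head, so the pretrained early layers move gently while the new final layers train fast.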
I think you have identified a piece of machine learning that we still do “by feel,” but the results have been fairly good so far.