I understand the individual topics of the course, but at times I cannot see how they connect or what follows what in my network. I have built a CNN model for image classification and am using fit_one_cycle with discriminative learning rates.
fit_one_cycle implements the 1cycle policy from Leslie Smith's work on super-convergence, which is a follow-up to his cyclical learning rate concept. We pass the maximum LR to fit_one_cycle, and the aim is to reach high accuracy in fewer iterations.
When I use discriminative learning rates, I choose different learning rates for different parts of my network (3 in my case) and pass them to fit_one_cycle.
I assume this means that every part of the network trains independently under the super-convergence schedule, each using the max LR it has been given.
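To make my mental model concrete, here is a minimal sketch in plain Python (no fastai import). It assumes, as far as I understand the fastai v1 behaviour, that a slice of learning rates is spread over the layer groups on a log scale, and that all groups follow the same schedule shape at the same time, each scaled by its own max LR. The names even_mults, group_max_lrs and group_lrs_at are used here only for illustration and are not claimed to match the library's internals.

```python
def even_mults(start, stop, n):
    """Return n values from start to stop, evenly spaced on a log scale."""
    if n == 1:
        return [stop]
    ratio = (stop / start) ** (1 / (n - 1))
    return [start * ratio ** i for i in range(n)]


def group_max_lrs(lr, n_groups):
    """Expand a float or slice into one max LR per layer group (assumed rule)."""
    if not isinstance(lr, slice):
        # A single number: every group gets the same max LR.
        return [lr] * n_groups
    if lr.start:
        # slice(lo, hi): earliest group gets lo, last group gets hi.
        return even_mults(lr.start, lr.stop, n_groups)
    # slice(hi): last group gets hi, all earlier groups get hi / 10.
    return [lr.stop / 10] * (n_groups - 1) + [lr.stop]


def group_lrs_at(shape_factor, max_lrs):
    """LR of every group at one point of the cycle.

    shape_factor is the shared schedule value relative to the peak
    (1.0 at the peak, smaller during warm-up and annealing).
    """
    return [shape_factor * m for m in max_lrs]


# Three layer groups, as in my network.
max_lrs = group_max_lrs(slice(1e-5, 1e-3), n_groups=3)
# At the start of warm-up, assuming the LR starts at max_lr / 25.
warmup_start = group_lrs_at(1 / 25, max_lrs)
```

If this picture is right, the three parts do not each run their own separate cycle; they share one cycle and differ only in scale.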
Also, I believe that the LR finder is used to find the max LR to pass to fit_one_cycle. fit_one_cycle then automatically increases and decreases the LR below that maximum, following the schedule proposed by Leslie Smith, and all of this happens behind the scenes!
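As a sketch of what I think happens behind the scenes, the schedule below warms the LR up from max_lr / div_factor to max_lr and then anneals it down to a much smaller value, using cosine interpolation in both phases. The function names and the default values (pct_start=0.3, div_factor=25, final_div=1e4) are assumptions for illustration; the actual defaults and the shape of the curve may differ between fastai versions.

```python
import math


def cosine_anneal(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)


def one_cycle_lr(step, total_steps, max_lr,
                 pct_start=0.3, div_factor=25.0, final_div=1e4):
    """LR at a given step of a one-cycle schedule (illustrative defaults)."""
    warmup_steps = round(total_steps * pct_start)
    low_lr = max_lr / div_factor   # LR at the very first step
    end_lr = max_lr / final_div    # LR at the very last step
    if step < warmup_steps:
        # Phase 1: increase from low_lr up to max_lr.
        return cosine_anneal(low_lr, max_lr, step / warmup_steps)
    # Phase 2: decrease from max_lr down to end_lr.
    anneal_steps = max(1, total_steps - warmup_steps - 1)
    return cosine_anneal(max_lr, end_lr, (step - warmup_steps) / anneal_steps)


# max_lr would come from reading the LR finder plot; 0.01 is a made-up value.
lrs = [one_cycle_lr(s, total_steps=100, max_lr=0.01) for s in range(100)]
```

Plotting lrs against the step index should show a single rise and fall, which is the curve I expect fit_one_cycle to follow for the max LR I give it.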
Please correct me wherever I went wrong, and add anything crucial that I missed.