Hi @florianl. You're right about that. I then tested in this notebook (nbviewer version) how to create parameter groups for a more complicated model like resnet18.
I also tested all the methods for passing different Learning Rates (one per parameter group).
In fact, there are 4 possibilities, not 3.
For example, with 3 parameter groups of an unfrozen Learner trained with learn.fit_one_cycle(), you can do the following (see the sketch after this list):
1. if lr_max = 1e-3 → [0.001, 0.001, 0.001]
2. if lr_max = slice(1e-3) → [0.0001, 0.0001, 0.001]
3. if lr_max = slice(1e-5, 1e-3) → array([1.e-05, 1.e-04, 1.e-03]) # LRs evenly geometrically spaced
4. if lr_max = [1e-5, 1e-4, 1e-3] → array([1.e-05, 1.e-04, 1.e-03]) # any explicit list of LRs (evenly linearly spaced or not)
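Here is a minimal sketch of the four forms, assuming fastai v2 and that `dls` is a DataLoaders you have already built (in older fastai versions the learner factory was called cnn_learner instead of vision_learner):

```python
from fastai.vision.all import *

# Assumed setup: `dls` is an ImageDataLoaders you built for your own task.
learn = vision_learner(dls, resnet18, metrics=accuracy)
learn.unfreeze()  # make all parameter groups trainable, not just the head

# 1. one scalar: the same max LR for every parameter group
learn.fit_one_cycle(1, lr_max=1e-3)

# 2. slice with one value: lr for the last group, lr/10 for all the others
learn.fit_one_cycle(1, lr_max=slice(1e-3))

# 3. slice with two values: max LRs evenly geometrically spaced between them
learn.fit_one_cycle(1, lr_max=slice(1e-5, 1e-3))

# 4. explicit list: one max LR per parameter group, any values you like
learn.fit_one_cycle(1, lr_max=[1e-5, 1e-4, 1e-3])
```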
Explanations
1. All parameter groups use the same Learning Rate (and the same Optimizer method, like Adam + the 1cycle policy, for all of them).
2. The last parameter group's (max) Learning Rate is set to lr, and all previous parameter groups' to lr/10.
3. The very first layers are trained at a Learning Rate of 1e-5, the very last at 1e-3, and the Learning Rates of the other parameter groups are evenly geometrically spaced between these two values.
4. The very first layers are trained at a Learning Rate of 1e-5, the very last at 1e-3, and the Learning Rates of the other parameter groups are evenly linearly spaced between these two values; more generally, you can pass any list of Learning Rate values.
WARNING
Points 3 and 4 are not equivalent when the number of parameter groups is greater than 3!
- point 3: the Learning Rates are computed geometrically.
- point 4: you can pass an array with whatever Learning Rate values you want.
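For instance, with 4 parameter groups, this small numpy sketch shows the difference: the geometric values are what slice(1e-5, 1e-3) would produce (point 3), while the linear ones are something you could only get by passing an explicit list (point 4):

```python
import numpy as np

n_groups = 4  # more than 3 parameter groups

# point 3: slice(1e-5, 1e-3) spaces the LRs geometrically
geo = np.geomspace(1e-5, 1e-3, n_groups)
print(geo)  # [1.00e-05 4.64e-05 2.15e-04 1.00e-03]

# point 4: an explicit list could be, e.g., linearly spaced values
lin = np.linspace(1e-5, 1e-3, n_groups)
print(lin)  # [1.0e-05 3.4e-04 6.7e-04 1.0e-03]
```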
List of Learning Rates: the last values of the cosine annealing used by the 1cycle policy
To get the list of Learning Rates per parameter group that the Optimizer of the Learner used during training, you can display the hyperparameters with learn.opt.hypers as follows:
for i, h in enumerate(learn.opt.hypers):
    print(i, h)
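If you only want the Learning Rate of each group rather than the full dict of hyperparameters, here is a small sketch (assuming, as in fastai v2, that each entry of learn.opt.hypers is a dict containing an 'lr' key):

```python
# Collect just the learning rate of each parameter group
lrs = [h['lr'] for h in learn.opt.hypers]
print(lrs)
```

Note that after training, these are the final values left by the cosine annealing of the 1cycle policy, not the lr_max values you passed in.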