Hi @florianl. You're right about that. I then tested in this notebook (nbviewer version) how to create parameter groups for a more complicated model like resnet18.
I also tested all the methods for passing different Learning Rates (one per parameter group).
In fact, there are 4 possibilities, not 3.
For example, with 3 parameter groups of an unfrozen Learner trained with learn.fit_one_cycle(), you can do the following (see the sketch after this list):
1. if lr_max = 1e-3 → [0.001, 0.001, 0.001]
2. if lr_max = slice(1e-3) → [0.0001, 0.0001, 0.001]
3. if lr_max = slice(1e-5, 1e-3) → array([1.e-05, 1.e-04, 1.e-03]) # LRs evenly geometrically spaced
4. if lr_max = [1e-5, 1e-4, 1e-3] → array([1.e-05, 1.e-04, 1.e-03]) # any explicit list of LRs (evenly linearly spaced or not)
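Here is a minimal sketch of the four forms, assuming fastai v2 and that `dls` is a DataLoaders you have already built (in older fastai versions the learner factory was called cnn_learner instead of vision_learner):

```python
from fastai.vision.all import *

# Assumed setup: `dls` is an ImageDataLoaders you built for your own task.
learn = vision_learner(dls, resnet18, metrics=accuracy)
learn.unfreeze()  # make all parameter groups trainable, not just the head

# 1. one scalar: the same max LR for every parameter group
learn.fit_one_cycle(1, lr_max=1e-3)

# 2. slice with one value: lr for the last group, lr/10 for all the others
learn.fit_one_cycle(1, lr_max=slice(1e-3))

# 3. slice with two values: max LRs evenly geometrically spaced between them
learn.fit_one_cycle(1, lr_max=slice(1e-5, 1e-3))

# 4. explicit list: one max LR per parameter group, any values you like
learn.fit_one_cycle(1, lr_max=[1e-5, 1e-4, 1e-3])
```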
Explanations
1. All parameter groups use the same Learning Rate (and the same Optimizer method, like Adam + the 1cycle policy, for all of them).
2. The last parameter group's (max) Learning Rate is set to lr, and all previous parameter groups' to lr/10.
3. The very first layers are trained at a Learning Rate of 1e-5, the very last at 1e-3, and the Learning Rates of the other parameter groups are evenly geometrically spaced between these two values.
4. The very first layers are trained at a Learning Rate of 1e-5, the very last at 1e-3, and the Learning Rates of the other parameter groups are evenly linearly spaced between these two values; more generally, you can pass any list of Learning Rate values.
WARNING
Points 3 and 4 are not equivalent when the number of parameter groups is greater than 3!
- point 3: the Learning Rates are computed geometrically.
- point 4: you can pass an array with whatever Learning Rate values you want.
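For instance, with 4 parameter groups, this small numpy sketch shows the difference: the geometric values are what slice(1e-5, 1e-3) would produce (point 3), while the linear ones are something you could only get by passing an explicit list (point 4):

```python
import numpy as np

n_groups = 4  # more than 3 parameter groups

# point 3: slice(1e-5, 1e-3) spaces the LRs geometrically
geo = np.geomspace(1e-5, 1e-3, n_groups)
print(geo)  # [1.00e-05 4.64e-05 2.15e-04 1.00e-03]

# point 4: an explicit list could be, e.g., linearly spaced values
lin = np.linspace(1e-5, 1e-3, n_groups)
print(lin)  # [1.0e-05 3.4e-04 6.7e-04 1.0e-03]
```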
List of Learning Rates: the last values of the cosine annealing used by the 1cycle policy
To get the list of Learning Rates per parameter group that the Optimizer of the Learner used during training, you can display the hyperparameters with learn.opt.hypers as follows:
for i, h in enumerate(learn.opt.hypers):
    print(i, h)
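If you only want the Learning Rate of each group rather than the full dict of hyperparameters, here is a small sketch (assuming, as in fastai v2, that each entry of learn.opt.hypers is a dict containing an 'lr' key):

```python
# Collect just the learning rate of each parameter group
lrs = [h['lr'] for h in learn.opt.hypers]
print(lrs)
```

Note that after training, these are the final values left by the cosine annealing of the 1cycle policy, not the lr_max values you passed in.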