Discriminative LRs and layer groups

Hiho!
I’m trying to use discriminative learning rates for a progressively growing autoencoder.
That means I’m basically growing an autoencoder from the inside out, starting with low-res images and working up to high res. Now I want to reduce the learning rates for the inner layers whenever I increase the resolution (and thus add more convs in front of the encoder and at the end of the decoder).

As far as I understand the implementation, the following happens:

  1. A learner’s learn.layer_groups is used as the basis for determining which parts of the network share the same LR. Thus, there will be len(learn.layer_groups) different learning rates.
  2. Learner.lr_range() is what sets learn.lr, and it basically builds an np.array with the same length as layer_groups. I’m assuming the LRs are then mapped onto the groups like zip(layer_groups, lr) (see the sketch after this list).

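To make that concrete, here is roughly what I expect to happen (an untested sketch against fastai v1’s Learner; the number of layer groups is just an example):

import numpy as np

# With e.g. 3 layer groups, lr_range(slice(1e-6, 1e-4)) should spread the LRs
# geometrically over the groups, one LR per group.
lrs = learn.lr_range(slice(1e-6, 1e-4))   # e.g. array([1e-06, 1e-05, 1e-04])

# My mental model of the mapping: each layer group gets the LR at its index.
for i, (group, lr) in enumerate(zip(learn.layer_groups, lrs)):
    print(f"layer group {i} ({len(list(group.parameters()))} params): lr={lr}")
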
What I haven’t found yet is where exactly the optimizer gets told about all of this.

My question now is: if I want to change the discriminative LR behavior to what I described above, is it sufficient to change the Learner.lr_range() function so that it returns an array that correctly maps my learning rates? And how can I then best confirm that all the layers are actually trained with the correct learning rate?

Greetings,
Dome

The learning rates are handed to the optimizer when it is created (learn.create_opt, which fit calls for you). You can check it like this to see the optimizer and all of its parameter groups with their learning rates:

learn.create_opt(learn.lr_range(slice(1e-6,1e-4)))
learn.opt

OptimWrapper over Adam (
Parameter Group 0
amsgrad: False
betas: (0.9, 0.99)
eps: 1e-08
lr: 1e-06
weight_decay: 0

Parameter Group 1
amsgrad: False
betas: (0.9, 0.99)
eps: 1e-08
lr: 1e-06
weight_decay: 0

Parameter Group 2
amsgrad: False
betas: (0.9, 0.99)
eps: 1e-08
lr: 9.999999999999999e-06
weight_decay: 0

Parameter Group 3
amsgrad: False
betas: (0.9, 0.99)
eps: 1e-08
lr: 9.999999999999999e-06
weight_decay: 0

Parameter Group 4
amsgrad: False
betas: (0.9, 0.99)
eps: 1e-08
lr: 9.999999999999999e-05
weight_decay: 0

Parameter Group 5
amsgrad: False
betas: (0.9, 0.99)
eps: 1e-08
lr: 9.999999999999999e-05
weight_decay: 0
).
True weight decay: True
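
If you want to confirm it programmatically rather than by reading the repr, something like this should work (a sketch assuming fastai v1, where learn.opt is an OptimWrapper and the underlying torch optimizer is reachable as learn.opt.opt):

lrs = learn.lr_range(slice(1e-6, 1e-4))
print("requested per-layer-group LRs:", lrs)

# The wrapped torch optimizer holds the actual parameter groups. fastai appears
# to split each layer group into two param groups (batchnorm/bias vs. the rest,
# for true weight decay), which would explain the repeated LRs in the output above.
for i, pg in enumerate(learn.opt.opt.param_groups):
    print(f"param group {i}: lr={pg['lr']}, n_params={len(pg['params'])}")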


Oh, I forgot to answer :smiley:
That was basically what I was looking for, thanks!
For people landing here with the same question:
I subclassed Learner and overrode lr_range() as well as the layer_groups property so that they return layer groups matching the current stage of my progressive growing (probably not necessary if your architecture does not change); a rough, from-memory sketch of the idea is below.
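
This is not my actual code: the grow() helper and the inner_discount scaling are made up to illustrate the idea (assuming fastai v1’s fastai.basic_train.Learner):

import numpy as np
from fastai.basic_train import Learner

class ProgressiveLearner(Learner):
    "Hypothetical sketch: lower the LRs of the inner (older) layer groups."

    def grow(self, new_layer_groups, inner_discount=0.1):
        # Called after adding the new outer convs: store the regrouped model
        # and remember how much to slow down the already-trained inner groups.
        self.layer_groups = new_layer_groups
        self.inner_discount = inner_discount

    def lr_range(self, lr):
        lrs = super().lr_range(lr)          # the usual per-group spread
        if not isinstance(lrs, np.ndarray):
            lrs = np.array([lrs] * len(self.layer_groups), dtype=float)
        # First and last groups are the newly added convs (front of the encoder,
        # end of the decoder); everything in between is older and gets scaled down.
        lrs[1:-1] *= getattr(self, 'inner_discount', 1.0)
        return lrs

After each grow step you then recreate the optimizer (e.g. call learn.create_opt(learn.lr_range(...)) again, as in the post above) so that the new groups and the scaled LRs are actually picked up.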