Discriminative LRs and layer groups

Hiho!
I’m trying to use discriminative learning rates for a progressively growing autoencoder.
That means I’m basically growing an autoencoder from the inside out, starting with low-res images and working up to high-res ones. Now I want to reduce the learning rates for the inner layers once I increase the resolution (and thus add more convs in front of the encoder and at the end of the decoder).

As far as I understand the implementation, the following things happen:

  1. A learner’s learn.layer_groups is used as the basis for determining which parts of the network share the same LR. Thus, there will be len(learn.layer_groups) different learning rates.
  2. Learner.lr_range() is what sets learn.lr, and it basically builds an np.array with the same length as layer_groups. I’m assuming the LRs are then mapped onto those groups like zip(layer_groups, lr) (see the sketch below).
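
Concretely, I imagine the mapping works roughly like this standalone sketch (my own reimplementation of the idea, not the library code; lr_range_sketch is a made-up name):

import numpy as np

# Sketch of how a slice of LRs could be spread across layer groups,
# mimicking what I think fastai v1's lr_range does.
def lr_range_sketch(lr, n_groups):
    if not isinstance(lr, slice):
        return np.full(n_groups, lr)            # one LR shared by every group
    if lr.start:
        # spread geometrically from start (innermost) to stop (outermost)
        return np.geomspace(lr.start, lr.stop, n_groups)
    # no start given: last group gets lr.stop, all earlier groups lr.stop / 10
    return np.array([lr.stop / 10] * (n_groups - 1) + [lr.stop])

lrs = lr_range_sketch(slice(1e-6, 1e-4), 3)
print(lrs)   # one LR per layer group, roughly [1e-06 1e-05 1e-04]
# conceptually these are then paired with the groups, i.e. zip(layer_groups, lrs)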

The thing I haven’t found yet is where exactly the optimizer gets told about all of this.

My question now is: if I want to change the discriminative-LR behavior to what I described at the top, is it sufficient to change the Learner.lr_range() function to return an array that correctly maps my learning rates? And how can I best confirm that all the layers are then actually trained with the correct learning rates?

Greetings,
Dome

You can check it like this to see the optimizer and all the groups with their learning rates:

learn.create_opt(learn.lr_range(slice(1e-6,1e-4)))
learn.opt

OptimWrapper over Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.99)
    eps: 1e-08
    lr: 1e-06
    weight_decay: 0

Parameter Group 1
    amsgrad: False
    betas: (0.9, 0.99)
    eps: 1e-08
    lr: 1e-06
    weight_decay: 0

Parameter Group 2
    amsgrad: False
    betas: (0.9, 0.99)
    eps: 1e-08
    lr: 9.999999999999999e-06
    weight_decay: 0

Parameter Group 3
    amsgrad: False
    betas: (0.9, 0.99)
    eps: 1e-08
    lr: 9.999999999999999e-06
    weight_decay: 0

Parameter Group 4
    amsgrad: False
    betas: (0.9, 0.99)
    eps: 1e-08
    lr: 9.999999999999999e-05
    weight_decay: 0

Parameter Group 5
    amsgrad: False
    betas: (0.9, 0.99)
    eps: 1e-08
    lr: 9.999999999999999e-05
    weight_decay: 0
).
True weight decay: True
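
If you want to confirm it programmatically rather than by reading the repr, you can also iterate over the param groups of the wrapped PyTorch optimizer (assuming, as the repr above suggests, that the OptimWrapper keeps it in learn.opt.opt). Note that each layer group apparently gets split into two param groups (presumably the weight-decay / no-weight-decay split), which is why the six groups above carry only three distinct LRs:

# Print the LR and parameter count of every underlying param group.
for i, pg in enumerate(learn.opt.opt.param_groups):
    n_params = sum(p.numel() for p in pg['params'])
    print(f"param group {i}: lr={pg['lr']:.2e}, n_params={n_params}")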

Oh, I forgot to answer :smiley:
That was basically what I was looking for, thanks!
For people landing here with the same question:
I subclassed Learner and overrode lr_range() as well as the layer_groups property to return layer groups that match my progressive growing (probably not necessary if your architecture does not change).
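
Roughly, it looked like the sketch below; treat it as an untested sketch against the fastai v1 API (Learner, layer_groups, lr_range), where ProgressiveLearner, _groups and inner_lr_scale are names I made up for illustration:

import numpy as np
from fastai.basic_train import Learner  # fastai v1

class ProgressiveLearner(Learner):
    "Sketch: newly added (outer) layer groups get the full LR, inner ones are scaled down."
    inner_lr_scale = 0.1  # how much smaller each older group's LR should be

    @property
    def layer_groups(self):
        # Groups ordered inner (oldest) -> outer (newest); update self._groups
        # whenever the network grows (new convs around encoder/decoder).
        return self._groups

    @layer_groups.setter
    def layer_groups(self, groups):
        self._groups = groups

    def lr_range(self, lr):
        n = len(self.layer_groups)
        base = lr.stop if isinstance(lr, slice) else lr
        # Newest group trains at `base`, each older group is scaled down further.
        return np.array([base * self.inner_lr_scale ** (n - 1 - i) for i in range(n)])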