[Solved] How to apply layer-wise learning rates (discriminative lr)?

Hi all, how can I make this work?

I have

len(hf_electra_param_splitter(model)) == 14
lrs = get_layer_lrs(...)
len(lrs) == 14

I did

learn = Learner(
...
splitter=hf_electra_param_splitter,
lr=lrs,
)
learn.fit_one_cycle(n_epoch=3)

and get

/content/fastai2/fastai2/optimizer.py in set_hypers(self, **kwargs)
     32 
     33     def unfreeze(self): self.freeze_to(0)
---> 34     def set_hypers(self, **kwargs): L(kwargs.items()).starmap(self.set_hyper)
     35     def _set_hyper(self, k, v):
     36         for v_,h in zip(v, self.hypers): h[k] = v_

....

/content/fastai2/fastai2/optimizer.py in set_hyper(self, k, v)
     42         v = L(v, use_list=None)
     43         if len(v)==1: v = v*len(self.param_lists)
---> 44         assert len(v) == len(self.hypers), f"Trying to set {len(v)} values for {k} but there are {len(self.param_lists)} parameter groups."
     45         self._set_hyper(k, v)
     46 

AssertionError: Trying to set 14 values for lr but there are 1 parameter groups.

I’d first check your number of layer groups after splitting. The quick way is to do a learn.freeze() followed by learn.summary() and see how many groups are frozen. But how are you defining your split, i.e. what does it look like?
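
For example, something along these lines (just a sketch, assuming `learn` has already been built with your splitter):

learn.freeze()                       # freezes every parameter group except the last one
learn.summary()                      # shows which layers ended up frozen vs. trainable
print(len(learn.opt.param_lists))    # should equal the number of lrs you pass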

I’d like to have a look at your splitter function's logic. Have you mapped the groups to params, like so:

L([...],[...],[...]).map(params)
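
For reference, the usual idiom looks roughly like this (just a sketch; `my_splitter` and the attribute names `.embeddings`, `.encoder`, `.head` are made up for illustration):

def params(m):
    # same idea as fastai's `params` helper: all parameters of a module, as a list
    return [p for p in m.parameters()]

def my_splitter(model):
    # one entry per layer group, each mapped to a list of its parameters
    return L(model.embeddings, model.encoder, model.head).map(params)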

Thanks for your replies, here is my source code:

# Names come from: for nm in model.named_modules(): print(nm[0])
def hf_electra_param_splitter(model, num_hidden_layers):
  names = ['0.model.embeddings', *[f'0.model.encoder.layer.{i}' for i in range(num_hidden_layers)], '1']
  groups = [ mod.parameters() for name, mod in model.named_modules() if name in names]
  return groups

def get_layer_lrs(lr, lr_decay, num_hidden_layers):
  # I treat the input layer as the bottom and the output layer as the top, which is different from the official repo
  return [ lr * (lr_decay ** depth) for depth in reversed(range(num_hidden_layers+2))]

learn = Learner(glue_dls['mrpc'], single_task_model,
                loss_func=CrossEntropyLossFlat(),
                opt_func=partial(Adam, eps=1e-6),
                metrics=[F1Score(), Accuracy()],
                splitter=partial(hf_electra_param_splitter, num_hidden_layers=base_model.config.num_hidden_layers),
                lr=get_layer_lrs(3e-4, 0.8, base_model.config.num_hidden_layers),
                ).to_fp16()
learn.fit_one_cycle(n_epoch=3)

Additionally,

ps = hf_electra_param_splitter(single_task_model,12)
print([ type(p) for p in ps])
print(len(ps))
lrs = get_layer_lrs(3e-4, 0.8, 12)
print(L(lrs))
[<class 'generator'>, <class 'generator'>, <class 'generator'>, <class 'generator'>, <class 'generator'>, <class 'generator'>, <class 'generator'>, <class 'generator'>, <class 'generator'>, <class 'generator'>, <class 'generator'>, <class 'generator'>, <class 'generator'>, <class 'generator'>]
14
(#14) [1.649267441664001e-05,2.061584302080001e-05,2.5769803776000012e-05,3.221225472000001e-05,4.026531840000002e-05,5.033164800000002e-05,6.291456000000001e-05,7.864320000000003e-05,9.830400000000001e-05,0.00012288000000000002...]

I suppose you should have 14 lists instead of just one. Try doing

groups = [[mod.parameters()] for name, mod in model.named_modules() if name in names]

if you want to have exactly one layer in each layer group.

I tried the change to get 14 lists instead of generators, but I got another error:

---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

<ipython-input-32-b586e752124c> in <module>()
----> 1 single_task_learn.fit_one_cycle(n_epoch=3)

...

/content/fastai2/fastai2/optimizer.py in step(self)
     80 
     81     def step(self):
---> 82         for p,pg,state,hyper in self.all_params(with_grad=True):
     83             for cb in self.cbs: state = _update(state, cb(p, **{**state, **hyper}))
     84             self.state[p] = state

/content/fastai2/fastai2/optimizer.py in all_params(self, n, with_grad)
     14     def all_params(self, n=slice(None), with_grad=False):
     15         res = L((p,pg,self.state[p],hyper) for pg,hyper in zip(self.param_lists[n],self.hypers[n]) for p in pg)
---> 16         return L(o for o in res if o[0].grad is not None) if with_grad else res
     17 
     18     def _set_require_grad(self, rg, p,pg,state,h): p.requires_grad_(rg or state.get('force_train', False))

...

/content/fastai2/fastai2/optimizer.py in <genexpr>(.0)
     14     def all_params(self, n=slice(None), with_grad=False):
     15         res = L((p,pg,self.state[p],hyper) for pg,hyper in zip(self.param_lists[n],self.hypers[n]) for p in pg)
---> 16         return L(o for o in res if o[0].grad is not None) if with_grad else res
     17 
     18     def _set_require_grad(self, rg, p,pg,state,h): p.requires_grad_(rg or state.get('force_train', False))

AttributeError: 'generator' object has no attribute 'grad'
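
If I read the traceback right, the problem is that with [[mod.parameters()]] each group contains a single generator object rather than the parameter tensors themselves, so when the optimizer iterates over a group and reads .grad, it hits the generator. A minimal illustration (using a stand-in nn.Linear for one of the selected modules):

import torch.nn as nn

mod = nn.Linear(4, 4)                   # stand-in for one of the modules picked by `names`
bad_group  = [mod.parameters()]         # one element: a generator, which has no .grad
good_group = list(mod.parameters())     # elements are nn.Parameter tensors, each with .grad
print(type(bad_group[0]), type(good_group[0]))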

Hi, thanks to @kshitijpatil09's advice, I tried

groups = [ list(mod.parameters()) for name, mod in model.named_modules() if name in names]

and I got

[results screenshot]

which is higher than without discriminative lr.

Thank you so much!! :heart_eyes: :heart_eyes: :heart_eyes:

Glad it helped!

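For anyone who finds this thread later, the working splitter is just the original code with the list(...) change applied:

# Names come from: for nm in model.named_modules(): print(nm[0])
def hf_electra_param_splitter(model, num_hidden_layers):
  names = ['0.model.embeddings', *[f'0.model.encoder.layer.{i}' for i in range(num_hidden_layers)], '1']
  # materialise each generator so every group is a list of Parameters
  return [list(mod.parameters()) for name, mod in model.named_modules() if name in names]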