# ULMFit Paper: Question on learning rates (section 3.2)

Regarding …

“first choose the learning rate η^L of the last layer by fine-tuning only the last layer and using η^{l−1} = η^l / 2.6 as the learning rate for lower layers”

How is this implemented in fast.ai terms, where we can apply different learning rates to the LM `learner` by layer group (of which there are four)? Would it be something like:

``````lr = 0.004
learner.fit([lr/2.6*3, lr/2.6*2, lr/2.6, lr], use_clr_beta=(10,10,0.95,0.85), cycle_len=15)
``````

For LM, there’s an example in the `imdb.ipynb` from Part 2:

``````
lr=3e-3
lrm = 2.6
lrs = np.array([lr/(lrm**4), lr/(lrm**3), lr/(lrm**2), lr/lrm, lr])
``````

So it should be division by successive powers of 2.6, not by multiples of 2.6.
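For what it's worth, the rule η^{l−1} = η^l / 2.6 applied repeatedly makes the per-group rates a geometric sequence, which is exactly what dividing by successive powers of 2.6 produces. A quick check in plain numpy:

``````python
import numpy as np

# The ULMFiT rule lr_{l-1} = lr_l / 2.6, applied repeatedly, gives a
# geometric sequence: each group's rate is the next group's divided by 2.6.
lr = 3e-3
lrm = 2.6
lrs = np.array([lr / lrm**4, lr / lrm**3, lr / lrm**2, lr / lrm, lr])

# Adjacent groups differ by a constant factor of 2.6 ...
assert np.allclose(lrs[1:] / lrs[:-1], lrm)

# ... whereas dividing by multiples (lr/(2.6*3), lr/(2.6*2), lr/2.6, lr)
# gives ratios of 1.5, 2, 2.6 -- not a constant factor:
mult = np.array([lr / (lrm * 3), lr / (lrm * 2), lr / lrm, lr])
assert not np.allclose(mult[1:] / mult[:-1], mult[1] / mult[0])
``````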

By the way, has anyone tried this new discriminative learning rate setup on a vision task?


Good catch!

But this is done for the classifier, whereas the paper seems to indicate that this is how we should define the learning rates for fine-tuning the language model. Perhaps I’m reading the paper wrong? Or perhaps the notebooks aren’t updated to reflect the paper?

There is LM fine-tuning and classifier fine-tuning, so I’m confused about which discriminative learning rates should be applied to each, as far as my reading of the paper goes.

Correct me if I’m wrong, but I thought we use the LM as a backbone and do classification with a custom head.
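That's my understanding too. Schematically (the names below are hypothetical stand-ins, not the actual fastai classes), the fine-tuned LM encoder becomes the backbone and a freshly initialised head does the classification:

``````python
import numpy as np

# Hypothetical sketch, not the fastai API: reuse the fine-tuned LM's
# encoder weights as a backbone and attach a new linear head on top.
rng = np.random.default_rng(0)

encoder_w = rng.normal(size=(100, 32))  # stand-in for the fine-tuned LM encoder
head_w = rng.normal(size=(32, 2))       # new classifier head, trained from scratch

def classify(token_ids):
    h = encoder_w[token_ids].mean(axis=0)  # crude pooled document representation
    return h @ head_w                      # logits from the custom head

logits = classify(np.array([3, 14, 15]))
assert logits.shape == (2,)
``````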

You can see the actual code from the paper in the `dl2/imdb_scripts` folder.


Thanks for the link! Nice to look at the code.

I’m looking at `train_tri_lm.py` but I still don’t see where you “first choose the learning rate of the last layer by fine-tuning only the last layer”.

I see that if you train with discriminative lrs you set `lrs = np.array([lr/6,lr/3,lr,lr/2])`, but as for training only the last layer first, I don’t see it. If I’m reading the code right, it looks like all the layers are trained from the start (see line 130: `learner.unfreeze()`).

There is code that looks like it does what the paper describes, but it’s commented out (see line 123):

``````#learner.freeze_to(-1)
#learner.fit(lrs, 1, wds=wd, use_clr=(6,4), cycle_len=1)
``````

So I’m not sure what to make of this.
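For anyone else puzzling over this, those commented-out lines correspond to the paper’s “fine-tune only the last layer first” step. A toy mock (plain Python, not the real fastai `Learner`) of what `freeze_to(-1)` followed by `unfreeze()` would do to the four layer groups:

``````python
# Toy mock of the freeze_to/unfreeze pattern -- `Learner` here is a
# hypothetical stand-in illustrating the mechanics, not the fastai class.

class Learner:
    def __init__(self, n_groups):
        self.trainable = [True] * n_groups

    def freeze_to(self, n):
        # Groups before index n are frozen; n=-1 leaves only the last
        # group trainable (negative indices count from the end).
        k = n % len(self.trainable)
        self.trainable = [i >= k for i in range(len(self.trainable))]

    def unfreeze(self):
        self.trainable = [True] * len(self.trainable)

learner = Learner(n_groups=4)
learner.freeze_to(-1)   # paper's step 1: fine-tune only the last layer group
assert learner.trainable == [False, False, False, True]
learner.unfreeze()      # then train all groups with discriminative lrs
assert learner.trainable == [True, True, True, True]
``````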

Now I understand what your problem is. It seems `train_tri_lm.py` uses a different way of doing discriminative learning rates, and I don’t understand the logic behind it either.

Also, is there a list of the different argument values you passed to `train_lm()`?