How is base_lr set in the fine_tune function if not set manually?

In fastbook, fine_tune is used without setting a base learning rate, and the book says:

It will automatically change learning rates appropriately for most datasets.

My question is: how is the learning rate set according to different datasets?

I tried to read the docs for fine_tune, and it seems the default is 2e-3 if not set manually.
Does anyone know how this works?


I think the defaults are just empirical best guesses. If you run learn.fine_tune?? in a Jupyter Notebook, it will show you the source code, with the defaults and the logic.

If in doubt, run learn.lr_find first, set your lr accordingly, and experiment with different values.



Hi Yijin,

Maybe I was not clear before; I have already read the code for fine_tune:

def fine_tune(self:Learner, epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100,
              pct_start=0.3, div=5.0, **kwargs):
    "Fine tune with `freeze` for `freeze_epochs` then with `unfreeze` from `epochs` using discriminative LR"
    self.freeze()
    # train only the head at base_lr for freeze_epochs
    self.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99, **kwargs)
    base_lr /= 2
    self.unfreeze()
    # train the whole model with discriminative LRs: earliest layers at
    # base_lr/lr_mult, last layers at base_lr
    self.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div, **kwargs)

So it’s setting base_lr=2e-3 by default. That’s exactly what I found confusing, because it doesn’t match this quote from fastbook:

It will automatically change learning rates appropriately for most datasets.

As you suggested, I could use lr_find, but I wanted to know how base_lr is chosen under the hood of the library.

Hope this clarifies. :slight_smile:

It actually does. It simply means that this base lr seemed to do a good enough job on most of the datasets they experimented with :slight_smile: (empirical studies)


I think “it will automatically change” does not mean the learning rate is automatically selected (as you noticed, it always defaults to 2e-3). Instead, it refers to how the learning rate changes within fine_tune: it gets divided by two after unfreezing, and the discriminative learning rate is set so the first layers train more slowly (division by lr_mult) :slight_smile:
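To make that concrete, here is a minimal sketch in plain Python (no fastai needed) of how the defaults in the fine_tune source above translate into actual learning rates. The helper name fine_tune_lrs is made up for illustration; the arithmetic mirrors the two fit_one_cycle calls.

```python
def fine_tune_lrs(base_lr=2e-3, lr_mult=100):
    """Return the LR used in the frozen phase and the (min, max)
    discriminative LR slice used in the unfrozen phase."""
    frozen_lr = base_lr                            # fit_one_cycle(freeze_epochs, slice(base_lr))
    base_lr /= 2                                   # halved before unfreezing
    unfrozen_slice = (base_lr / lr_mult, base_lr)  # slice(base_lr/lr_mult, base_lr)
    return frozen_lr, unfrozen_slice

frozen, (lo, hi) = fine_tune_lrs()
print(frozen)   # 0.002      -> head trained at 2e-3 while the body is frozen
print(lo, hi)   # 1e-05 0.001 -> earliest layers at 1e-5, last layers at 1e-3
```

So with the defaults, nothing is inferred from the dataset: every run starts the head at 2e-3 and then fine-tunes the body across the 1e-5 to 1e-3 range.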