Chapter 10 NLP discriminative learning rates

Hi,

Just a quick question (I think!)

In chapter 10 NLP, page 349 of the book, there’s the line:

learn.fit_one_cycle(1, slice(1e-2/(2.6**4),1e-2))

I am trying to understand why the number 2.6**4 was chosen over, say, 46, or 2.2e-4 (which is the result of calculating 1e-2/(2.6**4)).

My guess is that writing it this way makes changes to the scaling more intuitive, if our changes have an inverse power-law effect.

Does anybody know the reason?

Thanks, Mike.

The OneCycle learning rate scheduler anneals the learning rate from a low value up to a maximum and back down. Here the range is set using Python's built-in slice, which fastai interprets as discriminative learning rates across the layer groups, although you can also pass a single value like so:

learn.fit_one_cycle(1, 1e-2)

in which case every layer group gets the same maximum of 1e-2 (the schedule itself still warms up from a much smaller value). Passing slice(1e-2/(2.6**4), 1e-2) instead gives the earliest layer group a maximum learning rate 2.6**4 times smaller than the final group's, with the groups in between spread across that range.
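
Here's a minimal sketch of what the slice does to the per-group learning rates, assuming a five-way layer-group split like the AWD-LSTM's (this isn't fastai's exact code, though its even_mults helper does essentially the same thing):

import numpy as np

lo, hi = 1e-2 / (2.6**4), 1e-2
n_groups = 5  # assumption: the model is split into five layer groups

# slice(lo, hi) -> first group trains at lo, last group at hi,
# and the groups in between are spaced geometrically
lrs = np.geomspace(lo, hi, n_groups)
print(lrs)                 # ~[2.19e-04, 5.69e-04, 1.48e-03, 3.85e-03, 1.00e-02]
print(lrs[1:] / lrs[:-1])  # each entry is ~2.6x the previous one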

Thanks! So I understand that the 1e-2 in the expression is left there to make it clear that we're going from a fraction of the max LR up to the max LR.

But why is the denominator written as '2.6 to the power of 4' rather than just 46?

Perhaps it was a typo and they meant to put 2.6e-4, but it just worked anyway.

It's not a typo. With slice(lo, hi), fastai spreads the learning rates geometrically across the layer groups, and the AWD-LSTM model here is split into five groups, so there are four ratios between neighbouring groups. Writing the denominator as 2.6**4 makes that per-group factor explicit: each layer group trains with a learning rate 2.6 times higher than the group below it. Writing 46 would give (almost) the same number, but would hide the structure it encodes.

As for the 2.6 itself, it was found empirically. Jeremy has said (in the course lectures, as I recall) that it came out of a large batch of hyperparameter tuning experiments on ULMFiT, so it's a tuned constant that works well for this family of models rather than something derived from theory.
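
To make that concrete, a quick check in plain Python (nothing fastai-specific; the names are just for illustration):

factor, n_steps = 2.6, 4   # five layer groups -> four ratios between neighbours

print(factor ** n_steps)         # 45.6976, i.e. roughly the 46 suggested above
print(1e-2 / factor ** n_steps)  # ~2.19e-4, the lr of the lowest layer group

Written as 46 the call would behave almost identically, but 2.6**4 documents the per-group ratio that the number encodes.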