The slice here isn't the min/max of the one-cycle schedule itself (that part is the warm-up and anneal over time); it sets discriminative learning rates across the layer groups: the lowest group trains at the start of the slice, the last group at its stop, and the groups in between are spread geometrically across that range. You can also pass a slice with just one value like so:
learn.fit_one_cycle(1, slice(1e-2))
and then the earlier layer groups default to a lower value, 1e-2/10, while the last group uses 1e-2. Dividing by 2.6^4 instead pushes the starting lr for the lowest layers even further down than that default.
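To make that concrete, here's a rough sketch of how the slice gets expanded into one learning rate per layer group (the helper name even_mults and the count of 5 groups are my assumptions modelled on fastai v1's behaviour, not code quoted in this thread):

```python
import numpy as np

def even_mults(start, stop, n):
    # Geometrically (log-evenly) spaced values from start to stop,
    # one per layer group -- a sketch of what fastai does with a full slice.
    mult = (stop / start) ** (1 / (n - 1))
    return np.array([start * mult ** i for i in range(n)])

max_lr = 1e-2
n_groups = 5  # assumed number of layer groups in the model

# slice(max_lr / 2.6**4, max_lr): lowest group gets the small lr,
# last group gets max_lr, and each group in between is ~2.6x the previous one.
print(even_mults(max_lr / 2.6**4, max_lr, n_groups))
# -> roughly [0.00022, 0.00057, 0.0015, 0.0038, 0.01]

# slice(max_lr) with no start: earlier groups get max_lr/10, the last gets max_lr.
print([max_lr / 10] * (n_groups - 1) + [max_lr])
```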
Thanks! So I understand that the 1e-2 in the equation is left there to make it clearer that we’re going from a fraction of the max LR up to the max LR.
But why is the denominator in the form ‘2.6 to the power of 4’ rather than just putting 46 in the denominator?
It's mostly about making the structure of the number explicit. The slice spreads the learning rate geometrically across the layer groups, so each group trains at roughly 2.6 times the learning rate of the group below it, and the exponent 4 is the number of steps between the lowest and the highest group. Putting 46 in the denominator would give you (almost) exactly the same bottom learning rate, since 2.6^4 ≈ 45.7, but it would hide the fact that 2.6 is the per-layer-group ratio that was actually tuned.
As for the 2.6 itself: as far as I know it wasn't derived from theory; it came out of empirical hyperparameter search on the ULMFiT-style text models, i.e. it's the per-group ratio that happened to give the best balance of convergence speed and stability in those experiments.
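If it helps to see the arithmetic, here it is in plain Python (nothing fastai-specific; the 5 layer groups are just an assumed example):

```python
max_lr = 1e-2

print(2.6 ** 4)            # 45.6976 -- so /46 would land in almost the same place
print(max_lr / 2.6 ** 4)   # ~2.19e-04, the lr the lowest layer group would get

# Walking up the layer groups, each one trains ~2.6x faster than the one below it.
lr = max_lr / 2.6 ** 4
for group in range(5):     # assuming 5 layer groups
    print(f"group {group}: lr = {lr:.2e}")
    lr *= 2.6
```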