Clarification on `moms` (momentum) parameter

tsoernes · August 8, 2019, 10:49am

Is the moms=(0.95, 0.85) parameter to e.g. fit_one_cycle equivalent to the Adam optimizer’s (beta_2, beta_1) (notice the order) parameters, where beta_1 is the decay rate for the first moment, and beta_2 for the second?

Rares · September 17, 2019, 9:55pm

Did you find an answer to this?

Mikful · November 5, 2019, 2:32pm

I’d also like to know. The graph shows one cycle, as stated below from the docs:

https://docs.fast.ai/callbacks.one_cycle.html

In phase 1, the learning rate goes from lr_max/div_factor to lr_max linearly while the momentum goes from moms[0] to moms[1] linearly. In phase 2, the learning rate follows a cosine annealing from lr_max to 0, as the momentum goes from moms[1] to moms[0] with the same annealing.
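To make the quoted description concrete, here is a minimal sketch of such a schedule in plain Python. It is not fastai's implementation: the function name `one_cycle_schedule` is hypothetical, and the `pct_start=0.3` and `div_factor=25.0` defaults are illustrative assumptions rather than values taken from the quote.

```python
import math

def one_cycle_schedule(n_steps, lr_max, moms=(0.95, 0.85),
                       div_factor=25.0, pct_start=0.3):
    """Return a list of (lr, momentum) pairs, one per training step.

    Sketch of the quoted description only; the defaults are assumptions.
    """
    n_warm = max(1, int(n_steps * pct_start))   # steps spent in phase 1
    n_anneal = n_steps - n_warm                 # steps spent in phase 2
    lr_start = lr_max / div_factor
    schedule = []
    for i in range(n_steps):
        if i < n_warm:
            # Phase 1: lr rises linearly, momentum moves moms[0] -> moms[1].
            t = i / n_warm
            lr = lr_start + t * (lr_max - lr_start)
            mom = moms[0] + t * (moms[1] - moms[0])
        else:
            # Phase 2: cosine annealing; cos_w falls from 1 to 0.
            t = (i - n_warm) / max(1, n_anneal - 1)
            cos_w = (1 + math.cos(math.pi * t)) / 2
            lr = lr_max * cos_w                          # lr_max -> 0
            mom = moms[0] + (moms[1] - moms[0]) * cos_w  # moms[1] -> moms[0]
        schedule.append((lr, mom))
    return schedule

sched = one_cycle_schedule(n_steps=100, lr_max=1e-3)
```

With the default `moms=(0.95, 0.85)`, the momentum is lowest exactly where the learning rate peaks and returns to its starting value by the last step.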

So it sounds like it changes only one momentum parameter, not both of Adam's betas?

sgugger · November 5, 2019, 7:30pm

The momentum is the first beta in Adam (or the momentum in SGD/RMSProp). When you pass along (0.95, 0.85), it means going from 0.95 to 0.85 during the warmup, then from 0.85 back to 0.95 during the annealing. And yes, it only changes the first beta in Adam; the second beta is left untouched.
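As a rough illustration of that answer, the sketch below runs a hand-written scalar Adam update in which only beta1 follows the `moms` schedule while beta2 stays fixed. Everything here is an assumption made for illustration: the helper names `adam_step` and `momentum_at`, the `beta2 = 0.99` value, the constant learning rate, and the bias correction via running products (chosen so it stays valid when beta1 changes). It is not fastai's or PyTorch's code.

```python
import math

def adam_step(param, grad, state, lr, beta1, beta2, eps=1e-8):
    """One Adam update for a single scalar parameter (illustrative only)."""
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad         # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad  # second moment
    # Bias correction with running products of the betas actually used.
    state["b1_prod"] *= beta1
    state["b2_prod"] *= beta2
    m_hat = state["m"] / (1 - state["b1_prod"])
    v_hat = state["v"] / (1 - state["b2_prod"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

def momentum_at(t, moms=(0.95, 0.85), pct_start=0.3):
    """Momentum at progress t in [0, 1]: linear warmup, then cosine annealing."""
    if t < pct_start:
        return moms[0] + (t / pct_start) * (moms[1] - moms[0])
    u = (t - pct_start) / (1 - pct_start)
    cos_w = (1 + math.cos(math.pi * u)) / 2
    return moms[0] + (moms[1] - moms[0]) * cos_w

# Minimise f(x) = x**2 (gradient 2*x); lr is held constant for simplicity.
state = {"m": 0.0, "v": 0.0, "b1_prod": 1.0, "b2_prod": 1.0}
x, n_steps, beta2 = 5.0, 200, 0.99   # beta2 never changes during training
beta1_used = []
for i in range(n_steps):
    beta1 = momentum_at(i / (n_steps - 1))   # only beta1 is scheduled
    beta1_used.append(beta1)
    x = adam_step(x, 2 * x, state, lr=0.1, beta1=beta1, beta2=beta2)
```

Here `beta1_used` starts at 0.95, dips to about 0.85 at the end of the warmup, and returns to 0.95, while `beta2` is the same constant on every step.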

Mikful · November 5, 2019, 7:58pm

Great. Thanks for the clarification.