Clarification on `moms` (momentum) parameter

Is the `moms=(0.95, 0.85)` parameter to, e.g., `fit_one_cycle` equivalent to the Adam optimizer's `(beta_2, beta_1)` parameters (notice the order), where `beta_1` is the decay rate for the first moment and `beta_2` for the second?


Did you find an answer to this?

I’d also like to know. The graph shows one cycle, as stated below from the docs:

the learning rate goes from lr_max/div_factor to lr_max linearly while the momentum goes from moms[0] to moms[1] linearly. In phase 2, the learning rate follows a cosine annealing from lr_max to 0, as the momentum goes from moms[1] to moms[0] with the same annealing.

So it sounds like it changes only one momentum parameter, not both of Adam's betas?

The momentum is the first beta in Adam (or the momentum in SGD/RMSProp). Passing (0.95, 0.85) means it goes from 0.95 to 0.85 during the warmup, then from 0.85 back to 0.95 during the annealing. And yes, it only changes the first beta in Adam; the second beta stays fixed.
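
For anyone who wants to see the shape of this, here is a rough sketch in plain Python of the schedule described in the docs quote above. It is not fastai's actual implementation: the function name `one_cycle_schedule`, the `pct_start=0.3` and `div_factor=25.0` values, and the fixed `beta2=0.99` are assumptions made for illustration. It only shows that the learning rate and the first beta move together while the second beta never changes.

```python
import math

def one_cycle_schedule(n_steps, lr_max, moms=(0.95, 0.85),
                       div_factor=25.0, pct_start=0.3, beta2=0.99):
    """Return a list of (lr, beta1, beta2), one entry per training step.

    Hypothetical helper: pct_start, div_factor and beta2 are assumed values,
    not taken from the thread.
    """
    warmup_steps = int(n_steps * pct_start)
    lr_min = lr_max / div_factor
    schedule = []
    for step in range(n_steps):
        if step < warmup_steps:
            # Phase 1: linear warmup. lr rises, beta1 goes moms[0] -> moms[1].
            t = step / max(1, warmup_steps)
            lr = lr_min + t * (lr_max - lr_min)
            beta1 = moms[0] + t * (moms[1] - moms[0])
        else:
            # Phase 2: cosine annealing. lr falls to 0, beta1 goes back
            # from moms[1] to moms[0] with the same annealing.
            t = (step - warmup_steps) / max(1, n_steps - warmup_steps - 1)
            cos = (1 + math.cos(math.pi * t)) / 2  # goes from 1 down to 0
            lr = lr_max * cos
            beta1 = moms[0] + cos * (moms[1] - moms[0])
        # beta2 is never scheduled; it is passed through unchanged.
        schedule.append((lr, beta1, beta2))
    return schedule

sched = one_cycle_schedule(n_steps=100, lr_max=1e-3)
for step in (0, 30, 99):
    lr, beta1, beta2 = sched[step]
    print(f"step {step:3d}: lr={lr:.6f} beta1={beta1:.3f} beta2={beta2:.2f}")
```

With these assumed settings, the first beta starts at 0.95, reaches 0.85 when the learning rate peaks, and returns to 0.95 at the end of training.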


Great. Thanks for the clarification.