Is the moms=(0.95, 0.85) parameter to e.g. fit_one_cycle equivalent to the Adam optimizer’s (beta_2, beta_1) (notice the order) parameters, where beta_1 is the decay rate for the first moment, and beta_2 for the second?
Did you find an answer to this?
I’d also like to know. The graph shows one cycle, as stated below from the docs:
In phase 1, the learning rate goes up to lr_max linearly while the momentum goes from moms[0] to moms[1] linearly. In phase 2, the learning rate follows a cosine annealing from lr_max to 0, as the momentum goes from moms[1] back to moms[0] with the same annealing.
So it sounds like it’s changing one momentum parameter only, not both in Adam?
The momentum is the first beta in Adam (or the momentum in SGD/RMSProp). When you pass (0.95, 0.85), the momentum goes from 0.95 to 0.85 during the warmup, then from 0.85 back to 0.95 during the annealing. And yes, it only changes the first beta in Adam.
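To make that concrete, here is a minimal sketch in plain Python of the schedule as the docs quote describes it: linear during the warmup, cosine during the annealing. It is not fastai's actual implementation. The function names one_cycle_momentum and adam_betas are made up for illustration, the 30% warmup fraction (pct_start) is an assumption, and so is the fixed beta_2 of 0.999.

```python
import math


def one_cycle_momentum(step, total_steps, moms=(0.95, 0.85), pct_start=0.3):
    """Momentum at a 0-based `step` of a one-cycle schedule (illustrative only)."""
    high, low = moms
    # Assumption: the warmup takes the first `pct_start` fraction of the steps.
    warmup_steps = max(1, int(total_steps * pct_start))
    if step < warmup_steps:
        # Phase 1: linear change from moms[0] to moms[1].
        frac = step / warmup_steps
        return high + (low - high) * frac
    # Phase 2: cosine annealing from moms[1] back to moms[0].
    anneal_steps = max(1, total_steps - warmup_steps - 1)
    frac = min((step - warmup_steps) / anneal_steps, 1.0)
    return high + (low - high) * (1 + math.cos(math.pi * frac)) / 2


def adam_betas(step, total_steps, beta_2=0.999):
    """Only the first beta follows the schedule; beta_2 stays fixed (assumed value)."""
    return (one_cycle_momentum(step, total_steps), beta_2)


if __name__ == "__main__":
    total = 100
    for step in (0, 15, 30, 65, 99):
        beta_1, beta_2 = adam_betas(step, total)
        print(step, round(beta_1, 3), beta_2)
```

The first value of the pair moves between 0.95 and 0.85 over the cycle, while the second value is the same at every step, which is the point made above.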
Great. Thanks for the clarification.