Is the moms=(0.95, 0.85) parameter to e.g. fit_one_cycle equivalent to the Adam optimizer's (beta_2, beta_1) parameters (notice the order), where beta_1 is the decay rate for the first moment and beta_2 the decay rate for the second?
Did you find an answer to this?
I'd also like to know. The graph shows one cycle, which the docs describe as follows:
https://docs.fast.ai/callbacks.one_cycle.html
In phase 1, the learning rate goes from lr_max/div_factor to lr_max linearly while the momentum goes from moms[0] to moms[1] linearly. In phase 2, the learning rate follows a cosine annealing from lr_max to 0, as the momentum goes from moms[1] to moms[0] with the same annealing.
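A minimal Python sketch of the two-phase schedule quoted above, to show how the learning rate and the momentum move in opposite directions. It follows the quoted description, not fastai's actual source; the function name one_cycle and the pct_start parameter (the fraction of steps spent in phase 1, defaulting here to 0.3) are assumptions for illustration.

```python
import math


def one_cycle(step, total_steps, lr_max, moms=(0.95, 0.85),
              div_factor=25.0, pct_start=0.3):
    """Return (lr, momentum) at `step`, following the two-phase description.

    Phase 1: lr rises linearly from lr_max/div_factor to lr_max while the
    momentum falls linearly from moms[0] to moms[1].
    Phase 2: lr follows a cosine from lr_max down to 0 while the momentum
    follows the same cosine shape from moms[1] back up to moms[0].
    `pct_start` (share of steps in phase 1) is an assumed parameter.
    """
    phase1_steps = max(1, int(total_steps * pct_start))
    if step < phase1_steps:
        t = step / phase1_steps  # progress through phase 1, in [0, 1)
        lr_start = lr_max / div_factor
        lr = lr_start + t * (lr_max - lr_start)
        mom = moms[0] + t * (moms[1] - moms[0])
    else:
        phase2_steps = max(1, total_steps - phase1_steps)
        t = min(1.0, (step - phase1_steps) / phase2_steps)  # progress in [0, 1]
        cos_factor = (1 + math.cos(math.pi * t)) / 2  # 1 at t=0, 0 at t=1
        lr = lr_max * cos_factor
        # Same annealing shape, but momentum moves from moms[1] to moms[0].
        mom = moms[0] + (moms[1] - moms[0]) * cos_factor
    return lr, mom
```

With total_steps=100 and lr_max=1e-2, step 0 gives an lr of 4e-4 with momentum 0.95, step 30 (the start of phase 2) gives lr 1e-2 with momentum 0.85, and step 100 gives lr 0 with momentum back at 0.95. Only a single momentum value comes out at each step, which is the point of the question that follows.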
So it sounds like it changes only one momentum parameter, not both of Adam's betas?
The momentum is the first beta in Adam (or the momentum in SGD/RMSProp). Passing moms=(0.95, 0.85) means it goes from 0.95 to 0.85 during the warmup, then from 0.85 back to 0.95 during the annealing. So yes, it only changes the first beta in Adam: both values in moms are values of beta_1 at different points in the cycle, not (beta_2, beta_1).
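To make this concrete, here is a from-scratch scalar Adam update (a sketch, not fastai's or PyTorch's implementation) where the scheduled momentum is plugged in as beta1 on each step while beta2 stays fixed. The function name adam_step, the beta2=0.99 default and the toy schedule are illustrative assumptions. In PyTorch, for instance, Adam keeps the pair in each param group's "betas" entry, so a scheduler would overwrite only its first element.

```python
import math


def adam_step(param, grad, state, lr, beta1, beta2=0.99, eps=1e-8):
    """One Adam update for a scalar parameter (illustrative sketch).

    beta1 (the "momentum") decays the running mean of gradients;
    beta2 decays the running mean of squared gradients and is not scheduled.
    `state` holds the two moment estimates and the step count.
    """
    state["step"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad       # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2  # second moment
    # Bias correction, using the current beta1 value for the first moment.
    m_hat = state["m"] / (1 - beta1 ** state["step"])
    v_hat = state["v"] / (1 - beta2 ** state["step"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


# Minimize f(x) = x**2 (gradient 2x). Only beta1 follows the toy momentum
# schedule (down from 0.95 to 0.85 and back up); beta2 is never touched.
x = 1.0
state = {"m": 0.0, "v": 0.0, "step": 0}
for beta1 in [0.95, 0.90, 0.85, 0.90, 0.95]:
    x = adam_step(x, grad=2 * x, state=state, lr=0.1, beta1=beta1)
print(x)
```

Scheduling beta2 as well would also change how fast the squared-gradient average adapts, which is a separate knob from momentum; the schedule described here leaves it alone.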
Great. Thanks for the clarification.