What does 1-cycle do with AdamW since there is no momentum variable?

Hello 🙂

I understand what the 1-cycle policy does with SGD, but how does this graph translate to Adam, since there is no momentum?

Or am I missing something?
Thank you!

In Adam, the equivalent of momentum is the parameter named beta1, so that is the one the 1-cycle policy varies.
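To see why beta1 plays the momentum role: Adam keeps an exponential moving average of the gradients (the first moment m), and beta1 is the decay factor of that average, much like the momentum coefficient in SGD. Below is a minimal single-parameter sketch of an AdamW-style step; the helper name adam_step and the toy objective are hypothetical illustrations, not fastai or PyTorch APIs.

```python
import math


def adam_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, wd=0.0):
    """One AdamW-style update for a single scalar parameter (hypothetical helper).

    beta1 weights the moving average of gradients, the momentum-like term
    that a 1-cycle schedule varies in place of SGD momentum.
    """
    state["step"] += 1
    t = state["step"]
    # First moment: EMA of gradients, decayed by beta1 (the "momentum")
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    # Second moment: EMA of squared gradients, decayed by beta2
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialised averages
    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)
    # Decoupled weight decay (the "W" in AdamW), then the adaptive step
    param -= lr * wd * param
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param


# Minimise f(x) = x**2 starting from x = 5.0. A 1-cycle scheduler would
# pass a different lr and beta1 on each iteration instead of fixed values.
state = {"step": 0, "m": 0.0, "v": 0.0}
x = 5.0
for _ in range(200):
    grad = 2 * x  # derivative of x**2
    x = adam_step(x, grad, state, lr=0.1, beta1=0.9)
print(f"x after 200 steps: {x:.4f}")
```

Because m is recomputed with whatever beta1 is passed in, a scheduler only has to change that one argument per iteration to get the same effect as cycling SGD momentum.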


Hello sgugger, what does ‘moms’ mean in fit_one_cycle? Does it refer to the momentum parameter? By the way, I can’t find better documentation for the fastai library, only API signatures or source code with few comments… 🧐😟

def fit_one_cycle(learn:Learner, cyc_len:int, max_lr:Union[Floats,slice]=defaults.lr,
                  moms:Tuple[float,float]=(0.95,0.85), div_factor:float=25., pct_start:float=0.3, final_div:float=None,
                  wd:float=None, callbacks:Optional[CallbackList]=None, tot_epochs:int=None, start_epoch:int=None)->None:
    ...  # signature only (fastai v1); body omitted
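‘moms’ is the pair of momentum bounds, (0.95, 0.85) by default, that the 1-cycle schedule moves between in the opposite direction to the learning rate: momentum goes down while the learning rate warms up, then back up while it anneals. With Adam or AdamW, per the reply above, those values are written into beta1. The pure-Python sketch below assumes cosine annealing in both phases and, when final_div is None, a final learning rate of max_lr / (div_factor * 1e4); both are my assumptions about fastai v1's OneCycleScheduler, and one_cycle_schedule is a hypothetical helper, not a fastai API.

```python
import math


def annealing_cos(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)


def one_cycle_schedule(n_iter, max_lr, moms=(0.95, 0.85), div_factor=25.0,
                       pct_start=0.3, final_div=None):
    """Return a list of (lr, mom) pairs, one per iteration (hypothetical helper).

    With Adam/AdamW, 'mom' would be used as beta1 at that iteration.
    """
    if final_div is None:
        final_div = div_factor * 1e4  # assumed default for the final lr floor
    n_warm = int(n_iter * pct_start)
    n_anneal = n_iter - n_warm
    schedule = []
    # Phase 1: lr rises max_lr/div_factor -> max_lr, mom falls moms[0] -> moms[1]
    for i in range(n_warm):
        pct = i / n_warm
        schedule.append((annealing_cos(max_lr / div_factor, max_lr, pct),
                         annealing_cos(moms[0], moms[1], pct)))
    # Phase 2: lr falls max_lr -> max_lr/final_div, mom rises moms[1] -> moms[0]
    for i in range(n_anneal):
        pct = i / n_anneal
        schedule.append((annealing_cos(max_lr, max_lr / final_div, pct),
                         annealing_cos(moms[1], moms[0], pct)))
    return schedule


sched = one_cycle_schedule(n_iter=100, max_lr=1e-3)
for step in (0, 15, 30, 65, 99):
    lr, beta1 = sched[step]
    print(f"step {step:3d}: lr={lr:.2e}, beta1={beta1:.3f}")
```

For comparison, PyTorch's torch.optim.lr_scheduler.OneCycleLR does something similar for Adam-family optimizers: with cycle_momentum=True it cycles betas[0] between base_momentum and max_momentum.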