Hello
I understand what the 1-cycle policy does with SGD, but how does this graph translate to Adam, since Adam has no momentum?
Or am I missing something?
Thank you!
In Adam, the equivalent of momentum is the parameter named beta1, so that's the one that gets changed.
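To illustrate why beta1 plays the role of momentum, here is a minimal plain-Python sketch of a single Adam update for one scalar parameter. The function name adam_step and its default hyperparameters are illustrative assumptions, not fastai or PyTorch code. The first-moment buffer m is an exponential moving average of the gradients with decay beta1, much like the velocity buffer in SGD with momentum, so a 1-cycle schedule can vary beta1 where it would otherwise vary momentum.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (illustrative sketch).

    beta1 acts like the momentum coefficient: m is an exponential moving
    average of past gradients. beta2 controls the average of squared
    gradients, which rescales the step per parameter.
    t is the 1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad          # first moment (momentum-like)
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment (gradient scale)
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

if __name__ == "__main__":
    # Minimize f(x) = x^2 (gradient 2x), starting from x = 5.
    x, m, v = 5.0, 0.0, 0.0
    for t in range(1, 201):
        x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.1)
    print(x)
```

Raising beta1 makes m remember gradients for longer (more "momentum"); lowering it makes the update follow the latest gradient more closely.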
Hello sgugger, what does 'moms' mean in fit_one_cycle? Is it the momentum parameter? Also, I can't find more detailed documentation for the fastai library, only API signatures or source code with few comments…
def fit_one_cycle(learn:Learner, cyc_len:int, max_lr:Union[Floats,slice]=defaults.lr,
moms:Tuple[float,float]=(0.95,0.85), div_factor:float=25., pct_start:float=0.3, final_div:float=None,
wd:float=None, callbacks:Optional[CallbackList]=None, tot_epochs:int=None, start_epoch:int=None)->None:
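In the 1-cycle policy, momentum is cycled in the opposite direction to the learning rate, and moms=(0.95, 0.85) in the signature looks like the (start/end, lowest) momentum pair. The following is a minimal plain-Python sketch of such a schedule, assuming cosine annealing in both phases and reusing the div_factor, pct_start and moms defaults from the signature. The helpers one_cycle and cos_anneal are hypothetical, not fastai functions, and the fallback used when final_div is None is an assumption.

```python
import math

def cos_anneal(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2 * (1 + math.cos(math.pi * pct))

def one_cycle(step, total_steps, max_lr, moms=(0.95, 0.85), div_factor=25.0,
              pct_start=0.3, final_div=None):
    """Return (lr, mom) at a given step (hypothetical helper, not fastai's API).

    mom is the value an optimizer would use as SGD momentum, or as beta1 for Adam.
    """
    if final_div is None:
        final_div = div_factor * 1e4   # assumption: anneal to a very small final lr
    warm = int(total_steps * pct_start)  # length of the warm-up phase
    if step < warm:
        # Phase 1: lr rises to max_lr while momentum falls to moms[1].
        pct = step / warm
        lr = cos_anneal(max_lr / div_factor, max_lr, pct)
        mom = cos_anneal(moms[0], moms[1], pct)
    else:
        # Phase 2: lr decays towards zero while momentum rises back to moms[0].
        pct = (step - warm) / max(1, total_steps - warm)
        lr = cos_anneal(max_lr, max_lr / final_div, pct)
        mom = cos_anneal(moms[1], moms[0], pct)
    return lr, mom

if __name__ == "__main__":
    for s in (0, 15, 30, 65, 100):
        print(s, one_cycle(s, 100, 1e-2))
```

With Adam, the returned mom would be written into beta1 instead of a momentum field; for a PyTorch optimizer that would mean setting param_group['betas'] = (mom, beta2) at each step while leaving beta2 unchanged.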