ND-Adam in practice


(Vitaly Bushaev) #1

Has anyone tried ND-Adam in practice? It claims to bridge the gap between SGD and Adam. The authors of AdamW also said that it would be interesting to see how AdamW combined with ND-Adam would behave in optimization, but I haven't seen much mention of it anywhere online.