Definition of Adam optimizer

I have a question about the implementation of Adam:

In the adam_step definition, debias1 is computed with these arguments:
mom = mom, damp = 1-mom, and step
Inside debias(), damp and 1-mom cancel out. So should we use (1 - mom**step) as the debias function instead, which would save some computation?
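
Spelled out, and assuming debias() has the general form damp * (1 - mom**step) / (1 - mom) (this form is inferred from the cancellation described above, not quoted from the library), the debias1 call reduces as follows:

```latex
\[
\mathrm{debias}(\mathrm{mom},\ 1-\mathrm{mom},\ \mathrm{step})
  = (1-\mathrm{mom})\,\frac{1-\mathrm{mom}^{\mathrm{step}}}{1-\mathrm{mom}}
  = 1-\mathrm{mom}^{\mathrm{step}}
\]
```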


The same debias function is used for debias2 as well, and there it doesn't cancel out, so it makes sense to keep the more general function.

In the debias2 call:
mom = sqr_mom
damp = 1-sqr_mom
So it is the same as the debias1 call: damp and 1-mom still cancel out. You may want to take a look.
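
A quick numerical check of this claim, as a minimal sketch. The body of debias() below is an assumed form of the general helper (inferred from the cancellation discussed in this thread), debias_simple() is a hypothetical name for the proposed shortcut, and the values of mom and sqr_mom are only illustrative:

```python
import math

def debias(mom, damp, step):
    # Assumed general form of the helper; not copied from the library.
    return damp * (1 - mom**step) / (1 - mom)

def debias_simple(mom, step):
    # Hypothetical simplified version proposed in the question.
    return 1 - mom**step

mom, sqr_mom = 0.9, 0.99  # illustrative values only

for step in (1, 10, 1000):
    # Same argument pattern as the two calls discussed above.
    debias1 = debias(mom, 1 - mom, step)
    debias2 = debias(sqr_mom, 1 - sqr_mom, step)
    # Both agree with the simplified form up to floating-point error.
    assert math.isclose(debias1, debias_simple(mom, step))
    assert math.isclose(debias2, debias_simple(sqr_mom, step))
```

Under that assumed form, the two versions would differ only if some caller passed a damp other than 1 - mom (for example damp = 1). Whether any such caller exists is not something this check can tell you.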