Lesson 17 official topic

Senzen · August 25, 2024, 3:10pm

Why are we using a predefined value for the batch transform instead of calculating xb.mean() and xb.std() before every batch?

MoetasimR97 · March 29, 2025, 3:58pm

Yes! Finally someone noticed this. I wrote the norm function using the mean and standard deviation of the batch itself instead, but the performance dropped significantly. Super odd.
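One possible contributing factor, sketched below under assumptions rather than taken from the lesson notebook: normalizing with fixed statistics applies the same transform to every batch, while normalizing with each batch's own statistics forces every batch to mean 0 and std 1, which erases differences between batches. The function names `norm_fixed` and `norm_per_batch` and the toy "dark" and "bright" batches are hypothetical, and plain Python lists stand in for tensors.

```python
from statistics import fmean, pstdev

def norm_fixed(batch, mean, std):
    # Same shift and scale for every batch (statistics computed once up front).
    return [(x - mean) / std for x in batch]

def norm_per_batch(batch):
    # Shift and scale recomputed from the batch itself.
    m, s = fmean(batch), pstdev(batch)
    return [(x - m) / s for x in batch]

# Two toy batches that differ only in overall brightness (made-up values).
dark = [0.1, 0.2, 0.3]
bright = [0.7, 0.8, 0.9]

# Stand-in for predefined statistics: computed once over all the data.
all_px = dark + bright
ds_mean, ds_std = fmean(all_px), pstdev(all_px)

fixed_dark = norm_fixed(dark, ds_mean, ds_std)
fixed_bright = norm_fixed(bright, ds_mean, ds_std)
pb_dark = norm_per_batch(dark)
pb_bright = norm_per_batch(bright)

print("fixed stats:    ", fixed_dark, fixed_bright)
print("per-batch stats:", pb_dark, pb_bright)
```

With fixed statistics the dark batch stays below zero and the bright batch above it. With per-batch statistics the two batches come out essentially identical, so the model can no longer see that one was brighter. This only illustrates the mechanism; it does not establish that this is what caused the drop described above.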

amitness · April 24, 2025, 10:54am

In the Adam optimizer code, shouldn’t the sqrt be applied to unbias_sqr_avg first, and the result then added to self.eps?

The original Adam paper does it like that, so I got confused.
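To make the difference concrete, here is a minimal sketch comparing the two placements of eps on scalar values. It assumes the course code adds eps before taking the square root, as the question implies. The function names are hypothetical, and `avg` and `sqr_avg` stand in for the bias-corrected first and second moment estimates.

```python
import math

def step_eps_outside(avg, sqr_avg, lr=1e-3, eps=1e-8):
    # Adam paper, Algorithm 1: lr * m_hat / (sqrt(v_hat) + eps)
    return lr * avg / (math.sqrt(sqr_avg) + eps)

def step_eps_inside(avg, sqr_avg, lr=1e-3, eps=1e-8):
    # Variant described in the question: lr * m_hat / sqrt(v_hat + eps)
    return lr * avg / math.sqrt(sqr_avg + eps)

lr = 1e-3

# Ordinary gradient magnitude: the two forms are nearly indistinguishable.
big_out = step_eps_outside(1.0, 1.0, lr=lr) / lr
big_in = step_eps_inside(1.0, 1.0, lr=lr) / lr

# Tiny gradient (1e-6, so its square is 1e-12): eps dominates inside the sqrt.
small_out = step_eps_outside(1e-6, 1e-12, lr=lr) / lr
small_in = step_eps_inside(1e-6, 1e-12, lr=lr) / lr

print("step/lr, gradient ~1:   ", big_out, big_in)
print("step/lr, gradient ~1e-6:", small_out, small_in)
```

For ordinary gradients both forms give a step of about lr. For very small gradients they diverge: with eps outside the root the step stays close to lr, while with eps inside it shrinks to roughly a hundredth of lr, because sqrt(v_hat + eps) behaves like an effective eps of sqrt(1e-8) = 1e-4. So the two are not equivalent, and the same eps value means different things in each form.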