Lesson 17 official topic


Why are we using a predefined value in the batch transform instead of calculating xb.mean() and xb.std() before every batch?


Yess! Finally someone noticed this bit. I rewrote the norm function to use the mean and standard deviation of the batch itself, but performance dropped significantly. Super odd.
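
In case it helps to compare, here is a minimal sketch of the two variants being discussed. The function names and the stats values are made up for illustration; `xmean` / `xstd` just stand in for statistics computed once over the training set:

```python
import torch

xmean, xstd = 0.28, 0.35  # hypothetical values precomputed from the training data

def norm_fixed(xb):
    # Normalize with fixed, dataset-wide statistics: every batch is shifted and
    # scaled the same way, and train/validation see an identical transform.
    return (xb - xmean) / xstd

def norm_per_batch(xb):
    # Normalize with the current batch's own statistics: the transform now
    # depends on which samples happen to land in the batch, so small or skewed
    # batches get a different scaling each time.
    return (xb - xb.mean()) / xb.std()
```

One plausible reason for the drop is that the per-batch version couples each sample's normalized value to whatever else happens to be in its batch, which adds noise; batchnorm layers also use batch statistics, but they learn their own scale/shift and keep running statistics for inference.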

In the code for the Adam optimizer, shouldn't the sqrt be applied to unbias_sqr_avg first and self.eps added afterwards?

That is how the original Adam paper does it, so I got confused.
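
For reference, here is a rough sketch of just the update step in both forms. The function names and signatures are made up for illustration; `unbias_avg` and `unbias_sqr_avg` stand for the bias-corrected first and second moment estimates:

```python
import torch

def adam_update_paper(p, unbias_avg, unbias_sqr_avg, lr=1e-3, eps=1e-8):
    # As in the original Adam paper: take the sqrt first, then add eps.
    p -= lr * unbias_avg / (unbias_sqr_avg.sqrt() + eps)

def adam_update_eps_inside(p, unbias_avg, unbias_sqr_avg, lr=1e-3, eps=1e-8):
    # Alternative seen in some implementations: add eps before the sqrt.
    p -= lr * unbias_avg / (unbias_sqr_avg + eps).sqrt()
```

The two only differ noticeably when unbias_sqr_avg is very small: with eps outside the sqrt the step is capped near lr/eps, while with eps inside it is capped near lr/sqrt(eps), so the same eps value behaves quite differently in the two forms.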