(confusion) Momentum that Jeremy mentioned in Lesson 6

In Lesson 6, @Jeremy said that:
"In practice, we don't use a different mean and standard deviation for every mini-batch. If we did, it would vary so much that it would be very hard to train. Instead, we take an exponentially weighted average of the mean and standard deviation."

I am so confused about the word “train”.

This sounds strange to me. Isn't the variation between mini-batches supposed to act as noise that regularizes training?

I understand that we have to use an exponentially weighted average of the mean and standard deviation at test time.

But not during training, right? I went through many libraries and haven't seen any of them use a moving average to do batch normalization at the training stage.
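
To make my understanding concrete, here is a minimal NumPy sketch of what I believe most libraries do. It is my own toy code, not taken from fastai or any other library: the function name `batchnorm_forward`, the `momentum=0.1` default, and the omission of the learnable scale and shift are all my assumptions. In training mode it normalizes with the current mini-batch statistics and only updates the running averages on the side; in test mode it normalizes with the running averages.

```python
import numpy as np

def batchnorm_forward(x, running_mean, running_var, training,
                      momentum=0.1, eps=1e-5):
    """Toy batch norm (hypothetical helper, no learnable scale/shift)."""
    if training:
        # Training: normalize with the statistics of this mini-batch.
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        # Side effect: update the exponentially weighted running averages
        # in place so they can be used later at test time.
        running_mean[:] = (1 - momentum) * running_mean + momentum * mean
        running_var[:] = (1 - momentum) * running_var + momentum * var
    else:
        # Testing: normalize with the accumulated running averages.
        mean, var = running_mean, running_var
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
running_mean = np.zeros(3)
running_var = np.ones(3)
x = rng.normal(loc=5.0, scale=2.0, size=(64, 3))

out_train = batchnorm_forward(x, running_mean, running_var, training=True)
out_eval = batchnorm_forward(x, running_mean, running_var, training=False)
```

If this sketch is right, the moving average never touches the activations during training, which is exactly why the quote confuses me.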

Can someone correct me if I am wrong? Thanks!

It starts here: