In lesson 6, @Jeremy said that:
"In practice, we don't use a different mean and standard deviation for every mini-batch. If we did, they would vary so much that 'it would be very hard to train'; instead we take an exponentially weighted moving average of the mean and standard deviation."
I am confused by the word "train" here.
This sounds strange to me. Isn't the per-mini-batch noise supposed to act as a form of regularization during training?
I understand that we have to use the exponentially weighted moving averages of the mean and standard deviation at test time.
But not during training, right? I have gone through many libraries and haven't seen any of them use a moving average for batch normalization during the training stage.
Can someone correct me if I am wrong? Thanks!