Lesson 7: BnLayer - using mean/std of last training batch for all inference?

In the BnLayer module created for Lesson 7, @jeremy mentioned that the 'full' implementations of BatchNorm use an exponentially weighted moving average of the means and standard deviations to normalise activations, whereas the implementation in BnLayer just uses the mean/std of the current mini-batch itself (as a 'cheat').

Does this not mean that, with the BnLayer implementation, inference (validation/test) uses the mean/std of the LAST mini-batch trained on? And isn't there a risk that the last batch is not representative of the overall distribution of the data?
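To make the concern concrete, here is a minimal NumPy sketch (not the actual fastai code; class and attribute names are made up) contrasting a BnLayer-style layer that stores raw batch statistics with a variant that keeps an exponentially weighted moving average. If the last training batch happens to be an outlier, the naive version carries that outlier's statistics into inference, while the EMA version only moves a fraction of the way:

```python
import numpy as np

class NaiveBn:
    """Sketch of a BnLayer-style norm: in train mode it normalises with the
    current batch's stats and stores them; eval mode reuses whatever was
    stored last, i.e. the stats of the final training batch."""
    def __init__(self):
        self.mean, self.std = 0.0, 1.0

    def forward(self, x, training):
        if training:
            self.mean, self.std = x.mean(), x.std()
        return (x - self.mean) / (self.std + 1e-5)

class EmaBn(NaiveBn):
    """Variant keeping an exponentially weighted moving average
    (momentum 0.1), closer to what 'full' BatchNorm implementations do."""
    def forward(self, x, training):
        if training:
            self.mean = 0.9 * self.mean + 0.1 * x.mean()
            self.std = 0.9 * self.std + 0.1 * x.std()
        return (x - self.mean) / (self.std + 1e-5)

rng = np.random.default_rng(0)
naive, ema = NaiveBn(), EmaBn()
for _ in range(100):
    batch = rng.normal(loc=5.0, scale=2.0, size=32)  # typical data
    naive.forward(batch, training=True)
    ema.forward(batch, training=True)

# Suppose an unlucky, unrepresentative batch is trained last:
outlier = rng.normal(loc=50.0, scale=2.0, size=32)
naive.forward(outlier, training=True)
ema.forward(outlier, training=True)

print(naive.mean)  # roughly 50: inference stats hijacked by the last batch
print(ema.mean)    # roughly 9.5: EMA moved only a tenth of the way toward 50
```

So yes, under this sketch's assumptions, the naive layer's inference statistics are entirely at the mercy of whichever batch was seen last, which is exactly the risk the EMA is meant to mitigate.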