I am trying to understand the following things about the Running BatchNorm implementation:
- In his explanation, Jeremy mentioned that we use the E[x^2] - E[x]^2 formula because the variance is not constant and the batch size can vary too.
I am not able to understand why the batch size would vary, except maybe for the last batch or so. I have put the confusing part in bold.
Below is the transcript of him explaining detail 1:
"we take the running average of variance, but you can't take the running average of variance. It doesn't make sense to take the running average of variance. It's a variance, you know, you can't just average a bunch of variances, in particular **because they might even be different batch sizes, right, because batch size isn't necessarily constant**, right? Instead, as we learnt earlier in the class, the way that we want to calculate variance is like this: the mean of x squared minus the square of the mean of x."
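To check my understanding of detail 1, I put together a small plain-Python sketch. It is my own experiment, not the lesson's code: the numbers are made up and the helper name `var_from_sums` is hypothetical. It compares averaging two per-batch variances against computing E[x^2] - E[x]^2 from accumulated sums, sums of squares and counts.

```python
from statistics import pvariance

# Two made-up mini-batches of different sizes (assumption for illustration).
batch_a = [1.0, 2.0, 3.0, 4.0]
batch_b = [10.0, 20.0]

def var_from_sums(s, ss, n):
    """Variance as E[x^2] - E[x]^2 from a sum, a sum of squares and a count."""
    mean = s / n
    return ss / n - mean * mean

# Accumulate sum, sum of squares and count across both batches.
s = sum(batch_a) + sum(batch_b)
ss = sum(v * v for v in batch_a) + sum(v * v for v in batch_b)
n = len(batch_a) + len(batch_b)

combined = var_from_sums(s, ss, n)                     # from accumulated statistics
true_var = pvariance(batch_a + batch_b)                # variance of all the data at once
naive = (pvariance(batch_a) + pvariance(batch_b)) / 2  # average of the two variances

print(combined, true_var, naive)
```

If I read this correctly, `combined` agrees with the variance of all the data, while `naive` does not, because averaging variances ignores both the different batch sizes and the different batch means.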
- Similarly, in detail 2 he talked about keeping track of the batch size. I am unable to understand the bold line in the transcript below, because `x.numel()` should always return a size matching the batch size we pass in.
"... that we have to be careful of. Detail number two is that **the batch size could vary from mini-batch to mini-batch**, so we should also register a buffer for count and take an exponentially weighted moving average of the counts of the ..."
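My current reading of detail 2, which may well be wrong, is sketched below in plain Python. The names `lerp` and `mom`, the momentum value and the batch contents are all my own assumptions, not the lesson's code. The idea is to keep exponentially weighted moving averages of the sum, the sum of squares and the count, and only divide at the end.

```python
def lerp(old, new, weight):
    """One moving-average step: move `old` toward `new` by a fraction `weight`."""
    return old + weight * (new - old)

mom = 0.1  # hypothetical momentum value

# Running statistics, all starting at zero (assumption for this sketch).
sums = 0.0
sqrs = 0.0
count = 0.0

# Made-up batches; the last one is smaller, like a final partial batch.
batches = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0], [3.0, 4.0]]

for batch in batches:
    s = sum(batch)                   # sum of this batch
    ss = sum(v * v for v in batch)   # sum of squares of this batch
    c = float(len(batch))            # number of elements in this batch
    sums = lerp(sums, s, mom)
    sqrs = lerp(sqrs, ss, mom)
    count = lerp(count, c, mom)

mean = sums / count                # smoothed E[x]
var = sqrs / count - mean * mean   # smoothed E[x^2] - E[x]^2

print(mean, var)
```

Because the sums and the count are smoothed with the same weights, a smaller batch seems to contribute proportionally less to `mean` and `var`, which might be why the count needs its own buffer.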
- In the piece of code below, what is the purpose of `new_tensor`?
What do `.new_tensor` and `numel` do here, and why are they needed?
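This next snippet is not the lesson's code, just a small PyTorch experiment of my own to see what the two calls return. The shapes, the `count` tensor standing in for a registered buffer, and the division by the number of channels are my assumptions.

```python
import torch

bs, nc, h, w = 4, 3, 2, 2        # hypothetical batch size, channels, height, width
x = torch.randn(bs, nc, h, w)

# Stand-in for a buffer that stores the running count (assumed float64 here).
count = torch.zeros(1, dtype=torch.float64)

n = x.numel()           # total number of elements in x: bs * nc * h * w
per_channel = n / nc    # elements per channel: bs * h * w

# new_tensor builds a new tensor from the given data, using the dtype and
# device of the tensor it is called on (here: count).
c = count.new_tensor(per_channel)

# A smaller final batch has fewer elements, so numel() changes with it.
x_last = torch.randn(2, nc, h, w)
n_last = x_last.numel()

print(n, c, c.dtype, n_last)
```

So `numel()` appears to count every element, not just the batch dimension, and `new_tensor` seems to be a convenient way to get a value with the same dtype and device as the buffer it will be mixed into.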
I am stuck on this lesson because I am unable to understand these things…