Hi All
I am trying to understand the following points about the running batchnorm implementation:
- In his explanation, Jeremy mentioned that we use the E[x^2] - E[x]^2 formula because the variance is not constant and the batch size can vary too.
I'm not able to understand why the batch size (bs) would vary, except maybe for the last batch or so. The confusing part for me is the line about the batch size not necessarily being constant.
Below is the transcript from when he was explaining detail 1:
[138:14] "We take the running average of variances, but you can't take the running average of variances. It doesn't make sense to take the running average of variances; it's a variance, you know, you can't just average a bunch of variances, particularly because they might even be different batch sizes, right? Because batch size isn't necessarily constant. Right, instead, as we learnt earlier in the class, the way that we want to calculate variance is like this: the mean of x squared minus the mean of x, squared."
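A minimal sketch with made-up numbers (pure Python; the two batches and the variance helper are hypothetical, not from the lesson) of the point in the transcript: averaging two per-batch variances does not give the variance of all the values together, but accumulating the sum of x, the sum of x^2 and the element count across batches does, because Var(x) = E[x^2] - E[x]^2 only needs those three totals.

```python
# Hypothetical illustration: why per-batch variances can't simply be averaged,
# while sums, sums of squares and counts can be combined exactly.

def variance(xs):
    """Population variance computed as E[x^2] - E[x]^2 (hypothetical helper)."""
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    return mean_sq - mean * mean

# Two made-up mini-batches of different sizes (e.g. a short last batch).
batch_a = [1.0, 2.0, 3.0, 4.0]
batch_b = [10.0, 12.0]

# Naive approach: average the two per-batch variances.
naive = (variance(batch_a) + variance(batch_b)) / 2

# Accumulate the totals that *can* be added up across batches.
total_sum = sum(batch_a) + sum(batch_b)
total_sq = sum(x * x for x in batch_a) + sum(x * x for x in batch_b)
total_count = len(batch_a) + len(batch_b)
mean = total_sum / total_count
combined = total_sq / total_count - mean * mean

# naive is 1.125; combined equals the variance of all six values (155/9).
print(naive, combined, variance(batch_a + batch_b))
```

Because the totals are just added, the batch sizes are accounted for automatically: the 4-value batch contributes twice as many terms as the 2-value batch.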
- Similarly, in detail 2, he talked about keeping track of the batch size. I'm unable to understand the line in the transcript below about the batch size varying from mini-batch to mini-batch, because x.numel() should always be determined by the batch size we pass in.
[139:44] "...that we have to be careful of. Detail number two is that the batch size could vary from mini-batch to mini-batch, so we should also register a buffer for count and take an exponentially weighted moving average of the counts of the batch sizes."
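One way to read detail 2, sketched in pure Python for a single channel (the RunningStats class, the lerp helper and the momentum of 0.1 are illustrative assumptions, not the lesson's code): keep exponentially weighted moving averages of the per-batch sum, sum of squares and count, then divide by the averaged count. Because all three start at zero and are updated with the same weights, the start-up bias of the moving averages cancels in the ratio, and a small last batch contributes less than a full one instead of being treated as if it had the usual batch size.

```python
# Hypothetical sketch: running statistics for one channel, tracking an
# exponentially weighted moving average (EWMA) of the count as well.

def lerp(old, new, w):
    """Move `old` towards `new` by fraction w (same formula as torch.lerp)."""
    return old + w * (new - old)

class RunningStats:
    def __init__(self, mom=0.1):
        self.mom = mom      # assumed momentum value, for illustration only
        self.sums = 0.0     # EWMA of the per-batch sum of x
        self.sqrs = 0.0     # EWMA of the per-batch sum of x^2
        self.count = 0.0    # EWMA of the per-batch number of values

    def update(self, batch):
        self.sums = lerp(self.sums, sum(batch), self.mom)
        self.sqrs = lerp(self.sqrs, sum(x * x for x in batch), self.mom)
        self.count = lerp(self.count, float(len(batch)), self.mom)

    def mean_var(self):
        # Dividing the averaged sums by the averaged count weights each batch
        # by how many values it actually contained.
        mean = self.sums / self.count
        var = self.sqrs / self.count - mean * mean
        return mean, var

stats = RunningStats(mom=0.1)
stats.update([1.0, 2.0, 3.0, 4.0])  # a full batch of 4 values
stats.update([10.0, 12.0])          # a smaller last batch of 2 values
mean, var = stats.mean_var()
print(mean, var)
```

If the sums were divided by a fixed batch size instead of the averaged count, the short last batch would drag the running mean towards zero.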
- In the piece of code below, what are .new_tensor and numel, and what is their purpose?
self.count.new_tensor(x.numel()/nc)
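A small PyTorch sketch of what the two calls return (the shapes and the float64 count tensor are assumptions chosen to make the behaviour visible). Tensor.numel() is the total number of elements, so for activations of shape (bs, nc, h, w), x.numel()/nc is bs*h*w, the number of values that go into each channel's sum. Tensor.new_tensor(data) builds a tensor from data with, by default, the same dtype and device as the tensor it is called on, so the current batch's count ends up on the same device (e.g. the GPU) and with the same dtype as the count buffer and can be combined with it directly.

```python
import torch

# Hypothetical activations: bs=8 images, nc=16 channels, 5x5 feature maps.
x = torch.randn(8, 16, 5, 5)
bs, nc, *_ = x.shape

print(x.numel())       # 8*16*5*5 = 3200 elements in total
print(x.numel() / nc)  # 8*5*5 = 200.0 values per channel

# Stand-in for a registered `count` buffer; float64 is used here only so the
# inherited dtype is easy to see (a default torch.tensor(200.0) is float32).
count = torch.zeros((), dtype=torch.float64)

# new_tensor copies the Python number into a tensor with count's dtype and
# device, so it can be used in an in-place update of count.
c = count.new_tensor(x.numel() / nc)
print(c.dtype, c.device)

# Exponentially weighted update of the count, with an assumed momentum of 0.1.
count.lerp_(c, 0.1)
```

Note that this count depends on h and w as well as bs, so it changes whenever the spatial size of the activations changes, not only when the last batch is smaller.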
I am stuck on this lesson because I can't understand these things…