Understanding Running Batch Norm Deeper

Hi all,
I am trying to understand the following points about the Running BatchNorm implementation.

  1. In his explanation, Jeremy mentioned that we use the E[x^2] - (E[x])^2 formula because the variance is not constant and the batch size can vary too.
    I am not able to understand why the bs would vary, except maybe for the last batch or so. The confusing part is the remark about batch sizes.
    Below is the transcript of him explaining detail 1:
    "
    we take the running average of variance, but you can’t take the running average of areas, it doesn’t make sense to take the running average of variance. It’s a variance, you know, you can’t just average a bunch of variances, in particular because they might even be different batch sizes, right, because batch size isn’t necessarily constant, right. Instead, as we learnt earlier in the class, the way that we want to calculate variance is like this: sum of two values of a mean of x squared minus mean of X (138:14 to 138:45)"
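A small sketch of the point made in the transcript, using made-up numbers and only the Python standard library. It assumes two batches of different sizes and compares three things: the variance over all values, the plain average of the per-batch variances, and the variance rebuilt from an accumulated sum, sum of squares and count. The lesson's code keeps exponentially weighted moving averages of these quantities; plain totals are used here only to isolate the idea.

```python
from statistics import pvariance

# Hypothetical data: two "batches" with different sizes (4 and 2).
batches = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0]]

# Reference: population variance over all six values.
all_values = [v for b in batches for v in b]
true_var = pvariance(all_values)

# Naive approach: average the per-batch variances.
# This ignores both the batch sizes and the difference between batch means.
naive_var = sum(pvariance(b) for b in batches) / len(batches)  # 13.125

# Alternative: accumulate sum, sum of squares and count,
# then use var = E[x^2] - (E[x])^2.
total = sum(sum(b) for b in batches)
total_sq = sum(v * v for b in batches for v in b)
count = sum(len(b) for b in batches)
mean = total / count
var_from_sums = total_sq / count - mean ** 2

print(true_var, naive_var, var_from_sums)
```

With these numbers the averaged variances give 13.125, while the variance of the combined data is about 43.89, which is what the sums-and-count version recovers.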

  2. Similarly, in detail 2
    he talks about keeping track of the batch size. I am unable to understand the transcript below, because x.numel() should always give a size that matches the bs we pass.

"that we have to be careful of. Detail number two is that the batch size could vary from any batch to mini batch, so we should also register a buffer for count and take an exponentially weighted moving average of the counts of the batch sizes (139:44 to 139:59)"
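One case where the count does change is the final batch of an epoch when the dataset size is not a multiple of bs (and the last batch is not dropped). The sketch below is plain Python with made-up sizes; `batch_sizes` is a hypothetical helper, and the activation shape (8 channels, 28x28) is an assumption chosen only for illustration. It also shows that `x.numel()/nc` is the number of elements per channel, i.e. batch size times height times width, not the batch size itself.

```python
def batch_sizes(n_items, bs):
    """Sizes of consecutive batches when the last partial batch is kept."""
    full, rest = divmod(n_items, bs)
    return [bs] * full + ([rest] if rest else [])

# Assumed numbers: 1000 items, bs=64 -> 15 full batches and one batch of 40.
sizes = batch_sizes(1000, 64)

# Assumed activation shape per batch: (b, nc, h, w).
nc, h, w = 8, 28, 28

# Equivalent of x.numel()/nc for each batch: elements per channel.
counts = [b * nc * h * w / nc for b in sizes]

print(sizes[0], sizes[-1])    # 64 40
print(counts[0], counts[-1])  # 50176.0 31360.0
```

So even with a fixed bs passed to the data loader, the count seen by the layer on the last batch can differ from the others.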

  3. In the piece of code below, what is the purpose of new_tensor?
    self.count.new_tensor(x.numel()/nc)

What do .new_tensor and numel do here, and what is their purpose?

I am stuck on this lesson because I am unable to understand these things…

Hi Jaideep,

For your point 3, the new_tensor() docs say:

By default, the returned Tensor has the same torch.dtype and torch.device as this tensor.

So I believe that

`c = self.count.new_tensor(x.numel()/nc)`

is a generic line which, in the specific case where self.count's dtype is float and its device is the first GPU, is equivalent to

`c = tensor(x.numel()/nc).float().cuda()`
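
A minimal sketch to check this on CPU, assuming PyTorch is installed. Here `count` is only a stand-in for the `self.count` buffer, and it is deliberately created as float64 (an assumption, not what the lesson uses) so that the inherited dtype is visible. `x.numel()` is the total number of elements in `x`, so dividing by the number of channels `nc` gives the number of elements per channel.

```python
import torch

# Stand-in for the self.count buffer; float64 is chosen only for illustration.
count = torch.tensor(0.0, dtype=torch.float64)

# Hypothetical activations with shape (batch, channels, height, width).
x = torch.randn(4, 3, 5, 5)
nc = x.shape[1]

# numel() is 4*3*5*5 = 300, so numel()/nc is 100.0 elements per channel.
c = count.new_tensor(x.numel() / nc)

# A plain torch.tensor(...) call uses the default dtype instead.
plain = torch.tensor(x.numel() / nc)

print(c.dtype, c.device, c.item())
print(plain.dtype)
```

The new tensor picks up the dtype and device of `count`, which is why the line keeps working unchanged whether the module lives on the CPU or on a GPU.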