How is batch norm initialized?

Hello,

in the function create_cnn, there is a function called apply_init, whose docstring says it will “Initialize all non-batchnorm layers of m with init_func.”

But how are the batchnorm layers initialized? I have not found any clues in the code yet…

Does anyone have any ideas?

Thanks,

Regards,
liwei

Hi,

I think to start you can simply initialize BatchNorm with a scale (weight) of one and a shift (bias) of zero, so its output keeps a mean of zero and a variance of one.

For example:

    import torch.nn as nn

    for module in model.modules():
        if isinstance(module, nn.BatchNorm2d):
            module.weight.data.fill_(1)  # scale (gamma) = 1
            module.bias.data.zero_()     # shift (beta) = 0
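
With the weight (scale) at 1 and the bias (shift) at 0, the layer initially passes the normalized activations through unchanged, so batch norm starts out as close to an identity transform as it can be and does not distort the initialization of the surrounding layers.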

We rely on PyTorch's default init of BatchNorm. From the source code:

    def reset_running_stats(self):
        if self.track_running_stats:
            self.running_mean.zero_()
            self.running_var.fill_(1)
            self.num_batches_tracked.zero_()

    def reset_parameters(self):
        self.reset_running_stats()
        if self.affine:
            init.uniform_(self.weight)
            init.zeros_(self.bias)

So the running mean is initialized to 0, the running variance to 1, the weight uniformly in [0, 1) (the default range of init.uniform_), and the bias to 0.
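
If you want to double-check the defaults, you can construct a layer and inspect it. A minimal check (the exact weight values depend on your PyTorch version, since more recent releases initialize the affine weight to 1 with init.ones_ instead of init.uniform_):

    import torch.nn as nn

    bn = nn.BatchNorm2d(3)
    print(bn.running_mean)  # tensor([0., 0., 0.])
    print(bn.running_var)   # tensor([1., 1., 1.])
    print(bn.bias)          # zeros
    print(bn.weight)        # uniform in [0, 1) with the source quoted above; all ones in newer PyTorch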


Hi all,

Thanks for the replies; they really help. fastai is great work, and the forum is a great community. Cheers.

Regards,