I have been using the process described in imagenet_batchnorm and the lesson 5 video to adjust the weights when I insert a batchnorm layer into a pretrained model. However, this does not appear to be faster or more accurate than just inserting the batchnorm layer with default weights (1 and 0).

My understanding is that batchnorm will multiply by a scale weight (default 1) and add a shift weight (default 0). Therefore, on the first pass through the model, it will produce outputs identical to the existing model's. It will then normalize gradually, at a pace set by the learning rate, and only if the gradient suggests a lower loss. There will be no instant normalization where the outputs of the dense layer suddenly normalize and invalidate the following layers' weights.

Have I misunderstood?

EDIT: OK, so I see that it does actually normalize and then apply the two weights, so “unnormalizing” with those weights should in theory speed up training. However, in practice, when I apply it on dogs/cats, it makes no difference whether I use the defaults (1 and 0) or initialize them using the method in lesson 5. Does it make a noticeable difference on other datasets?
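
For anyone following along, here is a minimal NumPy sketch of what the EDIT describes. It is not the lesson's notebook code: the function `batchnorm_forward`, the `eps` value, and the random "activations" are all assumptions made for illustration. It normalizes with statistics from the whole sample, then applies the two weights. With the defaults (1 and 0) the activations change immediately. Setting the scale to the activations' standard deviation and the shift to their mean undoes the normalization, so the following layers see what they were trained on.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-3):
    """Training-style batchnorm: normalize per feature, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Stand-in for the outputs of a pretrained dense layer (hypothetical data).
rng = np.random.default_rng(0)
acts = rng.normal(loc=5.0, scale=3.0, size=(256, 4))
eps = 1e-3

# Default weights: the output is normalized right away, so it differs from acts.
default_out = batchnorm_forward(acts, np.ones(4), np.zeros(4), eps=eps)

# "Unnormalizing" weights: scale by the std, shift by the mean.
gamma_init = np.sqrt(acts.var(axis=0) + eps)
beta_init = acts.mean(axis=0)
identity_out = batchnorm_forward(acts, gamma_init, beta_init, eps=eps)

defaults_change_output = not np.allclose(default_out, acts)
init_preserves_output = np.allclose(identity_out, acts)
print(defaults_change_output, init_preserves_output)
```

In a real layer the statistics come from each mini-batch during training, so the match would only be approximate. This sketch also says nothing about whether the initialization helps on a given dataset.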