@DanielLam, @jcatanza, the short answer is: by unfreezing batchnorm our model gets better accuracy.
Now the why:
When we use a pretrained model, each batchnorm layer contains the running mean and standard deviation, plus gamma and beta (its 2 trainable parameters), all computed on the pretraining dataset (ImageNet in the case of images).
If we freeze the batchnorm layers and train on our dataset, we are feeding the model our data (our images) but normalizing each batch with the ImageNet mean, standard deviation, gamma and beta. Those values are off, especially if our images are different from the ImageNet images. Therefore, the normalized activations are also off, which leads to less than optimal results.
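To make that concrete, here is a minimal PyTorch sketch (not the fastai Learner code itself, and the numbers are just placeholders) of what a frozen batchnorm layer does: in eval mode it normalizes our batch with its stored running statistics, which for a pretrained model come from ImageNet.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
# Pretend these are the running statistics inherited from ImageNet pretraining.
bn.running_mean.fill_(0.5)
bn.running_var.fill_(2.0)

bn.eval()                                # "frozen": the batch's own statistics are ignored
x = torch.randn(8, 3, 4, 4) * 5 + 10     # our data, with a very different distribution
out = bn(x)

# The activations are normalized with the ImageNet-style stats above,
# so their mean/std end up far from 0/1:
print(out.mean().item(), out.std().item())
```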
We keep the batchnorm layers unfrozen because, while training, for each batch the model calculates the mean and standard deviation of the activations of our data (the batch of images), updates (trains) the corresponding gamma and beta, and uses those values to normalize the activations of the current batch. The normalized activations are therefore much better aligned with our images (our dataset) than those obtained with frozen batchnorm.
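And the unfrozen case, same kind of sketch: in train mode the layer normalizes each batch with that batch's own statistics, nudges the running mean/variance toward our data, and gamma/beta (called weight/bias in PyTorch) receive gradients and get trained.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
bn.running_mean.fill_(0.5)               # again, pretend ImageNet stats
bn.running_var.fill_(2.0)

bn.train()                               # unfrozen: use this batch's mean/std
x = torch.randn(8, 3, 4, 4) * 5 + 10     # our data
out = bn(x)

print(out.mean().item(), out.std().item())   # now close to 0 and 1
print(bn.running_mean)                       # running stats have moved toward our data

out.sum().backward()
print(bn.weight.grad is not None)            # True: gamma (and beta) are being trained
```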