Why are BatchNorm layers set to trainable in a frozen model

Someone asked this question already, but I could not find an intuitive explanation, hence I thought of opening a new topic thread.

Here is the link to the original post:
https://forums.fast.ai/t/why-is-it-the-batchnorm2d-layers-in-a-frozen-model-trainable/38944/2

My understanding is that, by default, when we create a learner object from a model and a DataBunch, the underlying network architecture's layers are frozen except for the custom head that is added for the specific classification problem.

learn = cnn_learner(data, models.resnet34, metrics=error_rate)

However, when I print the summary of the learner, I noticed the batch norm layers are all set to trainable, meaning their weights will get updated even though the model is frozen and only the last few layers are supposed to have their weights updated.
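
For reference, here is roughly how I checked it (a minimal sketch, assuming the learn object created above):

```python
import torch.nn as nn

# After cnn_learner freezes the backbone, the conv weights have
# requires_grad=False, but the BatchNorm2d parameters still show True.
for name, module in learn.model.named_modules():
    if isinstance(module, nn.BatchNorm2d):
        print(name, [p.requires_grad for p in module.parameters()])
```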
Can anybody explain the intuition behind this, please?

Many thanks
Amit


Jeremy explained this in the 2019 Part 2 course.

Here is my understanding.

During transfer learning, the first thing we do in fastai is freeze the backbone and only train the custom head.
In fastai, the head is usually an AdaptiveConcatPool2d layer followed by some linear, batch norm, and dropout layers…

In step 2, unfreeze the backbone and train the whole thing.
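
In code, the two steps look roughly like this (a sketch of the usual fastai workflow, not the exact notebook):

```python
# Step 1: backbone frozen (the default after cnn_learner), train only the head.
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(3)

# Step 2: unfreeze the backbone and fine-tune the whole network,
# usually with discriminative learning rates.
learn.unfreeze()
learn.fit_one_cycle(3, max_lr=slice(1e-5, 1e-3))
```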

Now in step 1, if BN is frozen, it will still use the mean, std, and parameters (gamma, beta) from the previous model (for example, ImageNet pictures). But now you are using your task-specific data, which is not the ImageNet pictures. Therefore, you want to train the BN layers so they have the mean, std, and parameters for your data.
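
In plain PyTorch terms, "freeze the backbone but keep BN trainable" means something like this (a minimal sketch of the idea, not fastai's actual implementation):

```python
import torch.nn as nn

def freeze_except_bn(module):
    """Disable gradients for all parameters except BatchNorm ones,
    so gamma/beta keep training; the running mean/std also keep
    updating because the BN layers stay in train() mode."""
    for m in module.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            for p in m.parameters():
                p.requires_grad = True
        else:
            for p in m.parameters(recurse=False):
                p.requires_grad = False
```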

You can check this notebook, in the section labeled ‘Batch norm transfer’.

If you train the model in step 1 with the batch norm layers frozen, you will see that in step 2 the loss doesn’t go down much.
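
If I remember correctly, you can reproduce that comparison with the train_bn flag that Learner accepts (treat the exact keyword, and cnn_learner forwarding it, as an assumption on my part):

```python
# Keep BN layers frozen together with the rest of the backbone in step 1
# (assuming cnn_learner passes train_bn through to Learner).
learn = cnn_learner(data, models.resnet34, metrics=error_rate, train_bn=False)
learn.fit_one_cycle(3)

# Step 2: unfreeze and train everything; the loss improves much less
# than when the BN layers were left trainable in step 1.
learn.unfreeze()
learn.fit_one_cycle(3)
```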

I think Jeremy explained it much better than I can… so just wait for part 2, I guess :rofl:


Thank you @heye0507. That makes sense.