Why are BatchNorm layers set to trainable in a frozen model?

Jeremy explained this in the 2019 part 2 course.

Here is my understanding.

During transfer learning, the first thing we do in fastai is freeze the backbone and train only the custom head.
In fastai, that head is usually an AdaptiveConcatPool2d followed by some linear, batch norm, and dropout layers.
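Roughly, the head looks something like this (a plain-PyTorch sketch; the layer sizes and dropout values are illustrative, not fastai's exact defaults):

```python
import torch
import torch.nn as nn

class AdaptiveConcatPool2d(nn.Module):
    """Concatenate adaptive average and max pooling, like fastai's layer of the same name."""
    def __init__(self, size=1):
        super().__init__()
        self.avg = nn.AdaptiveAvgPool2d(size)
        self.max = nn.AdaptiveMaxPool2d(size)
    def forward(self, x):
        return torch.cat([self.avg(x), self.max(x)], dim=1)

def make_head(nf, n_classes):
    """Rough shape of a fastai-style head on top of a backbone with `nf` output channels."""
    return nn.Sequential(
        AdaptiveConcatPool2d(1),      # -> (batch, 2*nf, 1, 1)
        nn.Flatten(),                 # -> (batch, 2*nf)
        nn.BatchNorm1d(2 * nf),
        nn.Dropout(0.25),
        nn.Linear(2 * nf, 512),
        nn.ReLU(inplace=True),
        nn.BatchNorm1d(512),
        nn.Dropout(0.5),
        nn.Linear(512, n_classes),
    )
```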

Step 2: unfreeze the backbone and train the whole thing.
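In fastai code, the two steps look roughly like this (fastai v2 API; MNIST_SAMPLE is only there so the snippet is self-contained, swap in your own DataLoaders):

```python
from fastai.vision.all import *

# Example dataset just to make the sketch runnable; use your own data in practice.
path = untar_data(URLs.MNIST_SAMPLE)
dls = ImageDataLoaders.from_folder(path)
learn = cnn_learner(dls, resnet34, metrics=accuracy)

# Step 1: the backbone is frozen by default after cnn_learner;
# only the head (and, by default, the BatchNorm layers) get trained.
learn.fit_one_cycle(3)

# Step 2: unfreeze the backbone and fine-tune everything,
# usually with discriminative learning rates.
learn.unfreeze()
learn.fit_one_cycle(3, lr_max=slice(1e-6, 1e-4))
```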

Now in step 1, if BN is frozen, its running mean and std, and its learnable parameters (gamma, beta), are still the ones learned on the pretraining data (for example, ImageNet pictures). But you are now feeding it your task-specific data, which is not ImageNet. Therefore, you want to keep training the BN layers so that their statistics and parameters match your data.
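Here is a minimal plain-PyTorch sketch of that idea (not fastai's exact implementation): freeze the backbone but leave every BatchNorm layer trainable, so gamma/beta still receive gradients and the running stats keep updating while the model is in train mode.

```python
import torch.nn as nn
from torchvision import models

# Pretrained ImageNet weights, so the BN running stats start as ImageNet statistics.
model = models.resnet34(pretrained=True)

for module in model.modules():
    if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
        # BN stays trainable: gamma/beta keep getting gradients, and since the
        # model is in train() mode during fitting, the running mean/std keep
        # adapting to the new dataset's statistics.
        for p in module.parameters():
            p.requires_grad = True
    else:
        # Freeze only the parameters owned directly by this module
        # (conv/linear weights); in fastai the custom head would stay trainable too.
        for p in module.parameters(recurse=False):
            p.requires_grad = False

# Sanity check: only BN affine parameters remain trainable in the backbone.
print([n for n, p in model.named_parameters() if p.requires_grad][:4])
```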

You can check this notebook, in the section labeled ‘Batch norm transfer’.

If you train the model first with BN frozen, you will see that in step 2 the loss doesn’t go down much.
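If you want to try that comparison yourself, the relevant switch is the `train_bn` flag on `Learner` (default `True`). This sketch assumes `cnn_learner` forwards the flag to `Learner`, as in recent fastai versions, and reuses the same kind of setup as above:

```python
from fastai.vision.all import *

path = untar_data(URLs.MNIST_SAMPLE)
dls = ImageDataLoaders.from_folder(path)

# Same recipe as before, but BN layers are treated as frozen
# along with the rest of the backbone during step 1.
learn_frozen_bn = cnn_learner(dls, resnet34, metrics=accuracy, train_bn=False)
learn_frozen_bn.fit_one_cycle(3)   # step 1: head only, BN stats stay at ImageNet values

learn_frozen_bn.unfreeze()
learn_frozen_bn.fit_one_cycle(3)   # step 2: loss typically improves less than with train_bn=True
```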

I think Jeremy explained it much better than I can… so just wait for part 2, I guess :rofl:
