Someone asked this question already, but I could not find an intuitive explanation, so I thought I'd open a new topic.
Here is the link to the original post:
https://forums.fast.ai/t/why-is-it-the-batchnorm2d-layers-in-a-frozen-model-trainable/38944/2
My understanding is that, by default, when we create a learner object from a model and a DataBunch, the layers of the underlying network architecture are frozen except for the custom head that is added for the specific classification problem.
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
However, when I print a summary of the learner, I notice that the batch norm layers are all set to be trainable. That means their weights will get updated even though the model is frozen and only the last few layers are supposed to have their weights updated.
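To make the observation concrete, here is a minimal sketch (not the actual fastai source, just a toy model standing in for the pretrained body) of what the freezing behaviour looks like: every parameter is frozen except those belonging to BatchNorm layers.

```python
import torch.nn as nn

# A tiny stand-in for a pretrained body: conv -> batchnorm -> conv.
# (Hypothetical model, just for illustration; the real body from
# cnn_learner would be a ResNet.)
body = nn.Sequential(
    nn.Conv2d(3, 8, 3),
    nn.BatchNorm2d(8),
    nn.Conv2d(8, 16, 3),
)

def freeze_except_bn(module):
    """Freeze all parameters except those of BatchNorm layers,
    mimicking what fastai's freeze() appears to do by default."""
    for m in module.modules():
        is_bn = isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d))
        for p in m.parameters(recurse=False):
            p.requires_grad = is_bn

freeze_except_bn(body)

for name, p in body.named_parameters():
    print(name, p.requires_grad)
# conv weights/biases -> False, batchnorm weight/bias -> True
```

Running this shows exactly the pattern I see in the learner summary: only the BatchNorm weight and bias remain trainable.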
Can anybody explain the intuition behind this, please?
Many thanks
Amit