Lesson 1 // Train for a single epoch first, and then as many epochs as we want?

Why do we (FIRST) unfreeze the head with randomly initialized params and train it for a single epoch, and only then train the whole network for as many epochs as we want?

Is the head the ‘last layer’?

Hi Lobvh,

When we call fine_tune, the last layer of the pre-trained model, commonly called the ‘head’, whose dimensions are specific to the model’s original task, is replaced with a layer that has dimensions specific to our task.

The newly added layer has random weights. Therefore, we dedicate one epoch to training just those random weights, while leaving the weights in the preceding layers of the network alone. We say that those preceding layers are ‘frozen’.
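As a concrete sketch of what ‘frozen’ means in practice (this is toy plain-Python code, not fastai’s real implementation; the `Layer` class, `sgd_step` function, and all the numbers are made up for illustration), a frozen parameter is simply skipped by the update step:

```python
class Layer:
    """A toy layer: one float stands in for the layer's weight tensor."""
    def __init__(self, weight, trainable=True):
        self.weight = weight
        self.trainable = trainable

def sgd_step(layers, grad=1.0, lr=0.1):
    """One gradient-descent step that skips frozen (non-trainable) layers."""
    for layer in layers:
        if layer.trainable:
            layer.weight -= lr * grad

# Pretrained body layers (frozen) plus a freshly added, randomly
# initialized head (here just 0.01 as a stand-in for random init).
body = [Layer(0.5, trainable=False), Layer(-0.3, trainable=False)]
head = Layer(0.01)

sgd_step(body + [head])   # "train just the head for one epoch"

print(body[0].weight, body[1].weight)  # body is untouched
print(round(head.weight, 2))           # only the head moved
```

The point is that the frozen body weights come out of the step exactly as they went in, while the head’s weight has taken a gradient step.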

The reasoning behind this is that the weights in the preceding layers have already been trained on millions of images (in the case of resnet, for example) and can recognize some basic patterns, which are likely to be useful for our task. Therefore, we don’t want to train them as aggressively as the completely random weights in our newly added layer.

After that one epoch, we then ‘unfreeze’ the preceding layers and train the weights in every layer of the network for as many epochs as we specify when calling fine_tune.
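Putting the two phases together, here is a hedged sketch of that schedule (again toy code, not fastai’s actual fine_tune; the function name `toy_fine_tune` and all values are invented for illustration):

```python
def toy_fine_tune(weights, frozen, epochs, freeze_epochs=1, lr=0.1, grad=1.0):
    """Toy two-phase fine-tuning schedule over a list of scalar weights.

    frozen[i] marks weight i as frozen during phase 1 (the pretrained body).
    """
    # Phase 1: train only the unfrozen (head) weights for freeze_epochs.
    for _ in range(freeze_epochs):
        weights = [w if f else w - lr * grad for w, f in zip(weights, frozen)]
    # Phase 2: unfreeze everything and train all weights for `epochs`.
    for _ in range(epochs):
        weights = [w - lr * grad for w in weights]
    return weights

# Two pretrained body weights + one randomly initialized head weight.
final = toy_fine_tune([0.5, -0.3, 0.01], frozen=[True, True, False], epochs=2)
print([round(w, 2) for w in final])  # body moved only in phase 2; head in both
```

Notice the head weight has taken three steps in total (one frozen-phase epoch plus two full epochs) while the body weights have only taken two, which matches the intuition above: the random head needs more training than the already-useful pretrained layers.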


But if it already has ‘random weights’, then why do we train it for one epoch just to randomize it another time? :smiley: