Retraining multiple layers

Hello everybody.

Found a place in the lesson2 notebook that I don’t quite understand:

NB: Don’t skip the step of fine-tuning just the final layer first, since otherwise you’ll have one layer with random weights, which will cause the other layers to quickly move a long way from their optimized imagenet weights.

Does that mean that in case we want to retrain multiple layers, we should do it sequentially, starting from the last one?

Thanks in advance for clarifying.

@dmhv Yes, you should first train only the layer you've modified from the original model, then retrain the rest if you wish.
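In Keras terms, that first step might look roughly like this (a minimal sketch, not the notebook's exact code; the VGG16 base and the 2-class head are just assumptions for illustration):

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

# Load the pretrained convolutional base without its ImageNet classifier.
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze every pretrained layer so only the new head gets updated.
for layer in base.layers:
    layer.trainable = False

# Attach a new final layer for our own classes (2 here, e.g. cats vs dogs).
x = Flatten()(base.output)
out = Dense(2, activation='softmax')(x)
model = Model(base.input, out)

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(train_batches, ...)  # train just the new Dense layer first
```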

So if I want to retrain 3 layers (denoted L1-L3, from top to bottom), that would mean going through three steps:

  1. freezing L1 and L2 (and all the layers above these, obviously), fitting L3;
  2. un-freezing L2, fitting L2 and L3;
  3. un-freezing L1, fitting L1, L2 and L3.

Is that so? I can understand the idea of doing this to avoid random initialization. However, if the problem is sufficiently far from what the net was originally trained for, would there be a notable difference in final performance and convergence speed? Maybe someone has already done that for state farm and could share their findings?
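For concreteness, here is roughly what I mean in Keras (just a sketch; `model` is assumed to be the network with its new final layer already trained, and the learning rates are made up):

```python
from tensorflow.keras.optimizers import Adam

# Unfreeze one extra layer per stage: fit L3; then L2+L3; then L1+L2+L3.
for n_unfrozen in (1, 2, 3):
    for layer in model.layers:
        layer.trainable = False
    for layer in model.layers[-n_unfrozen:]:
        layer.trainable = True
    # Keras requires recompiling after changing trainable flags; use a
    # smaller learning rate as more pretrained layers become trainable.
    model.compile(optimizer=Adam(learning_rate=1e-4 / n_unfrozen),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    # model.fit(train_batches, epochs=1, validation_data=val_batches)
```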

@dmhv I am not completely sure, but I'd say:
Imagine you start with a model that has L1->L2->L3 and you want to use the precalculated weights but with a different final layer.

  1. You replace L3 with L4, so your model is L1->L2->L4. Freeze L1 and L2 (and all layers above), then fit L4.
  2. Unfreeze L2 and L1 and fit L1, L2 and L4.

The reason for step 1 is that you want to preserve the weights of the pretrained model; randomly initializing L4 and fitting everything at once would move the other layers too much.
I cannot see any reason for your intermediate step 2, but I might be mistaken; maybe give the two alternatives a try and report back. :wink:
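In code, that two-step alternative could look like this (a sketch only; it assumes `model` is a Sequential copy of the pretrained net, and `num_classes` is a placeholder for the new output size):

```python
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

model.pop()                            # drop the original final layer (L3)
for layer in model.layers:
    layer.trainable = False            # keep the pretrained weights fixed
model.add(Dense(num_classes, activation='softmax'))  # the new L4

model.compile(optimizer=Adam(1e-3), loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(...)                       # step 1: fit only L4

for layer in model.layers:             # step 2: unfreeze everything
    layer.trainable = True
model.compile(optimizer=Adam(1e-5), loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(...)                       # fit L1, L2 and L4 together
```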

From my understanding, I thought the following is the way to do this.

  1. Freeze everything but the last layer, change the number of output classes to what you need, and train.
  2. Assuming we want to re-train three layers in total, un-freeze the other two layers and retrain, starting from the existing weights.

Kindly let me know if I’m wrong.
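Either way, it's worth verifying the freezing actually took effect before training (a quick check, assuming `model` from one of the sketches above):

```python
# Print each layer's trainable flag, then compare parameter counts.
for layer in model.layers:
    print(f"{layer.name:20s} trainable={layer.trainable}")
model.summary()  # the last lines report trainable vs non-trainable params
```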

I have a similar question here. For me, the question is how you know which layers to retrain. From my understanding of the video, the last layer was replaced because it performed a similar function to the additional layer. Does it work the same way if you're adding more than one layer?
I am not done with the course yet, so I assume this question may be answered later on in the course.

I think this was answered in one of the videos. If the targets (cats, dogs etc.) that you are trying to classify are already present in the imagenet challenge, then training just the last layer would suffice.
If they are not, you should retrain earlier layers as well; how far back you need to go depends on the dataset.
So it is more of a trial and error thing.
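One way to make the trial and error systematic is to refit with a few different unfreeze depths and compare validation accuracy. In this sketch, `build_model`, `train_batches` and `val_batches` are all hypothetical placeholders for your own model factory and data iterators:

```python
# Try several unfreeze depths and keep the best validation accuracy.
results = {}
for depth in (1, 3, 5, 8):
    m = build_model()                  # hypothetical: returns a fresh copy
    for layer in m.layers:
        layer.trainable = False
    for layer in m.layers[-depth:]:
        layer.trainable = True
    m.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
    hist = m.fit(train_batches, epochs=2, validation_data=val_batches)
    results[depth] = hist.history['val_accuracy'][-1]
print(results)  # pick the depth that generalizes best
```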

@sakiran Yes, I remember that. I just couldn't imagine that I would have targets that wouldn't be among the classes in the last layer of the imagenet network.