Fine-tuning in transfer learning

Hi all,
I just came across the paper “How transferable are features in deep neural networks?” by Yosinski et al., and right on the second page they explain transfer learning and say this about fine-tuning:

If the target dataset is small and the number of parameters is large, fine-tuning may result in overfitting, so the features are often left frozen. On the other hand, if the target dataset is large or the number of parameters is small, so that overfitting is not a problem, then the base features can be fine-tuned to the new task to improve performance.

I’ve been following the usual protocol: train the model a little, then unfreeze and train a little more. The thing is, I’m trying to train a model that DOES overfit during fine-tuning. The dataset is quite a bit smaller than the ones I’ve used before (just over 2K images). I tried different ResNet architectures (18, 34 and 50) and different weight decays, but the result is the same: the loss diverges very quickly during fine-tuning and generalization drops. So now I’m curious about the scenario where I only train the model once, keeping the first layers frozen.
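For reference, here is a minimal sketch of that “freeze the early layers, train only the head” setup, assuming PyTorch (not mentioned in the thread). A tiny toy backbone stands in for the pretrained ResNet layers so the example is self-contained; the freezing/optimizer pattern is the same when you swap in a real pretrained model (e.g. from torchvision):

```python
import torch
import torch.nn as nn

# Toy stand-in for the pretrained conv layers of a ResNet.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
head = nn.Linear(8, 10)  # new task-specific classifier

# Freeze every backbone parameter so only the head is trained.
for p in backbone.parameters():
    p.requires_grad = False

model = nn.Sequential(backbone, head)

# Pass only the trainable parameters to the optimizer; weight decay
# then regularizes just the head, and the frozen features never move.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3, weight_decay=1e-2)

# One training step on dummy data:
x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
```

Since the frozen parameters have `requires_grad=False`, backward never even computes gradients for them, so there is no way for the optimizer to disturb the pretrained features.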

Has anybody done that before?


Interesting topic. I’d like to hear the solution too, because I have the same problem. That said, I’ve heard of people making it work, for example on the dog breed problem, which only has around 70 images (total or per class).

We’ve done this on occasion with a keypoint model, where unfreezing the backbone sometimes hurts performance. Ideally this shouldn’t happen, obviously.