Hi all,
I just found a paper by Jason Yosinski et al. called “How transferable are features in deep neural networks?” (https://papers.nips.cc/paper/5347-how-transferable-are-features-in-deep-neural-networks), and right on the second page they explain transfer learning and say this about fine-tuning:
If the target dataset is small and the number of parameters is large, fine-tuning may result in overfitting, so the features are often left frozen. On the other hand, if the target dataset is large or the number of parameters is small, so that overfitting is not a problem, then the base features can be fine-tuned to the new task to improve performance.
I’ve been following the Fast.ai protocol of training the model a little with the body frozen, then unfreezing and training a little more. The thing is, I’m dealing with a model that DOES overfit during fine-tuning. The dataset is quite a bit smaller than the ones I’ve used before (just over 2K images). I’ve tried different ResNet architectures (18, 34 and 50) and different weight decays, but the result is the same: the loss diverges very quickly during fine-tuning and generalization drops. So now I’m curious about the scenario the paper describes, where I train the model only once, keeping the first layers frozen.
Has anybody done that before with Fast.ai?