I use resnet34 with data augmentation. First I train it with frozen weights, then unfreeze the weights and continue training with differential learning rates. Why does it start to overfit so much after unfreezing the weights? Am I doing something wrong?
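In case it helps, the setup is roughly this (a plain-PyTorch sketch rather than my exact code; `num_classes`, the layer grouping, and the learning rates are just illustrative placeholders):

```python
import torch
from torch import nn
from torchvision import models

num_classes = 2  # placeholder; adjust to your dataset
model = models.resnet34(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new task-specific head

# Phase 1: freeze the pretrained body, train only the new head.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True
head_opt = torch.optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)
# ... train a few epochs with head_opt ...

# Phase 2: unfreeze everything and continue with differential
# (per-group) learning rates: smaller for earlier layers.
for p in model.parameters():
    p.requires_grad = True
groups = [
    {"params": list(model.conv1.parameters()) + list(model.bn1.parameters())
             + list(model.layer1.parameters()) + list(model.layer2.parameters()),
     "lr": 1e-4},
    {"params": list(model.layer3.parameters()) + list(model.layer4.parameters()),
     "lr": 1e-3},
    {"params": model.fc.parameters(), "lr": 1e-2},
]
full_opt = torch.optim.SGD(groups, momentum=0.9)
# ... continue training with full_opt ...
```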
I’m reopening this because I’ve seen the same.
But why does overfitting occur if we use a learning rate that is “too high”?
E.g. I tried differential learning rates of [1e-5, 1e-3, 1e-2] instead of the “normal” [1e-4, 1e-3, 1e-2] and got an accuracy improvement of about 1%. What is the intuition behind this?
Intuitively: the first few layers are already well trained (pretrained) to recognize general features. If you allow them to be trained with too high a learning rate, they will home in on the specifics of your training set, and that’s not what we want; the first few layers should just keep recognizing general, reusable features (edges, corners, circles, and a bit deeper, eyes, noses, …). The main learning happens in the final few layers, where everything is put together to tell one dog from another.
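One way to sanity-check this intuition (a rough sketch of my own, assuming a torchvision resnet34; the relative-drift metric is just an illustration, not anything from the course): snapshot the pretrained weights, fine-tune, and then see how far each layer moved. With sensible differential rates the early layers should barely drift; with a too-high rate on the early groups you'd expect `conv1`/`layer1` to change a lot as well.

```python
import copy
import torch
from torchvision import models

model = models.resnet34(pretrained=True)
initial = copy.deepcopy(model.state_dict())  # snapshot of the pretrained weights

# ... fine-tune the model here ...

# Relative change of each parameter tensor from its pretrained value.
for name, param in model.named_parameters():
    drift = (param.detach() - initial[name]).norm() / (initial[name].norm() + 1e-12)
    print(f"{name:40s} relative change {drift:.4f}")
```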