Why does random initialisation work better than using weights from a similar model?

I have two similar datasets to model. I applied VGG16 with frozen ImageNet weights and trained a fully connected head on top. This gives good results for binary classification on my “national” dataset. I now want a similar model for a “local” dataset, which is almost the same except that about 5% of the images switch to the other binary class.

I tried training two “local” models: one used the “national” model's weights as its starting point, and the other used random initialisation. I was surprised to find that the random initialisation worked a lot better.
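For context, here is a minimal numpy sketch of the two initialisation strategies. It is not your VGG16 pipeline: the “frozen features” are just random vectors and the head is a plain logistic regression, with the “local” labels made by flipping ~5% of the “national” ones. The point of the toy is that with a frozen base and a convex linear head, warm start and random init should converge to essentially the same solution, so a large gap between them in a real experiment is itself informative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for frozen VGG16 features: fixed random vectors.
n, d = 400, 32
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y_national = (X @ w_true > 0).astype(float)

# "Local" labels: same task, but ~5% of examples flip class.
flip = rng.random(n) < 0.05
y_local = np.where(flip, 1 - y_national, y_national)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_head(w0, y, lr=0.5, steps=300):
    """Plain gradient descent on the logistic loss for a linear head."""
    w = w0.copy()
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / n
    return w

def accuracy(w, y):
    return np.mean((sigmoid(X @ w) > 0.5) == y)

# Warm start: fit the "national" head first, then continue on local labels.
w_national = train_head(np.zeros(d), y_national)
w_warm = train_head(w_national, y_local)

# Cold start: small random initialisation, trained only on local labels.
w_cold = train_head(rng.normal(scale=0.01, size=d), y_local)

print(f"warm-start accuracy:  {accuracy(w_warm, y_local):.3f}")
print(f"random-init accuracy: {accuracy(w_cold, y_local):.3f}")
```

In this convex toy both runs end up near the same accuracy regardless of initialisation, which makes it a useful sanity check: if only the head is trainable in your real setup, a big warm-start-vs-random gap suggests something in the training procedure (learning rate, schedule, preprocessing) differs between the two runs, rather than the starting weights alone.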

What would explain this?

I wonder how many of your pre-trained weights dropped to zero? If it’s a reasonably large number, that could explain it.

None dropped to zero.