Why does random initialisation work better than using weights from a similar model?

I have two similar datasets to model. I applied VGG16 with frozen ImageNet weights and trained a fully connected head on top. This gives good results for binary classification on my “national” dataset. I now want a similar model for a “local” dataset, which is almost the same except that about 5% of the images switch to the other binary class.

I tried training two “local” models: one used the “national” model's weights as its starting point, and the other used random initialisation. I was surprised to find that the random initialisation worked a lot better.
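For context, here is a minimal numpy sketch of the two initialisation strategies. It is not your VGG16 pipeline: the “frozen features” are just random vectors and the head is a plain logistic regression, with the “local” labels made by flipping ~5% of the “national” ones. The point of the toy is that with a frozen base and a convex linear head, warm start and random init should converge to essentially the same solution, so a large gap between them in a real experiment is itself informative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for frozen VGG16 features: fixed random vectors.
n, d = 400, 32
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y_national = (X @ w_true > 0).astype(float)

# "Local" labels: same task, but ~5% of examples flip class.
flip = rng.random(n) < 0.05
y_local = np.where(flip, 1 - y_national, y_national)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_head(w0, y, lr=0.5, steps=300):
    """Plain gradient descent on the logistic loss for a linear head."""
    w = w0.copy()
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / n
    return w

def accuracy(w, y):
    return np.mean((sigmoid(X @ w) > 0.5) == y)

# Warm start: fit the "national" head first, then continue on local labels.
w_national = train_head(np.zeros(d), y_national)
w_warm = train_head(w_national, y_local)

# Cold start: small random initialisation, trained only on local labels.
w_cold = train_head(rng.normal(scale=0.01, size=d), y_local)

print(f"warm-start accuracy:  {accuracy(w_warm, y_local):.3f}")
print(f"random-init accuracy: {accuracy(w_cold, y_local):.3f}")
```

In this convex toy both runs end up near the same accuracy regardless of initialisation, which makes it a useful sanity check: if only the head is trainable in your real setup, a big warm-start-vs-random gap suggests something in the training procedure (learning rate, schedule, preprocessing) differs between the two runs, rather than the starting weights alone.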

What would explain this?

I wonder how many of your pre-trained weights dropped to zero? If it’s a reasonably large number, that could explain it.

None dropped to zero.