I have two similar datasets to model. I applied VGG16 with frozen ImageNet weights and trained a fully connected head on top. This gives good results for binary classification on my “national” dataset. I now want a similar model for a “local” dataset, which is almost the same except that about 5% of the images belong to the other binary class.
I tried training two “local” models: one initialised from the “national” model weights, and the other from random initialisation. I was surprised to find that the randomly initialised model performed much better.
What would explain this?