Why did a 1000-output layer only output 43 numbers? (see the story below)
Why was I able to load weights of one architecture into a network of a different architecture?
The main observation of the story:
I loaded the weights of a trained 43-class Vgg16 model into a fresh 1000-class Vgg16 model. The 1000-class model's predictions then contained only 43 probabilities per test image, and these probabilities matched those of the 43-class Vgg16 model.
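To see the match directly, something like this works (a rough sketch; vgg43, vgg1000, and test_imgs are placeholder names I'm introducing here, not from my actual code):

import numpy as np
# vgg43: the finetuned 43-class model; vgg1000: the fresh model after loading its weights
preds43 = vgg43.model.predict(test_imgs)
preds1000 = vgg1000.model.predict(test_imgs)
print(preds43.shape, preds1000.shape)    # both come out as (n_images, 43)
print(np.allclose(preds43, preds1000))   # True: the probabilities matched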
The story:
I trained a Vgg16 model using the finetune procedure from lesson 1.
I saved the weights.
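Roughly like this (a sketch of the lesson 1 steps; the path and filename are placeholders):

vgg = Vgg16()
batches = vgg.get_batches(path + 'train', batch_size=64)
val_batches = vgg.get_batches(path + 'valid', batch_size=64)
vgg.finetune(batches)   # batches has 43 classes, so the new last layer is Dense(43)
vgg.fit(batches, val_batches, nb_epoch=1)
vgg.model.save_weights(path + 'ft_43class.h5')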
I made a fresh Vgg16 model, loaded the weights, and made predictions:
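Something like this (the filename matches the sketch above; imgs is a placeholder batch of test images):

vgg1 = Vgg16()                                   # fresh model, last layer is Dense(1000)
vgg1.model.load_weights(path + 'ft_43class.h5')  # weights saved from the 43-class model
preds = vgg1.model.predict(imgs)
preds.shape   # (n_images, 43), not (n_images, 1000)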
I made a fresh Vgg16 model again, this time finetuning it:
vgg2 = Vgg16()
# Equivalent to vgg2.finetune(batches)
vgg2.model.pop()
for layer in vgg2.model.layers: layer.trainable=False
vgg2.model.add(Dense(43, activation='softmax'))
vgg2.compile()
# end finetune
When you load the weights with the last layer having only 43 classes, the weight array for your last layer ends up being a numpy array with only 43 elements:
myweights = vgg1.model.get_weights()
myweights[39].shape   # sized for 43 outputs, not 1000
and hence the model's predictions only contain 43 classes.
Maybe Keras should at least throw a warning when the shape of the loaded weights differs from the model's, instead of silently overwriting the weight arrays with ones of a different shape.
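Until it does, you can check for mismatches yourself before loading. A minimal sketch (check_weight_shapes is a hypothetical helper I'm making up, not part of Keras):

import numpy as np

def check_weight_shapes(model, new_weights):
    # Compare the model's current weight shapes to the arrays about to replace them.
    for i, (cur, new) in enumerate(zip(model.get_weights(), new_weights)):
        if cur.shape != np.asarray(new).shape:
            print('warning: weight array %d is %s in the model but %s in the new weights'
                  % (i, cur.shape, np.asarray(new).shape))

check_weight_shapes(Vgg16().model, vgg2.model.get_weights())   # flags the last-layer arrays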
Nevertheless, you are not doing any training after loading the weights; does a call to vgg1.fit throw any warnings?