When I pre-compute features of the Dogs v Cats data with the convolutional part of the model and then combine them with the fine-tuned fully-connected (FC) model, I do not get the same training accuracy (98% vs 99.97%) as when I train just a Dense layer on features pre-computed from the all-but-last-layer version of the VggBN model.
- I’ve checked that `shuffle=False` when generating batches, so features and labels stay aligned
- I’ve copied the FC model’s weights from the original model and set dropout to 0.5
- I’ve ensured that only the last Dense layer is trainable in the FC model
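For reference, this is roughly the workflow I mean — a minimal sketch using `tf.keras` with a tiny stand-in convolutional model and random data (not the actual VggBN weights or the real Dogs v Cats images), just to show the precompute-then-train-Dense setup:

```python
import numpy as np
from tensorflow.keras import layers, models

# Stand-in for the convolutional part of VggBN (the real model would be
# loaded with its pretrained weights instead).
conv_model = models.Sequential([
    layers.Conv2D(8, 3, activation='relu', input_shape=(32, 32, 3)),
    layers.GlobalAveragePooling2D(),
])

# Dummy data standing in for the Dogs v Cats images/labels.
x = np.random.rand(16, 32, 32, 3).astype('float32')
y = np.random.randint(0, 2, size=(16,))

# Pre-compute the conv features once (batches generated with shuffle off,
# so features and labels stay aligned).
features = conv_model.predict(x, verbose=0)

# Train only a Dense classifier on the cached features.
fc_model = models.Sequential([
    layers.Dropout(0.5, input_shape=features.shape[1:]),
    layers.Dense(2, activation='softmax'),
])
fc_model.compile(optimizer='adam',
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])
fc_model.fit(features, y, epochs=1, verbose=0)
```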
As far as I can tell, these should be equivalent models, yet the training accuracies differ significantly. Why is this?