Pre-computed VGG features from different layers produce different results

When I pre-compute features of the DogsVCats data from the convolutional part of the model and then combine them with the finetuned fully-connected (FC) model, I do not get the same training accuracy (98% vs 99.97%) as when I train just a Dense layer on pre-computed features from the all-but-last-layer version of the VggBN model.
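
For concreteness, here is a minimal sketch of the kind of setup I mean (Keras 1.x style, as in the course notebooks; `vgg` and `train_path` are placeholder names for my finetuned VggBN model and the training directory):

```python
from keras.models import Sequential
from keras.preprocessing.image import ImageDataGenerator
from keras.utils.np_utils import to_categorical

gen = ImageDataGenerator()
# shuffle=False so the pre-computed feature rows stay aligned with the labels
batches = gen.flow_from_directory(train_path, target_size=(224, 224),
                                  batch_size=64, shuffle=False)
trn_labels = to_categorical(batches.classes)

# Split the model after the last pooling layer: conv part vs FC part
last_pool_idx = max(i for i, layer in enumerate(vgg.layers)
                    if layer.__class__.__name__ == 'MaxPooling2D')
conv_model = Sequential(vgg.layers[:last_pool_idx + 1])
fc_layers = vgg.layers[last_pool_idx + 1:]  # Flatten, Dense, Dropout, ...

# Pre-compute the convolutional features once (Keras 1.x signature)
conv_feats = conv_model.predict_generator(batches, batches.nb_sample)
```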

  • I’ve checked that shuffle=False when pre-computing the features
  • I’ve copied the weights for the FC model from the original model and set dropout=0.5
  • I’ve ensured that only the last Dense layer is trainable in the FC model (see the sketch after this list)
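
In code, the checks above amount to something like this (again a sketch: `fc_layers` comes from the split in the previous snippet, and the BatchNormalization layers that VggBN inserts after each Dense are omitted for brevity):

```python
from keras.layers.core import Flatten, Dense, Dropout

fc_model = Sequential([
    Flatten(input_shape=conv_model.output_shape[1:]),
    Dense(4096, activation='relu'),
    Dropout(0.5),                    # same dropout as the original model
    Dense(4096, activation='relu'),
    Dropout(0.5),
    Dense(2, activation='softmax'),  # finetuned 2-class output layer
])

# Copy the weights across from the corresponding original FC layers
# (assumes the two layer lists line up one-to-one)
for dst, src in zip(fc_model.layers, fc_layers):
    dst.set_weights(src.get_weights())

# Only the last Dense layer should be trainable
for layer in fc_model.layers[:-1]:
    layer.trainable = False

fc_model.compile(optimizer='adam', loss='categorical_crossentropy',
                 metrics=['accuracy'])
fc_model.fit(conv_feats, trn_labels, batch_size=64, nb_epoch=3)
```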

As far as I can tell, these should be equivalent models, yet the training accuracies are significantly different. Why is this?