When I pre-compute features of the Dogs v Cats data with the convolutional part of the model and then combine them with the fine-tuned fully-connected (FC) model, I do not get the same training accuracy (98% vs 99.97%) as when I train just a Dense layer on features pre-computed from the all-but-last-layer version of the VggBN model.
- I’ve checked that `shuffle=False` when generating batches, so features and labels stay aligned
- I’ve copied the FC model’s weights from the original model and set dropout to 0.5
- I’ve ensured that only the last Dense layer is trainable in the FC model
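For reference, this is roughly the workflow I mean — a minimal sketch using `tf.keras` with a tiny stand-in convolutional model and random data (not the actual VggBN weights or the real Dogs v Cats images), just to show the precompute-then-train-Dense setup:

```python
import numpy as np
from tensorflow.keras import layers, models

# Stand-in for the convolutional part of VggBN (the real model would be
# loaded with its pretrained weights instead).
conv_model = models.Sequential([
    layers.Conv2D(8, 3, activation='relu', input_shape=(32, 32, 3)),
    layers.GlobalAveragePooling2D(),
])

# Dummy data standing in for the Dogs v Cats images/labels.
x = np.random.rand(16, 32, 32, 3).astype('float32')
y = np.random.randint(0, 2, size=(16,))

# Pre-compute the conv features once (batches generated with shuffle off,
# so features and labels stay aligned).
features = conv_model.predict(x, verbose=0)

# Train only a Dense classifier on the cached features.
fc_model = models.Sequential([
    layers.Dropout(0.5, input_shape=features.shape[1:]),
    layers.Dense(2, activation='softmax'),
])
fc_model.compile(optimizer='adam',
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])
fc_model.fit(features, y, epochs=1, verbose=0)
```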
As far as I can tell, these should be equivalent models, yet the training accuracies differ significantly. Why is this?