Trouble with training final_model (combined bn_model and conv_model) on StateFarm dataset
I am having trouble with the training of the final_model (which combines the Vgg convolution layers (fixed at original weights)) with the new fully connected layers (including Dropout(p=0.6) and Batchnorm layers).
My bn_model trains quickly to a validation accuracy of ~80% (training accuracy of ~99%). However, when I load these bn_model weights into the final_model in order to train it further, the final_model performance worsens during training. In the first epoch training accuracy drops to ~10% with validation accuracy dropping to 53% in first epoch down to ~20% in the fourth epoch (training accuracy continuing to hover around 10%).
These are the steps I followed:
1. Created a 'static' dataset (using get_batches with shuffle = false)
2. Split off the convolution layers from the Vgg16 model and compile with the RMSprop optimizer with lr=.0001. Set the layers to .trainable=False.
3. Constructed the bn_model (with the layers as per the Lesson 3 notebook i.e. 3 dense layers, 2 dropout layers, 2 batchnorm layers) and compiled with RMSprop optimizer with lr=.0001.
4. Generated the feature inputs (training and validation) for the bn_model with conv_model.predict.
5. Trained the bn_model. Within one epoch the training accuracy went to 88.6% with the validation accuracy at 79%.
6. Added the bn layers to the conv_model to create the final_model. Loaded the bn_weights for the bn layers of the final_model. Compiled with RMSprop optimizer with lr = .0001.
7. Run the final_model with the train_batches and val_batches created during step 1. The model accuracy starts out at that of the bn_model, but then quickly drops off to a training accuracy of ~10% in the first epoch.
I have done the following steps to try to isolate the problem:
1. I used .evaluate() on both bn_model and final_model with the data used for training and confirmed that they get the same results. This hopefully rules out any discrepancy between the feature data created with the conv_model for the bn_model and the 'raw' image data used for the final_model training.
2. I compared the weights of the convolution layers of the final_model and conv_model before and after training to confirm the that weights were originally properly loaded and that the .trainable=False ensured that these layer weights are not adjusted form the original Vgg16 weights.
3. I did the same comparison of the fully connected layers of the final_model with the layers of the bn_model to confirm that the weights were the same prior to training the final_model.
4. I also did a further round of training on the bn_model to ensure that it did not show the same behaviour as the final_model. With further training the bn_model continued to improve the training accuracy (with the validation accuracy oscillating between ~60% and 80%.
I'm having no luck uploading screenshots from my model, but hopefully the description above is clear enough. I will again try later.
Any ideas of what might be causing this to happen?
Any pointers of further trouble shooting I can do?