I am trying to apply the VGG retraining approach from lessons 1-3 to this Kaggle competition. Using data augmentation, dropout, and batch normalization, I achieve a training accuracy of ~98% and a validation accuracy of ~99%. The competition metric is the area under the ROC curve (AUC). Now my question: if I calculate the AUC on my training and validation sets, I get ~0.998 (which would be top of the leaderboard), but when I upload the test predictions to Kaggle, the submission only scores ~0.983 (top 30% of the LB).
Any ideas on how to better estimate the Kaggle score and optimize my model? I am using roc_auc_score from sklearn.metrics.
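
For reference, here is a minimal sketch of the kind of AUC check described above. It assumes binary labels and predicted probabilities for the positive class; the data is synthetic, and `bootstrap_auc` is a hypothetical helper written for this example (it is not part of sklearn). It compares the AUC computed from probabilities with the AUC computed from thresholded 0/1 predictions, and uses bootstrap resampling to get a rough idea of how much the validation AUC could vary by chance.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auc(y_true, y_prob, n_rounds=200, seed=0):
    """Hypothetical helper: mean and std of AUC over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    n = len(y_true)
    scores = []
    for _ in range(n_rounds):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        if len(np.unique(y_true[idx])) < 2:
            continue  # AUC is undefined when only one class is present
        scores.append(roc_auc_score(y_true[idx], y_prob[idx]))
    return float(np.mean(scores)), float(np.std(scores))


# Synthetic stand-in for validation labels and model outputs (assumption).
rng = np.random.default_rng(42)
y_val = rng.integers(0, 2, size=500)
p_val = np.clip(0.5 * y_val + rng.normal(0.25, 0.2, size=500), 0.0, 1.0)

# AUC from predicted probabilities (what a ranking metric expects).
auc_prob = roc_auc_score(y_val, p_val)

# AUC from hard 0/1 predictions, which discards the ranking information.
auc_hard = roc_auc_score(y_val, (p_val > 0.5).astype(int))

mean_auc, std_auc = bootstrap_auc(y_val, p_val)
print(f"AUC (probabilities): {auc_prob:.4f}")
print(f"AUC (hard labels):   {auc_hard:.4f}")
print(f"Bootstrap AUC:       {mean_auc:.4f} +/- {std_auc:.4f}")
```

If the bootstrap spread is small compared with the gap between the validation and leaderboard scores, sampling noise alone probably does not explain the gap, and the validation split itself may not be representative of the test set.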