Kaggle 'Invasive Species Monitoring'

Hi,

I am trying to apply the VGG retraining approach from lessons 1-3 to this Kaggle competition. Using data augmentation, dropout, and batch normalization, I achieve a training accuracy of ~98% and a validation accuracy of ~99%. The competition metric is the area under the ROC curve (AUC). Now my question: when I calculate the AUC on my training and validation sets, I get ~0.998 (which would be top of the leaderboard), but when I upload my test predictions to Kaggle, the submission only scores ~0.983 (top 30% of the LB).

Any ideas on how to better estimate the Kaggle score and optimize my model? I am using roc_auc_score from sklearn.metrics.

Thanks,

Christian
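
One thing worth checking with an AUC metric, sketched below: AUC depends only on how the predictions rank positives against negatives, so rounding or clipping probabilities before scoring creates ties and can change the result. The helper `auc_from_scores` is a hypothetical pure-Python stand-in meant to mirror what roc_auc_score measures for binary labels (the fraction of positive/negative pairs ranked correctly, with ties counted as half). The labels and probabilities are made up for illustration and are not from the competition.

```python
def auc_from_scores(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half a correct pair
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

# Thresholding at 0.5 throws away the ranking inside each class.
y_hard = [1 if p >= 0.5 else 0 for p in y_prob]

auc_prob = auc_from_scores(y_true, y_prob)
auc_hard = auc_from_scores(y_true, y_hard)
print(f"AUC from probabilities: {auc_prob:.3f}")
print(f"AUC from hard labels:   {auc_hard:.3f}")
```

If the local AUC is computed on raw probabilities but the submission file contains rounded or clipped values (or the other way around), the two numbers are not measuring the same thing. Whether that applies here is only a guess.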


I've noticed similar patterns in my training (#61 on the LB here). I'll get 99.9, 99.8, or 99.5 on validation sets, and then my actual score on the leaderboard is lower. It probably comes down to one of two things:

1 - I'm fitting the validation data a bit too much somehow
2 - my validation set is not representative of the test set.

Hope this helps a bit.

Good luck.
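
A rough sketch of one way to address both points above: rotate the validation set with stratified k-fold splits, so that no single split gets tuned against repeatedly and every fold keeps roughly the same class balance. The function `stratified_folds` is a hypothetical pure-Python helper written for this example (scikit-learn offers `StratifiedKFold` in `sklearn.model_selection` for the same job), and the labels are made up. Training and scoring the model on each split is left out.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Return n_folds lists of indices, each with a similar class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of this class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


# Made-up labels: 30 positives and 20 negatives.
labels = [1] * 30 + [0] * 20
folds = stratified_folds(labels, n_folds=5)

for k, val_idx in enumerate(folds):
    # Everything outside fold k is the training set for this round.
    train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
    positives = sum(labels[i] for i in val_idx)
    print(f"fold {k}: {len(train_idx)} train, {len(val_idx)} val, "
          f"{positives} positives in val")
```

Averaging the validation AUC over the folds, and looking at its spread, usually gives a less optimistic estimate than a single split. It still cannot help if the test images come from a different distribution than the training images.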