Area under ROC value gets worse on test set?

I am currently working on the IEEE-CIS Fraud Detection dataset. According to the competition, we have to use the area under the ROC curve as the evaluation metric, so I am using sklearn's roc_auc_score. The score is pretty good on the training and validation sets, but when I submit my predictions on the test set it drops to 0.5, which is very far from the 0.92 I got on the validation set.
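For reference, this is roughly how I compute the validation AUC (a minimal sketch only; the toy arrays below stand in for my actual fastai predictions and validation labels, which are in the kernel):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy stand-ins for the real validation data:
# valid_y - true isFraud labels (0/1) for the validation rows
# preds   - predicted probability of the positive (fraud) class for each row
valid_y = np.array([0, 0, 1, 0, 1, 1, 0, 1])
preds   = np.array([0.1, 0.3, 0.8, 0.2, 0.6, 0.9, 0.4, 0.7])

# roc_auc_score expects scores/probabilities, not hard 0/1 class predictions.
val_auc = roc_auc_score(valid_y, preds)
print(f"Validation AUC: {val_auc:.4f}")
```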

Can anyone suggest how I can tackle such overfitting, if this is overfitting at all?

Please find below the link to my Kaggle kernel for a better understanding:
https://www.kaggle.com/keyurparalkar/ieee-cis-fraud-detection-with-fastai
