My Kaggle submission shows a lower score

retr0 · August 14, 2019, 5:40am

I get this in Google Colab, but once I submit to the Kaggle competition it becomes this:

The competition is: https://www.kaggle.com/c/ieee-fraud-detection/overview
I really don't know what to do now.
This is how I created the databunch:

blueharen · August 14, 2019, 12:42pm

Kaggle’s leaderboard score is based on data that you have not been training on. For most models it will differ from the training score to some extent. Since the disparity is so large in your case, I would assume that you are overfitting your model to your training data.

retr0 · August 14, 2019, 1:12pm

What should I do to solve this issue? How can I confirm whether it is overfitting?

dhoa · August 14, 2019, 2:55pm

It is overfitting because you got a good result on your own data: 0.99 on your accuracy metric. The model did well on what you gave it. However, the Kaggle test set might be too different from the training set, so you got a bad result.
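One way to check for overfitting, as asked above, is to compute the same metric on the rows the model trained on and on held-out rows it never saw, then look at the gap. The sketch below is only an illustration: `roc_auc` is a small hand-written helper (not a fastai or Kaggle function), and the labels and scores are made-up numbers, not results from this competition.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs where the
    positive example gets the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up model outputs, purely for illustration.
train_labels = [0, 0, 0, 1, 0, 1]
train_scores = [0.1, 0.2, 0.05, 0.9, 0.3, 0.8]
valid_labels = [0, 1, 0, 0, 1, 0]
valid_scores = [0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

train_auc = roc_auc(train_labels, train_scores)
valid_auc = roc_auc(valid_labels, valid_scores)
gap = train_auc - valid_auc

print(f"train AUC={train_auc:.2f}  valid AUC={valid_auc:.2f}  gap={gap:.2f}")
```

A training score far above the held-out score is the usual sign of overfitting. If the two are close but the leaderboard score is still much lower, the held-out rows are probably not representative of the test set.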

retr0 · August 14, 2019, 3:55pm

Oh, so the accuracy and AUROC scores are based on the training data and not the test data that was also given?

muellerzr · August 14, 2019, 5:44pm

Correct. Kaggle operates with three sets in total: a training set and two test sets. One test set feeds the private leaderboard and the other the public one, and we don’t have access to the labels for either (to keep it fair). So most likely, while you may have overfit your model to the training set they gave, it does not perform well on their test sets, as those are hand-crafted to be difficult and a proper basis for judging. Does this help? Most likely you built your model correctly; Kaggle competitions are just designed to be challenging and tough.

marii · August 15, 2019, 1:40am

Split_by_idx 800,1000? I might be crazy but that does not seem random. Are you sure 800-1000 is a good sample of your data? That would definitely throw off your results. Generally you want to split your training and validation data randomly over your non-test dataset.
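To make the random-split suggestion concrete, here is a minimal sketch that draws validation row indices at random instead of taking the contiguous block 800-1000. The helper name `random_valid_idx`, the row count of 1000, the 20% validation share, and the seed are all assumptions chosen for illustration.

```python
import random

def random_valid_idx(n_rows, valid_pct=0.2, seed=42):
    """Return a sorted list of randomly chosen validation row indices."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    n_valid = int(n_rows * valid_pct)
    return sorted(rng.sample(range(n_rows), n_valid))

n_rows = 1000                          # hypothetical dataset size
valid_idx = random_valid_idx(n_rows)

# The contiguous split from the original post, for comparison.
contiguous_idx = list(range(800, 1000))

print(len(valid_idx), "random validation rows, spanning",
      valid_idx[0], "to", valid_idx[-1])
```

In fastai v1, a list like `valid_idx` could be passed to `split_by_idx`, and `split_by_rand_pct` performs a random split directly. Check both against the version you have installed.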
