95% Accuracy notebooks in the recent DRW Crypto Market Prediction Kaggle Competition

I recently participated in DRW - Crypto Market Prediction on Kaggle. It was a bold move on my part, because I was only just about to finish Part 1 of the 2022 Fast.AI course, but the idea was stuck in my head so I tried anyway. The competition ended yesterday.

In my initial attempts, I simply fed in the independent variables (5 named and 780 proprietary unnamed columns) and the dependent variable, which was price. The train and validation sets each have ~500k rows, and the data is recorded for each minute of the day.

Here’s a list of the different things I tried (no composite columns or normalization):

| Technique | Kaggle score (Pearson correlation coefficient) |
|---|---|
| Fast.AI tabular using all columns and no normalization | 0.06948 |
| XGBoost | 0.05142 |
| Average of XGBoost and Fast.AI tabular predictions | 0.06504 |
| XGBoost Random Forests | -0.01921 |
| vision_learner with convnext_small_in22k (now named convnext_small.fb_in22k), converting the data to images using GADF and squishing them to 192x192, 1 epoch | 0.04929 |
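
For anyone unfamiliar with GADF (Gramian Angular Difference Field) from the last entry above: it turns a 1-D series into a square image by rescaling the values to [-1, 1], mapping each one to an angle, and taking the sine of every pairwise angle difference. Below is a minimal NumPy sketch of that standard definition, assuming one row of features is treated as one series. The function name `gadf` and the toy 16-value input are illustrative only, and the resize to 192x192 is left out.

```python
import numpy as np

def gadf(series):
    """Gramian Angular Difference Field of a 1-D series (illustrative helper)."""
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # A constant series carries no shape information; map it to zeros.
        scaled = np.zeros_like(x)
    else:
        # Min-max rescale into [-1, 1] so that arccos is defined.
        scaled = 2.0 * (x - lo) / (hi - lo) - 1.0
    scaled = np.clip(scaled, -1.0, 1.0)  # guard against rounding drift
    phi = np.arccos(scaled)              # polar-angle encoding
    # Entry (i, j) is sin(phi_i - phi_j).
    return np.sin(phi[:, None] - phi[None, :])

# Toy input standing in for one row of features (assumed, not competition data).
rng = np.random.default_rng(0)
row = rng.normal(size=16)
image = gadf(row)
print(image.shape)  # (16, 16)
```

A series of length n gives an n x n image, so a row with several hundred features produces a large field that then has to be resized before it goes to a vision model.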

Now there’s this notebook, which seems to ensemble previous submissions by weights, and it’s currently the highest scoring on the public leaderboard (~0.95). Some malpractice happened ~2-3 weeks earlier, so the host invalidated all submissions up to that date and shared a new dataset, but it looks like the same malpractice (probably future peeking) was repeated on the new dataset to get such a high score.
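
To make "ensembling by weights" concrete, here is a rough sketch of the mechanics: take several prediction vectors, combine them with normalized weights, and score the result with the Pearson correlation coefficient. Everything in it is an assumption for illustration: the helper names `pearson` and `blend`, the equal weights, and the synthetic data. It is not the notebook's code, and it says nothing about how that notebook's inputs were produced.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))

def blend(preds, weights):
    """Weighted average of prediction vectors; weights are normalized to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ np.stack(preds)

# Synthetic stand-ins: a target and two weak, noisy predictors of it.
rng = np.random.default_rng(42)
y = rng.normal(size=1000)
pred_a = 0.1 * y + rng.normal(size=1000)
pred_b = 0.1 * y + rng.normal(size=1000)

blended = blend([pred_a, pred_b], weights=[0.5, 0.5])
print("model A:", pearson(pred_a, y))
print("model B:", pearson(pred_b, y))
print("blend  :", pearson(blended, y))
```

With equal weights this reduces to the plain average of two models' predictions; unequal weights just tilt the blend toward one input.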

Also, 260 people have >90% accuracy. I don’t know what to make of this. I wasn’t expecting a super high score for myself, since this is my first serious participation in a Kaggle competition with rewards; I simply wanted to get some thoughts from folks here.

Edit: There was a fair bit of discussion about this, and even answers from the competition organiser addressing it. I totally missed it.

I observe that lots of people score “0 loss” on the Titanic and House Prices datasets.
Also, I checked the leaderboard of your competition and the max score is now ~0.16. Were the >90% accuracies deleted?

From the organisers’ most recent discussion post:

So it looks like yes.
