I recently participated in the DRW - Crypto Market Prediction competition on Kaggle. It was a bold move on my part, because I was only just about to finish Part 1 of the 2022 Fast.AI course, but the idea was stuck in my head, so I tried anyway. The competition ended yesterday.
In my initial attempts, I simply fed in the independent variables (5 named and 780 proprietary unnamed columns) and the dependent variable, which was price. The dataset has ~500k rows in each of the train and validation sets, with one row per minute of the day.
Here’s a list of the different things I tried (no composite columns or normalization):
| Technique | Kaggle score (Pearson correlation coefficient) |
|---|---|
| Fast.AI tabular using all columns and no normalization | 0.06948 |
| XGBoost | 0.05142 |
| Average of XGBoost and Fast.AI tabular predictions | 0.06504 |
| XGBoost Random Forests | -0.01921 |
| vision_learner on images made by converting the data with GADF and squishing to 192x192, using convnext_small_in22k (now named convnext_small.fb_in22k), 1 epoch | 0.04929 |
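For anyone unfamiliar with GADF (Gramian Angular Difference Field), here is a minimal NumPy sketch of the transform as it is usually defined: rescale the series to [-1, 1], take the arccos to get angles, and build the matrix of sin(phi_i - phi_j). This is not my competition code. The function name `gadf` and the toy 16-value row are made up for illustration, and treating one row of features as the "series" is an assumption about how such data could be fed in.

```python
import numpy as np


def gadf(series):
    """Gramian Angular Difference Field of a 1-D series (illustrative sketch)."""
    x = np.asarray(series, dtype=np.float64)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # Constant series: map everything to 0 to avoid dividing by zero.
        scaled = np.zeros_like(x)
    else:
        # Min-max rescale into [-1, 1] so that arccos is defined.
        scaled = 2.0 * (x - lo) / (hi - lo) - 1.0
    scaled = np.clip(scaled, -1.0, 1.0)  # guard against floating-point overshoot
    phi = np.arccos(scaled)              # polar-coordinate angle per value
    # Entry (i, j) is sin(phi_i - phi_j), computed via broadcasting.
    return np.sin(phi[:, None] - phi[None, :])


# Toy example: a hypothetical row of 16 values becomes a 16x16 "image".
row = np.sin(np.linspace(0.0, 3.0, 16))
image = gadf(row)
print(image.shape)
```

The result is a square matrix with one row and column per input value, so a row with several hundred features gives a fairly large image that then has to be resized (192x192 in my case) before going into a vision model.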
Now there’s this notebook, which seems to ensemble previous submissions by weights, and it’s currently the highest scoring on the public leaderboard (~0.95). Some malpractice happened ~2-3 weeks earlier, so the host invalidated all submissions made up to that point and shared a new dataset, but it looks like the same malpractice (probably future peeking) was repeated on the new dataset to get such a high score.
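To make "ensembling by weights" and the score concrete, here is a rough sketch of a weighted blend of prediction vectors plus the Pearson correlation used for scoring. It is not that notebook's code. The helper names `pearson` and `blend`, the synthetic target, and the noise levels are all invented for illustration; the plain average in my table is just the equal-weights case.

```python
import numpy as np


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two 1-D arrays."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def blend(predictions, weights):
    """Weighted average of several prediction vectors.

    predictions: sequence of shape (n_models, n_rows)
    weights: one non-negative weight per model (normalized to sum to 1 here)
    """
    preds = np.asarray(predictions, dtype=np.float64)
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()
    return w @ preds


# Synthetic stand-ins for a target and two noisy models (made-up data).
rng = np.random.default_rng(0)
target = rng.normal(size=1000)
model_a = target + rng.normal(scale=3.0, size=1000)
model_b = target + rng.normal(scale=3.0, size=1000)

blended = blend([model_a, model_b], [0.5, 0.5])
print(pearson(target, model_a), pearson(target, model_b), pearson(target, blended))
```

Blending is not guaranteed to help: in my table, the average of XGBoost and Fast.AI scored lower than Fast.AI alone. Also, the weights would need to be chosen on held-out data from the training period, since tuning them against the public leaderboard amounts to fitting to the test set.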
Also, 260 people have a score above 0.90. I don’t know what to make of this. I wasn’t expecting a super high score for myself, since this is my first serious participation in a Kaggle competition with rewards; I simply wanted to get some thoughts from folks here.
Edit: There was a fair bit of discussion about this, and the competition organiser even posted answers addressing it. I totally missed it.