Kaggle prediction on private test dataset

I wonder how Kaggle infers predictions on the private test set just from our submitted predictions on the public test set. Do they reverse-engineer the model in some way?

Let’s say you are given 100 rows in the test set to predict. The leaderboard page on Kaggle always states the ratio of test data used for the public and private leaderboards.

If the ratio says 30%-70% public-private, then whenever you submit while the contest is on, the public LB score you’re shown is calculated on only 30 records of the test dataset. Your submission already contains predictions for all 100 rows, so nothing has to be inferred or reverse-engineered: after the competition ends, the score you’re finally ranked on is simply computed on the other 70 records. Quite simple really! :slight_smile:
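To make the 30/70 example concrete, here is a rough sketch of how one submission could be scored twice. This is not Kaggle's actual code: the function name `split_scores`, the accuracy metric, and the seeded random split are all assumptions made for illustration. The one thing it is meant to show is that a single file of predictions for every test row is scored against a hidden partition of the true labels.

```python
import random


def split_scores(y_true, y_pred, public_fraction=0.3, seed=0):
    """Score one full submission on a hidden public/private partition.

    Hypothetical helper: the metric (accuracy) and the way the partition
    is drawn are assumptions, not Kaggle's real implementation.
    """
    n = len(y_true)
    indices = list(range(n))
    # The organiser fixes the partition once; participants never see it.
    random.Random(seed).shuffle(indices)
    n_public = round(n * public_fraction)
    public_idx = indices[:n_public]
    private_idx = indices[n_public:]

    def accuracy(idx):
        # Fraction of rows in this partition that were predicted correctly.
        return sum(y_true[i] == y_pred[i] for i in idx) / len(idx)

    return {
        "public_score": accuracy(public_idx),    # shown during the contest
        "private_score": accuracy(private_idx),  # used for the final ranking
        "n_public": len(public_idx),
        "n_private": len(private_idx),
    }


# Toy data: 100 test rows, with a submission covering all of them.
true_labels = [i % 2 for i in range(100)]
submission = [0] * 100  # a naive "always predict 0" submission
result = split_scores(true_labels, submission)
```

Because the same submission feeds both scores, the public score is only an estimate of the private one, and the two can differ when the public partition is small.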


Thanks @binga for your inputs. It’s quite clear now.