Kaggle Code Competitions

evw · February 2, 2023, 11:29pm

When submitting a Kaggle Code Competition (ex: from Lesson 4 - US Patents), the notebook that Kaggle runs during submission includes the full training process. Is that standard practice or are folks zipping their trained model → downloading → uploading to a new notebook dedicated to inference to run faster?

Rohan_Lalwani · February 3, 2023, 1:13am

It’s not necessary to submit a notebook or model to Kaggle. You need to submit the predictions your model made on test data in the form of a CSV file. See the sample_submission.csv file in the data section of the competition:

evw · February 3, 2023, 2:44am

I think in this case a notebook must be submitted based on my understanding of the requirements overview:

The end of the sample notebook for lesson 4 also mentions this:

When I did a submission, Kaggle spins up a dedicated instance to run the notebook. It took ~5 minutes to complete and get a score. I assume that’s because it’s going through the full training process in the notebook, which seems wasteful to do on every submission.

I’m curious if there’s a better way to skip training in the submission context, or do folks not bother and just let it run? It seems like it would be a big issue with larger models.

Rohan_Lalwani · February 3, 2023, 7:24am

It seems like you are right, I didn’t know that this competition requires a notebook too. But as it turns out, it does re-run the whole notebook and then uses the submission file you select.

VDM · February 3, 2023, 4:30pm

Some competitions have hidden test files to avoid probing. So you submit code - including training or not depending on rules-, and “the submission file you select” is not your prediction done in advance, but the name of the file with predictions that your notebook will generate when re-run. If not forbidden, you may attach your trained model to the notebook as data source so you avoid training on the fly.