Teaming Up for Kaggle NLP Competitions

sambit · April 20, 2022, 11:01am

Hi All,

I’m planning to attack a few Kaggle NLP competitions soon. Is anyone interested in attacking them together?

Prerequisites:
1. Completion of any of the fastai courses
2. Decent grasp of raw PyTorch
3. Familiarity with HF Transformers (e.g., the material covered in the official Hugging Face Course)
4. Minimum time commitment of 10 hours a week. If you're busy / don't have time, then this won't work out.

If you meet the above prerequisites, then I propose a “cooperate to dominate” strategy for each new NLP competition:

We’ll start off with a Zoom call to discuss the problem statement & do some basic EDA.
Jointly discuss a watertight CV strategy, architecture options, training procedure ideas, etc.
Divide up the weekly review of the top public Kaggle notebooks, extract the strongest ideas from each, and combine them into the best possible model.
Divide up the experimentation: for example, each person trains one fold or one seed.
Divide the responsibility of reading & summarising relevant research papers & notebooks from past competitions.
Create a central knowledge repository for each live competition (what worked, what didn't, etc.).
Use an experiment tracking tool (e.g., Weights & Biases / Neptune.ai) to track all the experiments by all the team members.
Ensemble the best models to get the best possible rank.
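To make the "one fold / one seed by each person" idea concrete, here is a minimal, stdlib-only Python sketch. It assumes a plain shuffled K-fold split; a real competition may need stratified or grouped folds depending on the data, which is exactly what the joint CV discussion would decide. All function names and member names are hypothetical. The key point is that every member rebuilds the identical split from a shared seed, trains only their assigned folds, and the fold models' test predictions are averaged at the end.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def make_folds(n_samples: int, n_folds: int, seed: int = 42) -> List[int]:
    """Give each sample index a fold id in [0, n_folds).

    The shuffle uses a fixed seed so every team member gets the same split.
    Plain shuffled K-fold is an assumption; swap in stratified/grouped folds if needed.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_of = [0] * n_samples
    for position, idx in enumerate(indices):
        fold_of[idx] = position % n_folds  # fold sizes differ by at most one
    return fold_of


def assign_work(members: Sequence[str], n_folds: int) -> Dict[str, List[int]]:
    """Round-robin folds across members (one fold each when len(members) == n_folds)."""
    work: Dict[str, List[int]] = {m: [] for m in members}
    for fold in range(n_folds):
        work[members[fold % len(members)]].append(fold)
    return work


def train_val_split(fold_of: List[int], fold: int) -> Tuple[List[int], List[int]]:
    """Indices to train on and to validate on for one fold."""
    train = [i for i, f in enumerate(fold_of) if f != fold]
    valid = [i for i, f in enumerate(fold_of) if f == fold]
    return train, valid


def ensemble(preds: Sequence[Sequence[float]],
             weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of several models' predictions for the same test rows."""
    if weights is None:
        weights = [1.0] * len(preds)
    total = sum(weights)
    n_rows = len(preds[0])
    return [sum(w * p[row] for w, p in zip(weights, preds)) / total
            for row in range(n_rows)]


# Hypothetical 5-person team (the Kaggle team limit) and 5 folds.
fold_of = make_folds(n_samples=10, n_folds=5, seed=42)
print(assign_work(["member1", "member2", "member3", "member4", "member5"], n_folds=5))
# → {'member1': [0], 'member2': [1], 'member3': [2], 'member4': [3], 'member5': [4]}
print(ensemble([[0.25, 0.5], [0.75, 1.0]]))
# → [0.5, 0.75]
```

The same round-robin works for seeds instead of folds. Averaging with equal weights is only a starting point; weights could instead be tuned on out-of-fold predictions, which is where the shared experiment tracker helps.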

Looking forward to your responses!

P.S. The Kaggle team member limit is 5. If more than 5 people are interested, then we can create multiple teams.

miwojc · April 20, 2022, 7:56pm

some starter notebooks for you from someone who is well known on this forum

sambit · April 21, 2022, 3:52am

Thanks for sharing!

miwojc · April 21, 2022, 2:31pm

there’s also the great blurrified notebook by Wayde:

and i have slightly modified Jeremy's notebook to make it run offline so that people can submit to the competition:

sambit · April 22, 2022, 5:25am

Very cool. Thanks again for sharing!