I mostly learn from my mistakes, and this time is no different. I joined the toxic competition a little over a week ago. Joining that late is a dubious idea in itself: great for learning, but it can be stressful if you cannot distance yourself from how poorly you are doing!
Based on my current understanding, here is a generic recipe for approaching a Kaggle competition:
- Make a first submission (published kernels are a great starting point!)
- Establish an easy way of training on a small sample
- Perform a full error analysis on the model from the first step (very important: this is the step I skipped this time, which left me flying blind!)
- Build a CV training pipeline, retrain the original model on each fold, average the folds' test predictions and submit
- Put in place a single stacking pipeline, building on the first model only (I am only learning about this right now, so maybe I am wrong on this one as I have not tested it, but it seems about right)
- Throughout the process, keep reading relevant papers and blog posts, and learn from kernels and forum discussions <- this + error analysis is what should govern which changes you make to your original model / what architecture you train next.
- Throughout the process, keep a Google Keep note with checkboxes for organizing ideas on what to work on next <- super valuable! This also ties in with something I have been thinking about for quite a while: “if you sit down in front of your computer with only a vague idea of what you should work on, you will waste time! You should have a plan before you sit down and stick to it for the duration of the session”
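To make the small-sample, CV and stacking steps above a bit more concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration: the helper names (`subsample`, `kfold_indices`, `cv_predict`), the toy data, and the stand-in "model" that just predicts the mean training label. In a real competition you would plug in your actual model and most likely use a library for the splitting. The point is only the shape of the loop: one model per fold, test predictions averaged across folds, and out-of-fold predictions collected so that a second-level (stacking) model could later be trained on them.

```python
import random
from statistics import mean


def subsample(rows, fraction, seed=0):
    """Return a reproducible random subset for quick experiments."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)


def kfold_indices(n, n_splits, seed=0):
    """Shuffle range(n) and cut it into n_splits validation folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_splits] for i in range(n_splits)]


def cv_predict(fit, predict, X, y, X_test, n_splits=5, seed=0):
    """Train one model per fold.

    Returns (oof, test_avg):
      oof[i]      prediction for training row i from a model that never saw it
      test_avg[j] mean of the per-fold predictions for test row j
    """
    oof = [None] * len(X)
    test_preds = []
    for val_idx in kfold_indices(len(X), n_splits, seed):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        # Out-of-fold predictions: usable for error analysis and as
        # input features for a second-level stacking model.
        for i, p in zip(val_idx, predict(model, [X[i] for i in val_idx])):
            oof[i] = p
        test_preds.append(predict(model, X_test))
    # Average the test predictions of all fold models.
    test_avg = [mean(col) for col in zip(*test_preds)]
    return oof, test_avg


# Toy stand-in model (hypothetical): always predicts the mean training label.
def fit_mean(X, y):
    return mean(y)


def predict_mean(model, X):
    return [model for _ in X]


# Toy data, made up for the example.
X = [[float(i)] for i in range(20)]
y = [i % 2 for i in range(20)]
X_test = [[0.5], [1.5], [2.5]]

small = subsample(list(zip(X, y)), fraction=0.25)  # quick-iteration subset
oof, test_avg = cv_predict(fit_mean, predict_mean, X, y, X_test, n_splits=5)
```

Here `test_avg` is what you would submit, and comparing `oof` against `y` gives you predictions on every training row to dig through during error analysis.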
If you have any comments or suggestions, do chime in please!