Thank you - this is good to know.
I appreciate you sharing this, especially as I know how much time this competition can require and how much you probably had to invest to get to this point.
My big takeaway here is that one has to take a very organized approach to processing the data and to constructing models. I think I knew this intellectually, but it is a completely different ball game when you jump into actually working on a problem.
I started working on this competition by reproducing the Rossmann notebook from Jeremy (which BTW I think is an absolutely fabulous learning resource). Being relatively new to pandas, I struggled a bit with the API and also invested quite a bit of time into working with relatively large amounts of data on comparatively modest hardware.
The crux of the matter is that after many, many lines of code, written non-incrementally, I now have no clue why my model is not working as well as I would expect it to (given the data I feed it, I would expect even a simple linear regression to outperform the LB score of 0.546 achieved by taking a mean of medians…).
I tried verifying my data processing pipeline, but that is a lost cause at this point - too much code, processing the data takes a long time even on a beefy AWS instance, and it is impossible now to bolt on checks - hence I am starting over from scratch. I will begin with a very simple mean that gets 0.558 on the LB and incrementally add to it, both enriching the data with features and moving to more complex models. Small, incremental steps - hopefully this will make navigating the complexity of this possible.
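For anyone curious what a baseline like this looks like, here is a minimal sketch of the mean-of-medians idea in pandas. The column names (`store`, `item`, `sales`) and the tiny inline data are placeholders of my own, not the competition's actual schema: take the median of the target per group, and fall back to the mean of those medians for groups unseen in training.

```python
import pandas as pd

# Hypothetical data; "store", "item" and "sales" are placeholder
# names, not the competition's real columns.
train = pd.DataFrame({
    "store": [1, 1, 2, 2, 2],
    "item":  ["a", "b", "a", "a", "b"],
    "sales": [3.0, 5.0, 4.0, 6.0, 2.0],
})
test = pd.DataFrame({"store": [1, 2, 3], "item": ["a", "b", "a"]})

# Median target per (store, item) group.
medians = train.groupby(["store", "item"])["sales"].median()

# Mean of the group medians as a global fallback for unseen groups.
fallback = medians.mean()

# Attach the per-group median to each test row; unseen groups get
# the fallback value.
med_df = medians.rename("pred").reset_index()
pred = test.merge(med_df, on=["store", "item"], how="left")
pred["pred"] = pred["pred"].fillna(fallback)
```

The appeal of starting here is that every later change (a new feature, a real model) can be diffed against a submission whose every prediction you can verify by hand.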
The only unfortunate bit is how little time is left in this competition, as I would have liked to give it a proper go, but I probably gave it my best. There will be other challenges to work on, and I feel that learning the process of doing machine learning is very valuable in itself. Hopefully I'll be able to put it to good use at some point.