Corporación Favorita Grocery Sales Forecasting

I’m curious about this too.

Errr… I’m pretty sure that if you asked in the comments section on Kaggle, or on KaggleNoobs, someone would find the original culprit and get you an answer.
This is my humble opinion but I agree with it. :innocent:

Looking at the Kaggle discussion boards, it doesn’t seem like many of the teams understand the basics of why it works. My hypothesis:

They found an additional feature, and the simple FFNN designs I was using will give similar results.

Will let you know after I dig into it.

1 Like

Changing his original code as he suggested in https://www.kaggle.com/shixw125/1st-place-nn-model-public-0-507-private-0-513/code#269624

This kernel is based on senkin13’s kernel: https://www.kaggle.com/senkin13/lstm-starter. You can replace model.add(LSTM(512, input_shape=(X_train.shape[1],X_train.shape[2]))) with model.add(Dense(512, input_dim=X_train.shape[1])); I think there is no difference.

Generates the following error (I didn’t try to investigate; I just pasted it in and ran the code).

1 Like

If you comment out the lines below:
X_train = X_train.reshape((X_train.shape[0], 1, X_train.shape[1]))
X_test = X_test.reshape((X_test.shape[0], 1, X_test.shape[1]))
X_val = X_val.reshape((X_val.shape[0], 1, X_val.shape[1]))

and
change input_shape to (X_train.shape[0], X_train.shape[1]); that should work, I think.
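
If it helps, here is a minimal, self-contained sketch of the Dense version along the lines of the quoted kernel comment (toy random data, not the kernel’s actual arrays, loss, or training setup):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# keep the arrays 2-D, i.e. skip the reshape to (samples, 1, n_features)
X_train = np.random.rand(1000, 37).astype('float32')
y_train = np.random.rand(1000, 1).astype('float32')

model = Sequential()
model.add(Dense(512, activation='relu', input_dim=X_train.shape[1]))   # in place of the LSTM layer
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
model.fit(X_train, y_train, batch_size=512, epochs=1, verbose=0)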

Thanks a lot for your help @s.s.o, but it didn’t work (not your fault, just my noobish debugging skills).
It’s pretty late in Stockholm now, maybe 02:00, so I’ll give it a try again tomorrow.
In any case, the basic Jupyter Notebook should be working fine, so I’ll try to post it on GitHub. :+1:

Here’s an edited Jupyter Notebook for the 1st place solution.
This version, running 15 epochs per set at about 40 seconds per epoch on a 1080 Ti, scores 0.519 on the private LB, good for a silver medal.

Or on nbviewer with a direct Download button (upper right corner)

6 Likes

With @radek’s comment, I got the “But of course!” moment about the 16 networks, each one dedicated to forecasting a single day of the 16 days in Test :upside_down_face:

Another thing I found very neat is his careful choice of validation dates: he didn’t simply take the last 16 days before the Test start date (2017-08-16), which would be 2017-07-31 -> 2017-08-15.

He chose instead the latest 16-day Train bracket that most resembled the 16-day Test window, that is 2017-07-26 -> 2017-08-09.

Doing so, he made sure the two sets had the same number of each weekday (e.g. three medium-sales-volume Wednesdays/Thursdays vs. three low-volume Mondays/Tuesdays). It also fully captures the end-of-month weekend: payroll is about to drop, but it is a banking holiday for credit/Visa card payments, so people won’t be charged until the next Monday. A validation window starting on Monday, July 31 would miss the boost of July’s final Friday/Saturday.
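
A quick way to check the weekday mix (my own sketch in pandas, not the winner’s code; I’m only using the window start dates mentioned above, with 16-day spans):

import pandas as pd

test_days = pd.date_range('2017-08-16', periods=16)   # the competition's Test window
val_days  = pd.date_range('2017-07-26', periods=16)   # winner's validation window
naive_val = pd.date_range('2017-07-31', periods=16)   # the "last 16 days of Train" alternative

def weekday_counts(days):
    return pd.Series(days.dayofweek).value_counts().sort_index()

print(weekday_counts(test_days))   # same mix as val_days (both start on a Wednesday)
print(weekday_counts(val_days))
print(weekday_counts(naive_val))   # different mix: starts on a Monday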

There’s true business knowledge in Retail, imho.

2 Likes

I’m working on porting my simple NN from keras to fastai.

I ran into a problem with the apply_cats method. I created the categories on my training set but the test set has some item_nbr values not present in the training set.

I refactored to add an optional null_check. Since this has potential performance implications I left it as an optional check.

Posting here first to see if anyone has comments/feedback before submitting a pull request. The method is looking a bit awkward; there are probably ways to refactor it.

import pandas as pd

def apply_cats(df, trn, null_check=False):
    """Make df's columns categorical, using the categories learned on trn."""
    for n, c in df.items():
        if (n in trn.columns) and (trn[n].dtype.name == 'category'):
            # values of df[n] that are not among trn's categories become NaN
            df[n] = pd.Categorical(c, categories=trn[n].cat.categories, ordered=True)
            if null_check and df[n].isnull().values.any():
                raise ValueError(f'Target dataframe has null values for column {n}. This can occur if the target dataframe has category values not present in original.')
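
For context, a toy example of the failure mode the check is meant to catch (made-up item numbers; uses apply_cats and pandas as imported above):

trn = pd.DataFrame({'item_nbr': [1, 2, 3]})
trn['item_nbr'] = trn['item_nbr'].astype('category')

test = pd.DataFrame({'item_nbr': [2, 3, 99]})    # 99 never appears in training
apply_cats(test.copy(), trn)                     # silently turns 99 into NaN
apply_cats(test, trn, null_check=True)           # raises ValueError instead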

Today I ported my simple NN from Keras to fastai.

Here is the original Keras version, and here is the fastai version.

(I left out the data prep because it is the same.)

I tried to create the same model, loss functions, embeddings, etc. to see if I could get the same results.

It’s worth noting that this is an unusual dataset, so don’t draw any sweeping conclusions from the comparisons. I only have 4 features and ~20K trainable params. There are 37 million training examples.

Plus I could have made a mistake. :slight_smile: Maybe @jeremy will see something I missed.

Observations

Fastai is much, much easier to use than Keras.

I had to write much less code to prep for training. Building a custom loss function was trivially easy. Fastai has lots of helper functions that just make things easier.

No way I’m going back to Keras.

Unfortunately my loss in the fastai model is about 2x worse than what I achieved in Keras.

Here is what I checked.

  • Feature engineering is the same. I ran the same Python code as before to get the 3 categorical features and 1 continuous feature.
  • The models have the same number of trainable parameters.
  • The models have the same loss function, MAE.
  • I normalized both targets using the same function.

Training the fastai model is 10-20x slower than Keras.

I think Keras is copying the entire dataset to the GPU and then doing the training. Since I’m using gigantic batch sizes it runs very fast in Keras, about 22 secs. Takes >5 mins in fastai.
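
For what it’s worth, here is the kind of thing I have in mind, in plain PyTorch rather than fastai (toy data and shapes, my own sketch): keep the whole dataset resident on the GPU and slice huge batches out of it, so there is no per-batch copying or data-loader overhead.

import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# toy stand-ins: a few features, lots of rows, everything resident on the GPU
X = torch.rand(1_000_000, 4, device=device)
y = torch.rand(1_000_000, 1, device=device)

model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1)).to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()        # MAE, same loss as above

bs = 65_536                  # gigantic batches, like the winning kernel
for epoch in range(1):
    perm = torch.randperm(X.shape[0], device=device)
    for i in range(0, X.shape[0], bs):
        idx = perm[i:i + bs]
        opt.zero_grad()
        loss = loss_fn(model(X[idx]), y[idx])
        loss.backward()
        opt.step()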

Questions

  • Anyone have suggestions for speeding up PyTorch for situations like this? Worth noting that the winner used a batch size of >65,000 as well.
  • Any suggestions for getting a lower validation MAE for the fastai version?

A bit of a long shot but the difference in training speed might be due to not using multiple cores for loading the data. Could you please check your CPU saturation using something like htop?

I slightly doubt this is the culprit, but it’s the best idea I’ve got so far.

(another thing would be pinning the memory - I think it might be an argument to one of the methods for dataset creation, maybe check the signatures and see if you can set it to true?)
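
Something along these lines is what I mean, using the standard torch.utils.data.DataLoader arguments (I’m not sure off-hand where fastai exposes them):

import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.rand(1000, 4), torch.rand(1000, 1))
dl = DataLoader(ds, batch_size=256, shuffle=True,
                num_workers=4,     # several CPU cores preparing batches in parallel
                pin_memory=True)   # page-locked host memory -> faster copies to the GPU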

Sorry for such far-fetched ideas, but that is all I have at this point… If the loss is twice as high, then whatever else is amiss here might be causing the slowdown as well (some difference in architecture, etc.).

1 Like

A bit of a long shot but the difference in training speed might be due to not using multiple cores for loading the data. Could you please check your CPU saturation using something like htop?

I think that’s it. IIRC training under Keras would max my CPUs. With fastai it barely got above 1%. I’ll try to go back and test.

At the moment I’m digging through this mess looking for other features.

Pardon the rant… but I’m annoyed at the level of cutting-and-pasting by Kagglers who don’t actually understand what the code is doing. The difference in rigor between the fastai community and Kaggle is astounding.

At the moment I’m digging through this mess looking for other features.

Exactly the part of the whole competition that is challenging for me to understand :grinning: Then I sit down and study multi-indexing and grouping in pandas again :grinning:

Also, thank you for sharing the code… I think a bigger-than-usual batch size was somehow not a big problem for time series. It may even score better… I tried it for the Recruit Restaurant Visitor Forecasting challenge as well, and it was not a problem either.

Nope, I was wrong. fastai also pegs CPU. So that’s not it …

Ok, I’ve cracked the code. :sunglasses:

Here is my notebook.

The winner took 3 key steps to generate the features:

  1. Build dataframes indexed on store/item with dates as columns.
  2. Pick a single date and build time-based features on that date.
  3. Pick another date, build the same feature, and concatenate with the previous date.

90% of the code expands on these 3 steps. I created a simple, 1-feature example in the notebook above which illustrates it.
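
In toy form it looks roughly like this (my own reconstruction with made-up names and numbers, not the winner’s code):

import pandas as pd

# long-format sales: one row per store/item/date
sales = pd.DataFrame({
    'store_nbr':  [1, 1, 1, 1, 2, 2, 2, 2],
    'item_nbr':   [10, 10, 10, 10, 10, 10, 10, 10],
    'date':       pd.to_datetime(['2017-07-01', '2017-07-02',
                                  '2017-07-03', '2017-07-04'] * 2),
    'unit_sales': [3, 4, 2, 5, 1, 0, 2, 3],
})

# 1. pivot: rows = store/item, columns = dates
wide = sales.pivot_table(index=['store_nbr', 'item_nbr'],
                         columns='date', values='unit_sales')

# 2. for one anchor date, build a time-window feature from the columns before it
def mean_before(wide, anchor, days):
    cols = pd.date_range(anchor - pd.Timedelta(days=days), periods=days)
    return wide[cols].mean(axis=1).rename('mean_before')

feat_0704 = mean_before(wide, pd.Timestamp('2017-07-04'), 3)

# 3. repeat for another anchor date and concatenate the rows
feat_0703 = mean_before(wide, pd.Timestamp('2017-07-03'), 2)
train_rows = pd.concat([feat_0704, feat_0703], axis=0)

The real code repeats this for many statistics, window lengths, and anchor dates, which is where most of the features come from.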

The target is simply the sales for a store/item combination over some number of subsequent days; 16 days are used in this example because the test set is 16 days long.

Genius or madness?

I can’t decide if I hate or love this approach. There is a lot I don’t like about it. Removing “date” as the organizing principle for the training data makes the entire workflow complex and unintuitive. There also seems to be enormous feature redundancy.

On the positive side, it is more “pythonic” (and much faster) to generate the time-based features by running functions across DataFrame columns rather than looping through date rows as in the Rossman example. Presumably you could follow the same technique for weather data or anything else date-related.

In any case, I’ll have to spend some more time with it to see if I can beat the winning entry with a simpler approach.

An early hypothesis on why it may have won …

It may be that simply training each example against 16 targets (the dates) makes for a more robust example than training against 1 date. The loss can be more accurate because each example captures more information than a single-date example would.

Hopefully if I keep writing about this and showing off the efficiency of the fastai library, @jeremy will weigh in with an opinion. :wink:

4 Likes

Unlikely, since I’m sitting by the pool in Lanai :wink:

3 Likes

I was wrong; it looks like they built 16 models, training each one to predict 1 date. I can’t see why they did this.
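
In other words, something like this (Keras-style toy sketch; I’m guessing at everything except the one-model-per-day structure):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

X = np.random.rand(1000, 403).astype('float32')   # engineered features
Y = np.random.rand(1000, 16).astype('float32')    # sales for each of the next 16 days

models = []
for day in range(16):                 # one network per forecast day...
    m = Sequential()
    m.add(Dense(512, activation='relu', input_dim=X.shape[1]))
    m.add(Dense(1))                   # ...instead of a single Dense(16) head predicting all days
    m.compile(loss='mae', optimizer='adam')
    m.fit(X, Y[:, day], batch_size=512, epochs=1, verbose=0)
    models.append(m)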

All,

I’ve refactored much of the winning model and spent the last few days analyzing. Here is the notebook if you want to review.

I ended up with 403 features and unfortunately I’m still not able to get great results from training it.

So I built a Random Forest model to take a deeper look at the relative ranking of the features. Not surprising to anyone who has spent time with this data… only averages of recent sales seem to predict very much.

Here are the top 10:

   feature          score
0  sum_14_before    0.454851
1  mean_14_before   0.314681
2  mean_30_before   0.061810
3  sum_30_before    0.054516
4  mean_40_before   0.029909
5  sum_40_before    0.028628
6  promo_14_after   0.008888
7  store_class_dow  0.005143
8  dow              0.005110
9  item_dow         0.004356

I’ve learned a ton working through this dataset … but good results still elude me. Hopefully the work is helpful to others.

4 Likes

Just in case you missed it … look how simple it was to create the RF after first building the NN.

It literally took only 2 lines of code. I was able to use all of the same feature and target dataframes.
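
Roughly like this (a sketch with toy data; in the notebook I reuse the real feature/target dataframes):

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# toy stand-ins for the engineered feature dataframe and target
X_train = pd.DataFrame(np.random.rand(500, 4),
                       columns=['sum_14_before', 'mean_14_before', 'mean_30_before', 'dow'])
y_train = np.random.rand(500)

m = RandomForestRegressor(n_estimators=40, n_jobs=-1).fit(X_train, y_train)
print(pd.Series(m.feature_importances_, index=X_train.columns).sort_values(ascending=False))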

In case anyone is still interested in this competition (I am), I have published a clean notebook with my own work so far: (https://github.com/jonas-pettersson/fast-ai/blob/master/Exploration%20and%20Prediction%20for%20Structured%20Data.ipynb)
I am still not anywhere near a good result (my best score was 0.614), but I think the notebook can be of help to a newcomer. I am of course also very grateful for any feedback.
You can read my conclusions at the end of the notebook, but here is the short version: it is not sufficient to throw this problem at a deep neural network and hope for the best. I started this exercise without looking at any forums or kernels, just to see how far I would get on my own based on the Rossman example from the DL course and all I learned from the ML course.
Not very far, it turned out. First I had to repair the issue of zero sales missing from the training data, and after that, only when I added the “moving average” feature, as suggested by @kevindewalt, did things start to go in the right direction.
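
(For reference, the moving average was along these lines, assuming a long-format train dataframe with store_nbr / item_nbr / date / unit_sales columns; the names and window size here are just my example:)

import pandas as pd

train = pd.DataFrame({
    'store_nbr':  [1] * 6,
    'item_nbr':   [10] * 6,
    'date':       pd.date_range('2017-07-01', periods=6),
    'unit_sales': [3, 4, 2, 5, 1, 0],
})

train = train.sort_values(['store_nbr', 'item_nbr', 'date'])
train['ma_7'] = (train.groupby(['store_nbr', 'item_nbr'])['unit_sales']
                      .transform(lambda s: s.rolling(7, min_periods=1).mean()))
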
Anyway, I learned a lot. Not only about the practical use of the fast.ai library but also in not giving up in face of frustrating setbacks. Kaggle competitions are a great way to learn because you get feedback via your scores and you can learn from others.
If I get some more time, I will continue down the path of going through kernels and trying to find out what I can improve, probably adding more (“engineered”) features. I might also come back once I have understood LSTMs more thoroughly, as many seem to use them. Even if the dataset feels very hard, I think it is a good learning example, as it is close to reality with all the problems that come with it.
Or I might look for some “living” competition with structured data instead…

2 Likes