Kaggle M5 comp with NN, validation strategy

louis · June 5, 2020, 11:05am

Hello,

I am working on this competition to see how convenient the tabular learners are. In the example on the Rossmann dataset it is pretty straightforward, but after digging a bit into the feature engineering I ended up with a blocker.

I use rolling metrics over the previous x periods to capture the temporal structure of the series. For example, if I have the daily price of an item, I also compute the mean, max, min, std and ewm of this price over the previous 28 and 7 days. It seems like pretty standard FE, but it has some implications for validation.
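To make the setup concrete, here is a minimal pandas sketch of the rolling features described above. The column names `item_id`, `date` and `price` and the helper name `add_rolling_features` are assumptions for illustration, not names from the competition data or from fastai.

```python
import pandas as pd

def add_rolling_features(df, windows=(7, 28)):
    """Add rolling stats of `price` over the previous `w` days, per item.

    Assumes hypothetical columns: item_id, date, price (one row per item/day).
    """
    df = df.sort_values(["item_id", "date"]).reset_index(drop=True)
    # shift(1) so each row only sees strictly earlier days (no leakage)
    prev = df.groupby("item_id")["price"].shift(1)
    by_item = prev.groupby(df["item_id"])
    for w in windows:
        for stat in ("mean", "max", "min", "std"):
            df[f"price_{stat}_{w}"] = by_item.transform(
                lambda s: getattr(s.rolling(w, min_periods=1), stat)()
            )
        # exponentially weighted mean with span equal to the window length
        df[f"price_ewm_{w}"] = by_item.transform(lambda s: s.ewm(span=w).mean())
    return df

demo = pd.DataFrame({
    "item_id": ["A"] * 10,
    "date": pd.date_range("2020-01-01", periods=10),
    "price": [float(i) for i in range(1, 11)],
})
out = add_rolling_features(demo)
```

The first row of each item has no previous day, so its rolling features are NaN and need to be filled or dropped before training.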

First, with this method you have to predict one step at a time, since the rolling features need to be updated after each prediction. So the validation points would have to be processed one at a time. Second, the FE transformation would also have to be reapplied at each step to recompute the rolling features.
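The one-step loop I have in mind looks roughly like the sketch below. It assumes the rolling features are built on the series being forecast, and `predict_fn` is a hypothetical stand-in for whatever trained model is used (it is not a fastai API).

```python
import numpy as np

def rolling_feats(history, windows=(7, 28)):
    """Rolling stats over the last `w` values of the history."""
    feats = []
    for w in windows:
        window = history[-w:]
        feats += [np.mean(window), np.max(window), np.min(window), np.std(window)]
    return np.array(feats)

def recursive_forecast(history, predict_fn, horizon=28):
    """Predict `horizon` steps, feeding each prediction back into the history.

    `predict_fn` is a hypothetical callable mapping a feature vector to a float.
    """
    history = list(history)
    preds = []
    for _ in range(horizon):
        x = rolling_feats(history)      # recompute the features at every step
        y_hat = float(predict_fn(x))
        preds.append(y_hat)
        history.append(y_hat)           # the prediction becomes part of the history
    return preds

# Toy "model": predicts the 7-day rolling mean (first feature)
preds = recursive_forecast([2.0] * 30, lambda x: x[0], horizon=28)
```

Errors compound in this scheme, because later steps are computed from earlier predictions rather than from observed values, which is exactly why a 1-day-ahead validation score can look too optimistic.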

I assume this could be handled by the validation dataloader and the transforms. I am not very familiar with the fastai API yet, and although the logic behind this problem is straightforward, it seems that extending the library to support it would take quite some implementation effort.
Has anyone faced the same situation before? Is there any initiative to extend fastai to deal with these cases? I'd be happy to help.

For now I will just use samples 1 day ahead in my validation set, but that is hardly representative of the real hidden set used to score the model.
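For reference, the time-based split I am using is roughly the following sketch. The `date` column name and the helper `split_last_days` are assumptions for illustration.

```python
import pandas as pd

def split_last_days(df, horizon=1, date_col="date"):
    """Hold out the last `horizon` days as validation, train on everything before."""
    cutoff = df[date_col].max() - pd.Timedelta(days=horizon)
    train = df[df[date_col] <= cutoff]
    valid = df[df[date_col] > cutoff]
    return train, valid

demo = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=10),
    "sales": range(10),
})
train, valid = split_last_days(demo, horizon=1)
```

Setting `horizon=28` would match the 28-day M5 forecast horizon, but the rolling features of the validation rows would then have to be rebuilt from predictions rather than from observed values.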