The Kaggle Rossmann competition involved predicting a single time series output per time period, with many independent variables to help the predictions, including past values of the time series itself. Jeremy's model worked very well for that task, and it required very little domain knowledge, which is super good news.

Question: Can a model very similar to Jeremy's Rossmann model also be used when the data has two or more time series outputs and inputs, many of them somewhat interrelated? I expect a Rossmann variant can be used here with minimal changes, because LSTMs can handle multiple inputs and multiple outputs; the new model is potentially only a small design change from Rossmann. How many time series can it handle, though, while still training in less than a day on a single workstation?
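To make the multi-input/multi-output point concrete, here is a minimal numpy sketch of one LSTM cell step. The sizes (`n_inputs`, `n_hidden`, `n_outputs`) are arbitrary assumptions and the weights are random rather than trained; the point is that nothing in the standard cell equations changes when the input vector or the readout has more than one dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_outputs = 5, 16, 10  # arbitrary sizes for illustration

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One set of weights per gate: input (i), forget (f), cell candidate (g), output (o).
W = {g: rng.normal(size=(n_hidden, n_inputs)) * 0.1 for g in "ifgo"}
U = {g: rng.normal(size=(n_hidden, n_hidden)) * 0.1 for g in "ifgo"}
b = {g: np.zeros(n_hidden) for g in "ifgo"}

def lstm_step(x, h, c):
    """Standard LSTM cell equations; x can have any input width."""
    i = sigmoid(W["i"] @ x + U["i"] @ h + b["i"])
    f = sigmoid(W["f"] @ x + U["f"] @ h + b["f"])
    g = np.tanh(W["g"] @ x + U["g"] @ h + b["g"])
    o = sigmoid(W["o"] @ x + U["o"] @ h + b["o"])
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Linear readout: predicting 10 series instead of 1 only changes this matrix's shape.
W_out = rng.normal(size=(n_outputs, n_hidden)) * 0.1

h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
x_t = rng.normal(size=n_inputs)  # 5 input series observed at one time step
h, c = lstm_step(x_t, h, c)
y_t = W_out @ h                  # 10 predicted series at that step
print(y_t.shape)                 # (10,)
```

So the recurrent core is indifferent to how many series go in or come out; the training-time and data questions are the real unknowns.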

Rossmann predicts just a single sales level each time period. Consider that in retail it is also very useful to predict every product's time series, of which there are frequently up to a million in a single grocery store, and many of these move up or down together; frequent associations are here in droves, as in "market basket" analysis. If you buy milk you might also buy bread and eggs, but if you buy ice cream you are more likely to buy toppings like walnuts and caramel. It's combinatorially impossible to try every pair, and this is why deep learning is most likely a better solution than a plain linear model, which would explode in the number of possible interaction terms to model and evaluate.
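To put a number on that explosion: with a million products, the pairwise interaction terms alone number in the hundreds of billions, before even considering triples. A quick stdlib check:

```python
import math

n_products = 1_000_000

# Number of unordered product pairs a linear model with explicit
# pairwise interaction terms would have to represent.
n_pairs = math.comb(n_products, 2)
print(n_pairs)  # 499999500000, i.e. roughly half a trillion terms
```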

As a first attempt I would keep it simple: use the same basic design as Jeremy's Rossmann architecture, and try solving with only, say, 10 or fewer time series to predict from the same dataset (multiple predictions). It would be nice to find an open dataset with something like 3-10 time series inputs and 3-10 time series outputs to predict, plus a few other typical independent variables such as weather, day of week, and holidays.
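The change from a single Rossmann-style output to, say, 10 outputs might be as small as widening the final layer. A minimal numpy sketch of that idea (layer sizes, names, and random weights are all illustrative assumptions, not Jeremy's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(1)
n_features = 32  # concatenated embeddings + continuous vars (assumed size)
n_hidden = 64
n_series = 10    # predict 10 product time series instead of 1

# The hidden layer is unchanged from a single-output design.
W1 = rng.normal(size=(n_hidden, n_features)) * 0.1
b1 = np.zeros(n_hidden)

# Single-output head vs. multi-output head: only the weight shape differs.
W2_single = rng.normal(size=(1, n_hidden)) * 0.1        # shape (1, 64)
W2_multi = rng.normal(size=(n_series, n_hidden)) * 0.1  # shape (10, 64)

def forward(x, W_head):
    h = np.maximum(0.0, W1 @ x + b1)  # ReLU hidden layer
    return W_head @ h                 # linear readout

x = rng.normal(size=n_features)
print(forward(x, W2_single).shape)  # (1,)
print(forward(x, W2_multi).shape)   # (10,)
```

Whether the shared hidden layer is enough to capture the cross-product associations, or whether something more deliberate is needed, is exactly what the small 10-series experiment would test.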

It turns out there is really not much published literature on this from anyone, academic or industrial, using modern designs; it's mostly just the old ARIMA material. Amazon Research's 2017 paper is one of the few I found anywhere (maybe 3 papers in total) that uses a deep neural network and recognizes possible product associations (frequent associations). Amazon's paper looks to be on the right track, but it is very complicated compared to Jeremy's Rossmann model, and for me the paper is either too hard to read or is leaving out a lot of information on purpose for competitive reasons (i.e., Walmart!).

I'm looking for an open dataset that meets these needs so I can run an experiment. Let me know if you're interested!