Demand Forecasting Training Set - Handling sparse demand?

I am trying to build a weekly demand forecast model using random forest regression. The model will produce forecasts for roughly 3,000-5,000 different products across 200-300 stores. While building the input features, I came across several products where demand is very sparse: for certain products there are many weeks with 0 demand.

I have a few questions about handling these situations:

  1. Is it better to detect these cases and handle them with a different technique (e.g., a rolling average)? (Rough sketch of what I mean after this list.)
  2. Regarding the random forest model, is it valuable to include the weeks with 0 demand so the model can learn from them?
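
For question 1, this is roughly what I had in mind (a sketch against my own data layout; the column names and the 50% zero-share cutoff are just placeholder choices, not anything validated):

```python
import pandas as pd

# Sketch of question 1: flag sku/store series where most weeks are 0 and
# route those to a simple fallback instead of the RF model.
def split_sparse_series(df, zero_share_threshold=0.5):
    """df has one row per sku/store/week with a 'demand' column."""
    zero_share = (
        df.assign(is_zero=df["demand"].eq(0))
          .groupby(["sku", "store"])["is_zero"]
          .mean()
    )
    sparse_keys = zero_share[zero_share > zero_share_threshold].index
    mask = df.set_index(["sku", "store"]).index.isin(sparse_keys)
    return df[~mask], df[mask]

def rolling_average_forecast(series, window=8):
    """Naive fallback: forecast next week as the mean of the last `window` weeks."""
    return series.tail(window).mean()
```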

Larry - depending on the type of stores and their locations, 0 demand can be a fairly common occurrence, so yes, I would make this part of the model (see the sketch after the list below), since it is an inherent feature of retail data. Large US and Canadian store datasets can have a preponderance of slow-moving items, particularly in the drug store channel, while EU/UK stores tend to have a smaller footprint and a faster turn rate on products. A few challenges with retail data to watch out for:

  1. Promotions: if the product is promoted during the year, the 0,0,0,1,0 pattern can easily become 0,0,0,10,5,2,0.
  2. Weather events can drive sales to 0 in certain regions (and spike them just before the event). It is not uncommon to see a spike in the data just before a hurricane hits and then nothing for that localized set of stores (similarly for a snow storm). Both of these show up in US stores, particularly along the mid-Atlantic East Coast and the Gulf Coast.
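
To make the "keep the 0 weeks" point concrete, here is a rough sketch of how you might build the training frame so that 0-demand weeks appear as explicit rows alongside promotion and weather context. The promos/weather calendars and the promo_flag/weather_event column names are hypothetical; you would build them from your own data:

```python
import pandas as pd

def build_training_frame(sales, promos, weather):
    """sales: one row per sku/store/week with units sold (weeks with no
    sales are simply absent). promos/weather: hypothetical calendars with
    promo_flag and weather_event indicator columns."""
    weeks = pd.date_range(sales["week"].min(), sales["week"].max(),
                          freq="W", name="week")
    idx = pd.MultiIndex.from_product(
        [sales["sku"].unique(), sales["store"].unique(), weeks],
        names=["sku", "store", "week"],
    )
    full = (
        sales.set_index(["sku", "store", "week"])["demand"]
             .reindex(idx, fill_value=0)   # absent week == 0 units sold; keep it
             .reset_index()
    )
    # Let the model tell "0 because nothing happened" apart from
    # "0 because a storm closed the region" or a spike from a promotion.
    full = full.merge(promos, on=["sku", "store", "week"], how="left")
    full = full.merge(weather, on=["store", "week"], how="left")
    full[["promo_flag", "weather_event"]] = (
        full[["promo_flag", "weather_event"]].fillna(0).astype(int)
    )
    return full
```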

Hope that helps


@Larry - I’m stuck with the same issue. Can you help me please? How can we connect? Thanks

@nabiil3 apologies for the late reply…

I opted not to use RF regression for this problem and instead used a neural-net-based model to forecast. The reason I made this decision is that although RF is great for learning feature importance, an RF can only reproduce what it has seen in the past: at prediction time it effectively looks up the most similar historical examples (nearest neighbors), so it cannot extrapolate beyond the range of the training targets.
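
A quick, self-contained illustration of what I mean (a toy example, not from my actual pipeline):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Train on an upward trend, then ask the forest to predict beyond the
# training range. The predictions plateau at roughly the highest target
# it has ever seen -- it cannot extrapolate the trend.
X_train = np.arange(0, 100).reshape(-1, 1)
y_train = 2.0 * X_train.ravel()          # max target seen: 198

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

X_future = np.arange(100, 110).reshape(-1, 1)
print(rf.predict(X_future))              # stays near ~198 for every future point
```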

Regarding variables with high cardinality (e.g., thousands and thousands of unique SKUs), I discovered a fairly recent method known as entity embeddings, which represents each SKU as a learned vector of N dimensions instead of thousands of one-hot columns. Embeddings have many advantages of their own. The idea is quite new and was used in the Rossmann Kaggle competition.
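
The basic shape looks something like this (a PyTorch sketch of a Rossmann-style setup; the layer sizes and the 16-dimensional embedding are choices I made for illustration, nothing canonical):

```python
import torch
import torch.nn as nn

class DemandNet(nn.Module):
    """Sketch of an entity-embedding model: each SKU id is mapped to a
    learned N-dimensional vector instead of thousands of one-hot columns."""
    def __init__(self, n_skus, emb_dim=16, n_numeric=8):
        super().__init__()
        self.sku_emb = nn.Embedding(n_skus, emb_dim)
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim + n_numeric, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, sku_ids, numeric_feats):
        emb = self.sku_emb(sku_ids)                 # (batch, emb_dim)
        x = torch.cat([emb, numeric_feats], dim=1)  # join with lags, promo flags, etc.
        return self.mlp(x).squeeze(1)               # predicted weekly demand
```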

I wonder if embeddings could be used with an RF regressor as well. I don't see why not: once the net is trained, the learned SKU vectors are just numeric columns. Maybe I will try it and compare against the NN-based model once I get time.
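
The mechanical part would look roughly like this, continuing from the DemandNet sketch above (the training arrays here are random stand-ins, only there to make the snippet run):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Dummy stand-ins for the real training arrays (hypothetical shapes).
n_rows, n_skus, n_numeric = 1000, 5000, 8
train_sku_ids = np.random.randint(0, n_skus, size=n_rows)
train_numeric = np.random.rand(n_rows, n_numeric)
train_demand = np.random.poisson(2.0, size=n_rows)

model = DemandNet(n_skus=n_skus, emb_dim=16, n_numeric=n_numeric)
# ... train the net on the real data first ...

# Pull out the learned SKU vectors and use them as plain RF feature columns.
sku_vectors = model.sku_emb.weight.detach().numpy()          # (n_skus, 16)
X = np.hstack([sku_vectors[train_sku_ids], train_numeric])   # (n_rows, 16 + 8)

rf = RandomForestRegressor(n_estimators=200, n_jobs=-1, random_state=0)
rf.fit(X, train_demand)
```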

I believe Jeremy gave a lecture on this in a few of his different courses. Here is a link to one of them.

Hope this helps!

Larry