Sales (momentum) for future prediction? (Lessons 3-4, Rossmann)


#1

(Copying this from the lesson 4 wiki thread, where it didn't get a response. Maybe it's better here as its own question.)

Is this statement correct: in Rossmann, we're not using past Sales as an input to help predict future Sales? Or are we, indirectly in some way?

My understanding is that the Sales data is removed from the inputs during training and used only as the prediction target. I do see the causality issues of having sales data in the training inputs; I'm trying to understand the consequences of this. (Perhaps I need to complete the DL course and then do the ML course, but I'm up to lesson 4 in DL.) I'm generalising my thinking to any model where the output has a momentum-like component.
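For concreteness, here's a sketch of the setup as I understand it (column names are just illustrative, not the actual notebook's):

```python
import pandas as pd

# Hypothetical training frame: "Sales" is the dependent variable.
df = pd.DataFrame({
    "Store": [1, 2],
    "DayOfWeek": [3, 4],
    "Promo": [1, 0],
    "Sales": [5263, 6064],
})

# The target is split out; the model never sees Sales as an input.
y = df["Sales"]
X = df.drop(columns=["Sales"])

print(list(X.columns))  # no "Sales" column remains among the features
```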

If there are external factors affecting sales that are not captured in the data set, their effects are visible in past sales. For example, let's say a competing chain runs sales a few times a year that affect Rossmann's sales, each lasting a week or two. If they fall on consistent calendar dates, our data set (with its date encodings) will pick this effect up. If (hypothetically, my argument is perhaps moot here) the dates were inconsistent, then I picture irregular periods where sales were lower than predicted, but with a pattern to them. Similarly, a neighbouring store could hold sales that bring people in and increase Rossmann's sales. Once we noticed unexpected sales levels, we could in theory predict the next part of that effect continuing. My understanding is that our model can't deal with this.

I hope that's clear.


(Jeremy Howard) #2

Sorry I saw that in the other thread but wasn’t sure what you were saying. Upon re-reading, I think I understand.

What you’re referring to is known as “auto-regression”. That is, using earlier time periods of the dependent variable as predictors. It is indeed common in time series, but not directly doable with the kind of model used in this notebook. For that, you’d need an RNN, which is introduced next! :slight_smile: (However, it seems that in practice RNNs aren’t giving as good results as fully connected nets for this kind of data.)
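As a sketch of what "earlier time periods of the dependent variable as predictors" could look like (not the notebook's actual code, and the values are made up), a lag feature can be built with pandas like this:

```python
import pandas as pd

# Toy sales data: one store, five consecutive days (hypothetical values).
df = pd.DataFrame({
    "Store": [1, 1, 1, 1, 1],
    "Date": pd.date_range("2015-01-01", periods=5),
    "Sales": [5000, 5200, 4800, 5100, 5300],
})

# Auto-regression as a feature: shift the dependent variable so each
# row sees the previous day's sales. With multiple stores you would
# shift within each store's group, as here.
df["Sales_lag1"] = df.groupby("Store")["Sales"].shift(1)

print(df[["Date", "Sales", "Sales_lag1"]])
```

The first row has no previous day, so its lag is NaN and would need handling (dropping or filling) before training.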


Rossmann with RNN
#3

Thank you. You understood correctly, and now I know the word.
Effectively I was asking:

  1. Is there any auto-regression going on? (Answer: no.)
  2. What can we do with a problem that has an auto-regression component whose causes are not picked up elsewhere in the data? (Answer: if it solves better using this type of DL, keep using it. And keep studying.)

(Casper Bojer) #4

Can you explain why we need an RNN - e.g. why we can’t just add an extra feature, which would be the lag X time series, before dropping the dependent variable?


#5

I'm going to pipe in with my best (beginner's) guess: Jeremy is saying that fully connected nets work better when exactly that can be done on the input data.
RNNs (or LSTMs: Long Short-Term Memory networks) can also do it within hidden layers.
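A minimal sketch of that hidden-layer idea (the weights and sizes here are invented for illustration, not from any course notebook): an RNN cell carries a hidden state forward, so earlier observations can influence later predictions even without an explicit lag column.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy weights for a single RNN cell: scalar input, hidden size 4.
W_xh = rng.normal(size=(4, 1)) * 0.5   # input -> hidden
W_hh = rng.normal(size=(4, 4)) * 0.5   # hidden -> hidden (the "memory")
b_h = np.zeros((4, 1))

def rnn_step(x, h):
    """One time step: the new hidden state mixes the current input with
    the previous hidden state, so past observations persist."""
    return np.tanh(W_xh * x + W_hh @ h + b_h)

# Feed a short (normalised) sales sequence through the cell.
sales = [0.5, 0.52, 0.48, 0.51]
h = np.zeros((4, 1))
for x in sales:
    h = rnn_step(x, h)

# The final hidden state depends on the whole history, not just the
# last input -- this is what lets an RNN capture auto-regressive effects.
print(h.ravel())
```

Changing an early value in the sequence changes the final hidden state, which is exactly the history-dependence a fixed set of lag columns can only approximate.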