[Rossman] Forecasting using the future google trend and weather data

runze · December 3, 2017, 8:54pm

Hi,

I’m reading the Rossman forecasting notebook from Lesson 3 and I wonder if it’s “fair” (or realistic) for us to include the actual Google Trends and weather data in our training and testing. Consider a real-world scenario where we need to forecast future sales. At the moment of forecasting we don’t have the future Google Trends or weather data, so if we really want those features we’ll need to forecast them separately. In the notebook (if I read it correctly), it looks like we just join our test data directly with the actual Google Trends and weather data, as follows:

joined_test = join_df(joined_test, googletrend, ["State","Year", "Week"])
joined_test = join_df(joined_test, weather, ["State","Date"])

What do you think?

miguel_perez · December 4, 2017, 10:49am

@runze, when you have this kind of data and you think about “real” applications of your model, your options are to discard it or to try to model it independently. In your example, in real life you frequently have the weather forecast for the next, say, 3 days with quite good confidence…
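
To make the “model it independently” option concrete, here is a minimal sketch of the simplest thing I can think of: a persistence forecast, where each future day just gets the last value that had actually been observed when the forecast was made. This is only a baseline I am assuming for illustration, not what the notebook does. The helper name `last_known_value`, the dates, and the trend numbers are all made up.

```python
from bisect import bisect_right
from datetime import date


def last_known_value(history, as_of):
    """Return the most recent value observed on or before `as_of`.

    `history` is a list of (date, value) pairs sorted by date.
    Returns None when nothing has been observed yet.
    """
    dates = [d for d, _ in history]
    i = bisect_right(dates, as_of)  # number of observations dated <= as_of
    return history[i - 1][1] if i else None


# Hypothetical weekly trend values for one state (made-up numbers).
trend_history = [
    (date(2015, 7, 5), 60),
    (date(2015, 7, 12), 64),
    (date(2015, 7, 19), 71),
    (date(2015, 7, 26), 69),  # not yet observed when the forecast is made
]

forecast_made_on = date(2015, 7, 20)                  # day the forecast is produced
target_days = [date(2015, 7, 27), date(2015, 8, 3)]   # days we forecast sales for

# Each target day gets the value known on forecast_made_on,
# never the true future value.
features = {
    day: last_known_value(trend_history, forecast_made_on)
    for day in target_days
}
```

Here both target days get the value observed on 2015-07-19, and the 2015-07-26 value is ignored because it was not available yet. For this to be consistent, the training rows would need the same treatment, so the model learns from what was known at forecast time rather than from the actual values.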

More generally, this kind of data can also help you better understand patterns when doing EDA.

All that said, Kaggle comps are a good proxy for “realistic modelling”, but it depends on the rules; sometimes they are not.