I’m building a model that uses structured time series data to forecast one month into the future. For the sake of this example, let’s suppose I have data from 1/1/2000 to 12/31/2017. In real-world use, the model forecasts one month ahead (7/1/2018, for example) and is then retrained on the updated data before the next forecast, and I’d like my validation set to mirror that as closely as I can. What I would like to do is hold out 2017 as the test set and use 2016 as the validation set. The process would be to start with the model trained on data through the end of 2015 and forecast January 2016. Then I would add the January actuals to the training data, retrain, and predict February, and so on. After stepping through all of 2016 in this fashion, I would evaluate the loss over those 12 one-month-ahead forecasts.
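To make the procedure concrete, here is a rough sketch of the loop I have in mind. It assumes one observation per month, and `fit_naive`, `predict_next`, and `walk_forward` are hypothetical names I made up for illustration; the "model" is just a placeholder (mean of the last 12 months) and the series is made-up numbers, not my real data.

```python
from statistics import mean

def fit_naive(history):
    """Placeholder 'model' (an assumption): mean of the last 12 observations."""
    return mean(history[-12:])

def predict_next(model):
    """Placeholder one-step-ahead forecast: just return the fitted mean."""
    return model

def walk_forward(series, n_train, n_steps, fit=fit_naive, predict=predict_next):
    """Refit before every step, forecast one month ahead, collect abs errors."""
    errors = []
    for step in range(n_steps):
        cutoff = n_train + step
        model = fit(series[:cutoff])   # retrain on everything before the cutoff
        forecast = predict(model)      # forecast the next month
        errors.append(abs(series[cutoff] - forecast))  # compare with the actual
    return mean(errors), errors

# 216 monthly points standing in for 2000-01 .. 2017-12 (illustrative values only)
series = [float(i % 12) for i in range(216)]

n_train = 16 * 12  # 2000 through 2015
val_mae, val_errors = walk_forward(series, n_train, n_steps=12)  # step through 2016
```

The same loop would then be run once over 2017 for the test set, with `n_train = 17 * 12`.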
Is this a valid approach or is it likely I’ll get misleading results?
One problem I see is that the constant retraining makes the model dynamic, so each of the 12 forecasts comes from a different fit rather than from one constant model.
Any suggestions or tips?