(Making a good Kaggle test set is kind of a different beast, and I'll ignore that here to focus on the general case.) Dealing with time series is always tricky, but the out-of-bag (OOB) score should be fine, depending on how you create the training set. Sort by date, hold out the most recent ~20% for validation, and train the model so that recent observations get more weight than earlier ones; that weighting is an argument to the `fit()` method. If the most recent data is similar to the data beyond your 80% cutoff, the OOB score should be reasonable.
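Here is a minimal sketch of that setup in scikit-learn, where the weighting goes in as `sample_weight`. The synthetic data, the 80/20 cut, and the linear weight schedule are all illustrative choices, not a prescription:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy time-ordered data: rows are assumed already sorted by date.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=n)

# Hold out the most recent 20% as the validation set.
cut = int(n * 0.8)
X_train, X_valid = X[:cut], X[cut:]
y_train, y_valid = y[:cut], y[cut:]

# Linearly increasing weights so recent rows count more; the exact
# schedule (linear here) is an arbitrary choice for illustration.
weights = np.linspace(0.2, 1.0, num=cut)

rf = RandomForestRegressor(n_estimators=50, random_state=0)
rf.fit(X_train, y_train, sample_weight=weights)
print(round(rf.score(X_valid, y_valid), 2))
```

Any weighting scheme works as long as it up-weights the dates you expect the future to resemble.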
The OOB score is useful because it's much faster than cross-validation and comes for free with the fit.
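In scikit-learn that's just a constructor flag; each tree is scored on the rows its bootstrap sample never saw, so no separate validation loop runs. A quick sketch on toy data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=400)

# oob_score=True makes fit() also compute an R^2 estimate from the
# out-of-bag rows of each tree -- no extra training passes needed.
rf = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=0)
rf.fit(X, y)
print(round(rf.oob_score_, 2))
```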
All that said, you are right: random forests are bad at extrapolation. They will predict that the future looks exactly like the most recent data in the training set, so if the validation set is very different, you would indeed see the OOB score diverging from the validation score.
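You can see the failure directly on a pure trend: a tree can only emit values it saw in training, so everything past the training range falls into the same rightmost leaves and the forecast goes flat.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# y = t, a pure upward trend.
t_train = np.arange(100, dtype=float).reshape(-1, 1)
rf = RandomForestRegressor(n_estimators=50, random_state=0)
rf.fit(t_train, t_train.ravel())

# Both future points lie past every split threshold, so they land in
# the same leaf of every tree and get the same flat prediction,
# stuck near the training maximum instead of 150 and 200.
preds = rf.predict(np.array([[150.0], [200.0]]))
print(preds)
```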
One option is to add a feature that gives the model a time-sensitive hook, or you can move to a generalized linear model, etc.