I am confused about the following piece of code in lesson 2:
df_trn, y_trn, nas = proc_df(df_raw, 'SalePrice', subset=30000,
X_train, _ = split_vals(df_trn, 20000)
y_train, _ = split_vals(y_trn, 20000)
In the code above, to speed things up, we randomly select a subset of 30,000 rows. Earlier in the lecture, the professor states that when a dataset has a time-series element, we need to make sure that the training, validation, and test sets cover different time periods.
So my question is, how can we maintain different time ranges for each of the training, validation and test sets when the data is sampled randomly?
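To make the question concrete, here is a minimal sketch of the two cases I have in mind. It uses toy data and plain Python only. `random_subset` is a hypothetical helper I wrote for illustration, and `split_vals` is my assumption of what the lesson's helper does (first n rows, then the rest). I have not confirmed whether `proc_df` sorts the sampled rows internally, so this is not a description of the library's behaviour.

```python
import random

def split_vals(a, n):
    # Assumed behaviour of the lesson's helper: first n items, then the rest.
    return a[:n], a[n:]

def random_subset(rows, n, keep_order, seed=0):
    # Hypothetical helper: pick n distinct row positions at random.
    rng = random.Random(seed)
    idxs = rng.sample(range(len(rows)), n)
    if keep_order:
        idxs.sort()  # restore the original (chronological) row order
    return [rows[i] for i in idxs]

# Toy data: 100 rows already sorted by date; each value stands in for a timestamp.
dates = list(range(100))

shuffled = random_subset(dates, 30, keep_order=False)
ordered = random_subset(dates, 30, keep_order=True)

train_s, valid_s = split_vals(shuffled, 20)
train_o, valid_o = split_vals(ordered, 20)

# With sorted indices, every training row precedes every validation row.
print(max(train_o) < min(valid_o))
# Without sorting, the two sets generally overlap in time.
print(max(train_s) < min(valid_s))
```

If the sampled indices are sorted before the rows are taken, a positional split still separates earlier rows from later ones. If they are not sorted, the split mixes time periods, which is the case I am worried about.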