Regarding how to choose a proper validation set for the grocery competition: Jeremy (@jeremy) mentioned yesterday that the best approach they have tried so far is ‘same day range 1 month earlier’, or something similar, which gave a good linear fit between validation set and test set results.
I don’t understand how this works. My understanding is that we would have 30-ish different datasets, one for each day of the month, and make predictions for each day separately. Is that correct?
I was trying to figure out how to build the ‘same day range 1 month earlier’ validation set that you mentioned for the grocery competition. I thought it meant splitting the data by day of the month (1st, 2nd, 3rd… 30th) into separate datasets and predicting each one separately. Is that what you meant by the validation set?
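For comparison, the other reading I can think of is a single contiguous window: take the date range covered by the test set, shift it back one month, and hold out those rows for validation. Here is a rough sketch of that reading, which may not be what was actually meant. The function name, the `date` column name, and the example dates in the usage note are my own assumptions, not something confirmed in the discussion.

```python
import pandas as pd

def same_range_one_month_earlier(df, test_start, test_end, date_col="date"):
    """Hypothetical helper: hold out the test date range shifted back one month.

    Assumes df[date_col] is a datetime column. Returns (train, val), where
    val covers the shifted window and train is everything before it.
    """
    # Shift both ends of the test window back by one calendar month.
    val_start = pd.Timestamp(test_start) - pd.DateOffset(months=1)
    val_end = pd.Timestamp(test_end) - pd.DateOffset(months=1)

    # Rows inside the shifted window (both ends inclusive) form the validation set.
    val = df[df[date_col].between(val_start, val_end)]

    # Train only on rows strictly before the validation window to avoid leakage.
    train = df[df[date_col] < val_start]
    return train, val
```

With an assumed test window of 2017-08-16 to 2017-08-31, calling `same_range_one_month_earlier(df, "2017-08-16", "2017-08-31")` would hold out 2017-07-16 to 2017-07-31 as one validation set, rather than 30 per-day datasets.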