Regarding how to choose a proper validation set for the grocery competition, Jeremy @jeremy mentioned yesterday that the best one they have tried so far is ‘same day range 1 month earlier’ or something similar, which gave a good linear fit between validation and test set results.
I don’t understand how this works. My understanding is that we have 30-ish different datasets, each one covering one day of the month, and we make predictions for each day separately? Is that correct?
I’m not sure what you mean by 30 different datasets. There’s only one test CSV file available on the Kaggle website.
I was trying to figure out how to get the ‘same day range 1 month earlier’ validation set that you mentioned for the grocery competition. I thought it meant we split the data by the 1st, 2nd, 3rd… 30th day of the month into separate datasets and predict on each one separately. Is that what you meant by the validation set?
I simply meant selecting rows with date after 15th July and before 1st August.
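For anyone following along, here is a minimal pandas sketch of that kind of date-range selection. It is only an illustration, not the exact code used: the year (2017), the `date` and `unit_sales` column names, the helper name `split_by_date`, and the choice to train only on rows before the validation window are all assumptions, and the data frame is a tiny synthetic one rather than the real training file.

```python
import pandas as pd

def split_by_date(df, start, end, date_col="date"):
    """Hypothetical helper: validation = rows with start <= date <= end,
    training = rows strictly before start."""
    dates = pd.to_datetime(df[date_col])
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    valid = df[(dates >= start) & (dates <= end)]
    train = df[dates < start]
    return train, valid

# Tiny synthetic stand-in for the training data (one row per day).
df = pd.DataFrame({"date": pd.date_range("2017-07-10", "2017-08-05", freq="D")})
df["unit_sales"] = range(len(df))

# "After 15th July and before 1st August" means 16 July through 31 July.
# The year 2017 is an assumption made for this example.
train, valid = split_by_date(df, "2017-07-16", "2017-07-31")
print(len(train), len(valid))
```

Rows dated after the validation window are left out of both pieces here, so the model never sees anything later than what it is validated on.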
Oh… I overthought it. Thank you for clarifying!