First, I want to say what a wonderful resource the fastai package, videos & course are. Thank you!
I am trying to apply the lessons to test what I have learned, and a current Kaggle tabular dataset seemed like a good place to start.
The “predict future sales” task is basically a variant of the Rossmann example in the course.
In the Rossmann example, the test dataset has one column fewer than the train dataset. In “predict future sales”, however, the test dataset has only two columns / features: shop_id and item_id.
The gap in my knowledge is: how do I predict with a model trained on many features when the test / holdout dataset has fewer features?
At the moment I get the error: “None of [Index(['item_price'], dtype='object')] are in the [columns]”
Here is the head() of each data frame:
Train:
|date|date_block_num|shop_id|item_id|item_price|item_cnt_day|
Test:
|shop_id|item_id|
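To show where I think the error comes from, here is a minimal sketch that reproduces it outside of fastai. It assumes the failure is just pandas being asked for a column the test frame does not have; the rows are made-up values, and the exact wording of the message may vary between pandas versions.

```python
import pandas as pd

# Made-up rows with the same two columns as my test set.
test = pd.DataFrame({"shop_id": [5, 5], "item_id": [5037, 5320]})

try:
    # Selecting a column that only exists in the training data.
    test[["item_price"]]
    raised = False
except KeyError as err:
    raised = True
    message = str(err)
    print(message)
```

So any preprocessing step that looks up item_price (or another train-only column) on the test frame would fail in the same way.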
Here is my notebook GitHub link
The error implies I should add all the columns contained in the training dataset to the test dataset, but then what should their values be: NaN or 0 in every entry?
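For concreteness, this is a sketch of the "add the missing columns" option using plain pandas. The rows are made up, `feature_cols` and `test_padded` are names I invented for illustration, and I am assuming item_cnt_day is the target. I am not claiming this is the right way to handle it; it only shows what padding with NaN would look like.

```python
import pandas as pd

# Made-up rows with the same columns as my train and test frames.
train = pd.DataFrame({
    "date": ["02.01.2013", "03.01.2013"],
    "date_block_num": [0, 0],
    "shop_id": [59, 25],
    "item_id": [22154, 2552],
    "item_price": [999.0, 899.0],
    "item_cnt_day": [1.0, 1.0],
})
test = pd.DataFrame({"shop_id": [5, 5], "item_id": [5037, 5320]})

target = "item_cnt_day"  # assumption: this is the column being predicted
feature_cols = [c for c in train.columns if c != target]

# Columns the model was trained on but the test set lacks.
missing_cols = [c for c in feature_cols if c not in test.columns]

# reindex adds the missing columns filled with NaN and keeps the existing ones.
test_padded = test.reindex(columns=feature_cols)
print(missing_cols)
print(test_padded)
```

The frames now have matching feature columns, so the KeyError should go away, but every added entry is NaN, which is exactly the part I am unsure about.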
Cheers,
Ben