I am participating in the House Prices - Advanced Regression Techniques competition. Following the instructions in Chapter 9 of the book, I created a TabularPandas object like this:
cont, cat=cont_cat_split(df, 1, dep_var=dep_var)
procs=[Categorify, FillMissing, Normalize]
splits=RandomSplitter()(range_of(df))
to=TabularPandas(df, procs, cat, cont, y_names=dep_var, y_block=RegressionBlock(), splits=splits)
Then I created a DecisionTreeRegressor with a minimum of 25 samples per leaf:
m = DecisionTreeRegressor(min_samples_leaf=25)
m.fit(xs, y)
But when I try to predict on the test set with this code:
to_test=TabularPandas(tst_df, procs=procs, cat_names=cat, cont_names=cont)
tst_xs = to_test.train.xs
def subm(preds, suff):
    tst_df['SalePrice'] = preds
    sub_df = tst_df[['Id','SalePrice']]
    sub_df.to_csv(f'sub-{suff}.csv', index=False)
subm(m.predict(tst_xs), 'dt')
I got an error:
Feature names unseen at fit time:
- BsmtFinSF1_na
- BsmtFinSF2_na
- BsmtFullBath_na
- BsmtHalfBath_na
- BsmtUnfSF_na
- ...
Feature names must be in the same order as they were in fit.
X has 91 features, but DecisionTreeRegressor is expecting 83 features as input.
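One way to narrow this down is to compare the column names the model was fitted on with the ones in the test frame. The sketch below is only illustrative: `diff_features` is a hypothetical helper, and the column lists are made-up stand-ins for `xs.columns` and `tst_xs.columns`.

```python
def diff_features(fit_cols, test_cols):
    """Compare feature names seen at fit time with those in the test frame."""
    fit_set, test_set = set(fit_cols), set(test_cols)
    # Columns present only in the test frame (e.g. extra *_na indicators)
    unseen = [c for c in test_cols if c not in fit_set]
    # Columns the model was fitted on but that the test frame lacks
    missing = [c for c in fit_cols if c not in test_set]
    return unseen, missing

# Hypothetical column lists standing in for xs.columns and tst_xs.columns
fit_cols = ['MSZoning', 'LotArea', 'LotFrontage_na']
test_cols = ['MSZoning', 'LotArea', 'LotFrontage_na', 'BsmtFinSF1_na']
unseen, missing = diff_features(fit_cols, test_cols)
print(unseen)   # ['BsmtFinSF1_na']
print(missing)  # []
```

In the notebook this would be called as `diff_features(list(xs.columns), list(tst_xs.columns))`. If the only differences are `_na` columns, a likely cause is that building a separate TabularPandas on `tst_df` reruns FillMissing on the test data, which has missing values in columns that were complete in training. A possible fix, not verified against this notebook, is to reuse the training preprocessing with `to.new(tst_df)` followed by `process()` instead of constructing a new TabularPandas.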
My notebook is public. Can you help me with this?