Hi there.
I don't know why FillMissing adds the column_na indicators as categorical variables, but I'm trying to add them as continuous variables instead (they are just 0 and 1, so I don't need an embedding of size 3 for them).
So I changed FillMissing.encodes to:

    def encodes(self, to):
        missing = pd.isnull(to.conts)
        for n in missing.any()[missing.any()].keys():
            assert n in self.na_dict, f"nan values in `{n}` but not in setup training set"
        for n in self.na_dict.keys():
            to[n].fillna(self.na_dict[n], inplace=True)
            if self.add_col:
                to.loc[:,n+'_na'] = missing[n]
                if n+'_na' not in to.cont_names: to.cont_names.append(n+'_na')

(The only change is cat_names → cont_names in the last line.)
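Outside of fastai, the intended behaviour can be sketched with plain pandas. This is a minimal illustration, not fastai's API: add_na_indicators is a hypothetical helper, and the column names and fill values are made up. It records where values are missing before filling them, stores each flag as a 0.0/1.0 float, and registers the flag name as a continuous column.

```python
import numpy as np
import pandas as pd

def add_na_indicators(df, cont_names, na_dict):
    """Hypothetical helper (not fastai API): fill NaNs and add 0/1 float
    '<col>_na' indicator columns, registering them as continuous."""
    df = df.copy()
    cont_names = list(cont_names)
    # Record where values are missing *before* filling them
    missing = df[cont_names].isna()
    for n, fill_value in na_dict.items():
        df[n] = df[n].fillna(fill_value)
        # Store the flag as 0.0/1.0 so it can be fed in as a continuous input
        df[n + '_na'] = missing[n].astype('float32')
        if n + '_na' not in cont_names:
            cont_names.append(n + '_na')
    return df, cont_names

# Made-up example data
train = pd.DataFrame({'age': [22.0, np.nan, 35.0], 'fare': [7.25, 71.3, np.nan]})
na_dict = {'age': train['age'].median(), 'fare': train['fare'].median()}
out, conts = add_na_indicators(train, ['age', 'fare'], na_dict)
print(conts)                   # ['age', 'fare', 'age_na', 'fare_na']
print(out['age_na'].tolist())  # [0.0, 1.0, 0.0]
```

Note that calling this a second time with the returned conts on a frame that lacks the age_na column would fail at df[cont_names], because the list now names a column that does not exist yet.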
This works fine during training, but at prediction time, e.g. with

    learn.dls.test_dl(test_df)

it fails with

    KeyError: 'column_na' not in index

So I'm guessing some part of the fastai code adds the _na column when it is missing from the test dataframe, but I can't find where.
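One plausible mechanism, sketched with plain pandas and assuming (not verified against the fastai source) that the test-time processing reuses the training cont_names and that to.conts selects the columns listed in cont_names: once column_na is in cont_names, pd.isnull(to.conts) would try to select a column the raw test dataframe doesn't have yet. With the original cat_names version, the flag isn't part of to.conts, so the selection succeeds and the column is simply created afterwards. The names below are illustrative only.

```python
import numpy as np
import pandas as pd

# cont_names as they might look after setup on the training set,
# with the '_na' flag registered as continuous (illustrative names)
cont_names = ['age', 'age_na']
test_df = pd.DataFrame({'age': [40.0, np.nan]})

raised = False
try:
    # Analogous to pd.isnull(to.conts): select every listed continuous column
    test_df[cont_names]
except KeyError as e:
    raised = True
    print('KeyError:', e)

# Guarded variant: only inspect raw columns that exist in the incoming frame
present = [c for c in cont_names if c in test_df.columns]
missing = test_df[present].isna()
print(missing['age'].tolist())  # [False, True]
```

If this is indeed the cause, restricting the missing-value check to columns that are already present (as in the guarded variant) would let encodes create the flag column instead of failing on it.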