I tried to use FillMissing from Tabular to deal with some missing categorical values, but noticed there were still missing values after applying the transformation. Upon closer inspection, it looks like FillMissing only works for continuous variables…
This was surprising to me because FillMissing takes a cat_names argument. Of the three fill strategies provided - median, common, and constant - median clearly wouldn’t work, but common and constant seem to make perfect sense.
Am I missing something about how this class works? Or would it be a bad idea for some reason to fill missing categorical variables this way?
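For reference, this is roughly what I expected the common and constant strategies to do for a categorical column. It is a minimal pandas sketch written by hand, not fastai code. The column name color and the placeholder label "missing" are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; the column name "color" is an illustration only.
df = pd.DataFrame({"color": ["red", None, "blue", "red", None]})

# "common" strategy: replace missing values with the most frequent category.
# Series.mode() ignores missing values by default.
most_common = df["color"].mode().iloc[0]
filled_common = df["color"].fillna(most_common)

# "constant" strategy: replace missing values with a fixed placeholder label.
filled_constant = df["color"].fillna("missing")
```

Either way the column ends up with no missing entries, which is the behaviour I assumed FillMissing would give for the columns listed in cat_names.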
For categorical variables, a missing value is treated as a category of its own, so there is no need to fill it with anything.
My question is: why does FillMissing take ‘cat_vars’ (at least in v1 of fastai; not sure about v2) if missing categorical values are not replaced by anything? Is it only there so that the ‘_na’ column can be created?
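To illustrate the idea with plain pandas (a sketch, not fastai internals): a categorical column encodes a missing entry with the code -1, and shifting every code up by one gives the missing value its own slot at index 0. My understanding is that fastai's categorical handling does something similar, but treat that as an assumption rather than a description of its source.

```python
import pandas as pd

# Toy column with one missing entry.
values = pd.Series(["a", None, "b", "a"])
cat = values.astype("category")

# pandas assigns the code -1 to missing entries in a categorical column.
codes = cat.cat.codes

# Shifting by one moves "missing" to index 0, so it behaves like any other category.
shifted = codes + 1
```

An embedding layer indexed by the shifted codes would then learn a separate vector for "missing", which is why no fill value is needed.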
Please see the code below
Fill_missing = FillMissing(cat_vars, cont_vars)
The _na columns are appended to cat_vars based on the cont_vars.
I’m sorry, but what do you mean by the part ‘based on the cont_vars’?
FillMissing generates a binary categorical column, yes? That new categorical variable has to be added to cat_vars, so we append it. That’s why FillMissing needs cat_vars, even though the filling itself is based on the cont_vars.
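Here is a minimal pandas sketch of the behaviour described in this thread. The helper fill_missing_sketch is hypothetical and is not fastai's implementation. It assumes the median strategy and that an indicator column is added for every continuous column that has missing values.

```python
import pandas as pd

def fill_missing_sketch(df, cat_names, cont_names):
    """Hypothetical helper (not fastai code): median-fill continuous columns
    and record which rows were missing in a new '<name>_na' column."""
    df = df.copy()
    cat_names = list(cat_names)
    for name in cont_names:
        missing = df[name].isna()
        if missing.any():
            na_col = name + "_na"
            df[na_col] = missing              # binary indicator column
            if na_col not in cat_names:
                cat_names.append(na_col)      # the indicator is a categorical variable
            df[name] = df[name].fillna(df[name].median())
    return df, cat_names

# Toy data: one continuous and one categorical column, each with a missing value.
df = pd.DataFrame({
    "age": [20.0, None, 40.0],
    "city": ["Paris", None, "Rome"],
})
out, cats = fill_missing_sketch(df, cat_names=["city"], cont_names=["age"])
```

In this sketch the categorical column city is never touched. The list of categorical names is only needed so that age_na can be appended to it.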