I tried to use FillMissing from Tabular to deal with some missing categorical values, but noticed there were still missing values after applying the transformation. Upon closer inspection, it looks like FillMissing only works for continuous variables…
This was surprising to me because FillMissing takes a cat_names argument. Of the three fill strategies provided - median, common, and constant - median clearly wouldn’t work, but common and constant seem to make perfect sense.
Am I missing something about how this class works? Or would it be a bad idea for some reason to fill missing categorical variables this way?
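For reference, the kind of fill I had in mind can be sketched in plain pandas. This is not part of fastai: `fill_missing_categorical`, its `strategy` argument, and the `"#missing#"` placeholder are hypothetical names made up for illustration, and the sketch assumes the categorical columns hold plain strings rather than the pandas `category` dtype.

```python
import pandas as pd

def fill_missing_categorical(df, cat_names, strategy="common", fill_val="#missing#"):
    """Hypothetical helper: return a copy of df with NaNs in cat_names filled.

    strategy="common" uses the most frequent value of each column,
    strategy="constant" uses fill_val.
    """
    df = df.copy()
    for name in cat_names:
        if not df[name].isna().any():
            continue  # nothing to fill in this column
        if strategy == "common":
            modes = df[name].mode(dropna=True)
            if modes.empty:
                continue  # column is entirely missing, so there is no mode
            filler = modes.iloc[0]  # ties resolve to the first value pandas returns
        elif strategy == "constant":
            # Assumes string columns; a 'category' dtype column would need
            # fill_val added to its categories before calling fillna.
            filler = fill_val
        else:
            raise ValueError(f"unsupported strategy: {strategy!r}")
        df[name] = df[name].fillna(filler)
    return df

# Toy data, invented for this example.
df = pd.DataFrame({
    "color": ["red", None, "red", "blue"],
    "size": ["S", "M", None, "M"],
})
filled = fill_missing_categorical(df, ["color", "size"], strategy="common")
print(filled)
```

Running something like this before building the tabular data would remove the missing categorical values up front, at the cost of no longer being able to tell filled rows from real ones.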
Hi everyone,
My question is: why does FillMissing take ‘cat_vars’ (at least in v1 of fastai; I'm not sure about v2) if missing categorical values are not replaced by anything? Is it only there to create the ‘_na’ columns?
Please see the code below:
Fill_missing = FillMissing(cat_vars, cont_vars)
The _na columns that FillMissing generates are binary categorical columns, right? Each new column has to be appended to cat_vars so that it is treated as a categorical variable later on. That's why FillMissing needs cat_vars, even though the filling itself is applied only to cont_vars.
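To make that concrete, here is a rough pandas-only sketch of the behaviour described above. It is a simplification written for illustration, not the fastai source: `fill_missing_sketch` is a hypothetical name, it only implements the median strategy, and it returns new objects instead of modifying anything in place.

```python
import pandas as pd

def fill_missing_sketch(df, cat_names, cont_names):
    """Hypothetical stand-in for the described behaviour (median strategy only).

    For every continuous column that has missing values:
      1. add a boolean '<name>_na' indicator column,
      2. register that indicator as a categorical variable,
      3. fill the gaps in the continuous column with its median.
    Columns listed in cat_names are never filled.
    """
    df = df.copy()
    cat_names = list(cat_names)  # copy so the caller's list is left alone
    for name in cont_names:
        if df[name].isna().any():
            na_col = name + "_na"
            df[na_col] = df[name].isna()
            if na_col not in cat_names:
                cat_names.append(na_col)
            df[name] = df[name].fillna(df[name].median())
    return df, cat_names

# Toy data, invented for this example.
df = pd.DataFrame({
    "age": [20.0, None, 40.0],
    "fare": [1.0, 2.0, 3.0],
    "city": ["a", None, "b"],
})
out, cats = fill_missing_sketch(df, cat_names=["city"], cont_names=["age", "fare"])
print(out)
print(cats)
```

In this sketch the missing value in `city` is left untouched, while `age` gets a median fill plus an `age_na` indicator that ends up in the categorical list. `fare` has no gaps, so no `fare_na` column is created.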