Why doesn't FillMissing apply to categorical variables?

I tried to use FillMissing from Tabular to deal with some missing categorical values, but noticed there were still missing values after applying the transformation. Upon closer inspection, it looks like FillMissing only works for continuous variables…

This was surprising to me because FillMissing takes a cat_names argument. Of the three fill strategies provided - median, common, and constant - median clearly wouldn’t work, but common and constant seem to make perfect sense.

Am I missing something about how this class works? Or would it be a bad idea for some reason to fill missing categorical variables this way?

For categorical variables, a missing value is treated as a category of its own, so there is no need to fill it with anything.
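To illustrate the idea, here is a minimal pandas-only sketch (not fastai's own code). pandas encodes a missing categorical entry as code -1, and shifting every code up by one reserves index 0 for "missing", so an embedding layer can learn a vector for it like for any other category. The column name and values are made up for the example, and the shift-by-one step is my assumption about how such an encoding is typically done.

```python
import numpy as np
import pandas as pd

# Toy column with two missing entries (values are illustrative only).
df = pd.DataFrame({"color": ["red", None, "blue", "red", np.nan]})

# Converting to a categorical dtype sorts the categories: ['blue', 'red'].
df["color"] = df["color"].astype("category")

# pandas assigns code -1 to every missing entry.
codes = df["color"].cat.codes.to_numpy()

# Shift by one so 0 means "missing" and real categories start at 1.
shifted = codes + 1
print(shifted.tolist())
```

With this encoding the missing rows simply become one more category index, so nothing needs to be imputed.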


Hi everyone,
My question is: why does FillMissing take ‘cat_vars’ (at least in v1 of fastai; I’m not sure about v2) if missing categorical values are not replaced by anything? Is it only there to create the ‘_na’ column?
Please see the code below
Fill_missing = FillMissing(cat_vars, cont_vars)

Yes, the _na columns are appended to cat_vars based on the cont_vars.

I’m sorry, but what do you mean by the part ‘based on the cont_vars’?

The _na columns that FillMissing generates are binary categorical columns, yes? Each one is a new categorical variable, so its name has to be appended to cat_vars. That’s why FillMissing needs cat_vars, even though the filling itself is based only on the cont_vars.
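Here is a rough pandas sketch of that behaviour, in case it helps. It is not fastai's actual implementation: `fill_missing_sketch` is a hypothetical helper I wrote for illustration, it only covers the median strategy, and the column names are made up.

```python
import pandas as pd

def fill_missing_sketch(df, cat_names, cont_names):
    """Hypothetical helper (not fastai's code): median-fill continuous
    columns and record where values were missing in a new `<name>_na` column."""
    df = df.copy()
    cat_names = list(cat_names)  # avoid mutating the caller's list
    for name in cont_names:
        na_mask = df[name].isna()
        if na_mask.any():
            # Fill the continuous column with its median.
            df[name] = df[name].fillna(df[name].median())
            # The indicator column is binary, so it is registered as categorical.
            na_col = f"{name}_na"
            df[na_col] = na_mask
            cat_names.append(na_col)
    return df, cat_names

df = pd.DataFrame({
    "age": [20.0, None, 40.0],      # continuous, one missing value
    "city": ["Oslo", None, "Lima"],  # categorical, one missing value
})
filled, new_cat_names = fill_missing_sketch(df, cat_names=["city"], cont_names=["age"])
print(new_cat_names)
print(filled)
```

Note that the categorical column `city` is left untouched; only the list of categorical names changes, because the new `age_na` indicator has to be handled as a categorical variable later on.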
