TabularPandas.classes always includes #na#

invariant · June 25, 2021, 10:31am

I have constructed a TabularPandas object (to) from a DataFrame without missing values. I can check from to.xs that the underlying values indeed do not contain any NaNs. When I use to.classes, however, #na# is included in the list of levels for every categorical column.

Is this expected behaviour?

muellerzr · June 25, 2021, 10:46am

It is! #na# is a placeholder in case any categorical values aren’t present in your training (or validation and test) datasets.

FillMissing only handles continuous variables; Categorify is what adds #na#.
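To illustrate why the extra level shows up even when the training data has no NaNs, here is a simplified, stdlib-only sketch of the idea. It is not fastai's actual Categorify code, and the helper names build_category_map and encode are hypothetical. Index 0 is always reserved for #na#, and any value that is missing or was never seen during setup falls back to that index when encoding.

```python
NA = "#na#"

def build_category_map(train_values):
    """Build a vocab with '#na#' reserved at index 0, followed by the
    unique training levels (sorted here for a deterministic order)."""
    levels = sorted({v for v in train_values if v is not None})
    return [NA] + levels

def encode(values, vocab):
    """Map each value to its vocab index. None (missing) and levels not
    seen when the vocab was built both fall back to index 0, i.e. '#na#'."""
    index = {v: i for i, v in enumerate(vocab)}
    return [index.get(v, 0) for v in values]

# The training column has no missing values, yet '#na#' is still in the vocab.
train = ["red", "blue", "red"]
vocab = build_category_map(train)
print(vocab)                                 # ['#na#', 'blue', 'red']
print(encode(["red", "green", None], vocab)) # [2, 0, 0]
```

Reserving the slot up front means validation or test rows containing an unseen level (like "green" above) still get a valid code instead of raising an error.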

invariant · June 25, 2021, 11:43am

Got it, thanks!