Tabular FillMissing, same for training and test set

eljas1 · August 1, 2019, 7:27am

I’m facing a problem where the test dataframe has missing values in some columns where the training dataframe has none. As a result, FillMissing creates a different set of _na columns for the training and test sets. Is there a convenient way to fix this? I suppose I could add a new row with np.nan in every column to force FillMissing to create a _na column for every variable, but that’s a bit of a brute-force method.

muellerzr · August 1, 2019, 10:06am

I’d recommend checking which variables have missing values only in the test set, then adding a row to your train and validation data that’s a copy of an existing row with those values set to missing. That way FillMissing sees a missing value in each of those columns.
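
For what it’s worth, here is a minimal pandas-only sketch of that idea. The helper name add_missing_marker_row, the column names, and the toy dataframes are all made up for illustration, and it doesn’t call fastai at all; it only prepares the training dataframe before you hand it to your usual pipeline.

```python
import numpy as np
import pandas as pd

def add_missing_marker_row(train_df, test_df, cont_names):
    """Hypothetical helper: append one copy of a training row with NaN in
    every column that has missing values in test_df but none in train_df."""
    only_in_test = [
        c for c in cont_names
        if test_df[c].isna().any() and not train_df[c].isna().any()
    ]
    if not only_in_test:
        return train_df, only_in_test
    extra = train_df.iloc[[0]].copy()   # copy of the first training row
    extra[only_in_test] = np.nan        # blank out only the affected columns
    patched = pd.concat([train_df, extra], ignore_index=True)
    return patched, only_in_test

# Toy data (assumed): "age" is missing only in the test set.
train = pd.DataFrame({
    "age": [30.0, 41.0, 25.0],
    "income": [50.0, 62.0, 48.0],
    "target": [0, 1, 0],
})
test = pd.DataFrame({"age": [33.0, np.nan], "income": [55.0, 60.0]})

patched, cols = add_missing_marker_row(train, test, ["age", "income"])
print(cols)          # columns that needed a marker row
print(patched.tail(1))
```

Keep in mind the extra row is a duplicated training sample, so it also gets trained on and can nudge whatever fill value is computed for those columns. With a reasonably large training set that should be negligible, but it’s worth knowing.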