Hi,
I’m new here and still getting to know fastai2 (great toy, thank you for all the hard work!), so apologies if this was already answered and I didn’t see it, or if the solution should be obvious. I also hope this is the appropriate place to ask.
To the problem: I was playing with tabular data today when I noticed that the transformation pipeline of TabularPandas does not seem to be happy about unexpected missing data, throwing this assertion error:
AssertionError: nan values in cont_prop but not in setup training set
where cont_prop is just some random column name.
I’ve set up a notebook to reproduce the behavior here.
The assertion error seems to be thrown in two situations: when TabularPandas is first initialized over the training/validation data and the validation part contains unexpected missing values, and when an existing instance of TabularPandas is used over a test set with unexpected missing values.
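To make the triggering condition concrete, here is a minimal pandas-only sketch that lists the columns with missing values in the validation (or test) data but none in the training data. It does not call fastai at all; the helper name unexpected_nan_columns and the column "other" are made up for illustration.

```python
import numpy as np
import pandas as pd

def unexpected_nan_columns(train_df, other_df, cols):
    """Return the columns in `cols` that contain NaN in other_df but not in train_df.

    Hypothetical helper, not part of fastai.
    """
    return [c for c in cols
            if other_df[c].isna().any() and not train_df[c].isna().any()]

# Toy data: "cont_prop" is complete in training but has a gap in validation,
# while "other" already has a gap in training.
train = pd.DataFrame({"cont_prop": [1.0, 2.0, 3.0], "other": [0.5, np.nan, 1.5]})
valid = pd.DataFrame({"cont_prop": [4.0, np.nan], "other": [np.nan, 2.5]})

print(unexpected_nan_columns(train, valid, ["cont_prop", "other"]))
```

A check like this only reports what is in the data at hand; it cannot anticipate gaps that first appear in a later test set.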
A fix that seems to work for my toy data sets is to just add a row to the training set that contains NaNs in the relevant columns.
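Roughly, the workaround looks like the sketch below. It is pandas-only, the helper name add_nan_row is made up, and copying the first row for the remaining columns (including the target) is just an assumption to keep the example short.

```python
import numpy as np
import pandas as pd

def add_nan_row(train_df, cols):
    """Return a copy of train_df with one extra row that is NaN in `cols`.

    Hypothetical helper, not part of fastai. The other columns of the extra
    row are copied from the first training row (an arbitrary choice).
    """
    filler = train_df.iloc[[0]].copy()   # one-row DataFrame, detached from train_df
    filler[cols] = np.nan                # blank out the relevant columns
    return pd.concat([train_df, filler], ignore_index=True)

train = pd.DataFrame({"cont_prop": [1.0, 2.0], "target": [0, 1]})
patched = add_nan_row(train, ["cont_prop"])
print(patched)
```

The extra row is appended at the end, so with index-based splits it has to end up in the training part for this to have any effect.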
But I’m wondering what to do if one cannot easily anticipate which columns may contain missing values? Just adding a row with every entry being nan seems like it would be inefficient. Is there possibly some best practice for dealing with this?
Thanks!