I am trying to build a tabular autoencoder in fast.ai using the ADULT_SAMPLE data. I want my Y values (the labels) to be the same as my X values (the inputs), so I have the following:
from fastai.tabular import *  # fastai v1
# Standard ADULT_SAMPLE loading
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
# Autoencoder setup: the dependent variables are the same columns as the inputs
dep_var = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'age', 'fnlwgt', 'education-num']
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [FillMissing, Categorify, Normalize]
# Rows 800-999 serve as both the validation split and the test set
test = TabularList.from_df(df.iloc[800:1000].copy(), path=path, cat_names=cat_names, cont_names=cont_names)
data = (TabularList.from_df(df, path=path, cat_names=cat_names, cont_names=cont_names, procs=procs)
        .split_by_idx(list(range(800,1000)))
        .label_from_df(dep_var)
        .add_test(test)
        .databunch())
However, creating the DataBunch gives me:
AssertionError: You have NaN values in column(s) ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'age', 'fnlwgt', 'education-num'] of your dataframe, please fix it.
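The message names every column passed as dep_var, so it does not show which of them actually contain NaN. A quick pandas check along the following lines would narrow that down. This is only a sketch: toy_df is a made-up stand-in for the real df, and its values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real df; the values are invented.
toy_df = pd.DataFrame({
    "occupation": ["Sales", np.nan, "Tech-support"],
    "education-num": [9.0, np.nan, 13.0],
    "age": [39, 50, 38],
})

# Count NaN per column and keep only the columns that have at least one.
nan_counts = toy_df.isna().sum()
cols_with_nan = nan_counts[nan_counts > 0].index.tolist()
print(cols_with_nan)
```

On the real data the same check would be run on df[dep_var] instead of toy_df.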
My question is: how do I take the values that the procs generate when filling in the missing data, and then use those generated values as the labels in label_from_df?
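One workaround that might apply here, sketched under assumptions, is to fill the missing values in the DataFrame with pandas before building the TabularList, so that the label columns contain no NaN when label_from_df reads them. The helper prefill_missing and the toy data below are hypothetical, not part of fast.ai. The helper fills continuous columns with the median and categorical columns with a placeholder string, which only approximates what FillMissing and Categorify do and does not reuse the values the procs themselves produce.

```python
import numpy as np
import pandas as pd

def prefill_missing(df, cat_names, cont_names, cat_fill="#na#"):
    """Return a copy of df with NaN filled in the given columns.

    Hypothetical helper: categorical NaN become the string cat_fill,
    continuous NaN become the column median (computed over all rows here;
    using only the training rows would avoid leaking validation data).
    """
    out = df.copy()
    for col in cat_names:
        out[col] = out[col].fillna(cat_fill)
    for col in cont_names:
        out[col] = out[col].fillna(out[col].median())
    return out

# Invented toy data standing in for the real df.
toy_df = pd.DataFrame({
    "occupation": ["Sales", np.nan, "Sales"],
    "education-num": [9.0, np.nan, 13.0],
})
filled = prefill_missing(toy_df, ["occupation"], ["education-num"])
print(filled)
```

With the real data this would be called as prefill_missing(df, cat_names, cont_names) before TabularList.from_df. Note that labels built this way would hold the raw filled values, not the normalized ones that Normalize produces for the inputs.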
Thanks,
Zach