I’m working with a data set built by combining two data frames before feeding it into the data block API. Because I’m combining the two, most columns have a sizable number of NaN values. When I try to run this:
dep_var = 'y'
cat_vars = []
cont_vars = []
for i in list(both_raw.columns):
    if both_raw[i].nunique() < 50:
        cat_vars.append(i)
    else:
        if i != dep_var:
            cont_vars.append(i)
procs = [Normalize, Categorify, FillMissing]
data = (TabularList.from_df(both_raw, cat_names=cat_vars, cont_names=cont_vars, procs=procs)
        .random_split_by_pct(valid_pct=.2, seed=456)
        .label_from_df(col=dep_var, label_cls=FloatList)
        .databunch())
I get the following error:
KeyError: 'the label [#% increased Armour_na] is not in the [columns]'
This is a column that the procs are adding, so I’m not sure how to resolve this. Let me know what else would be helpful to post here.
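For reference, here is a minimal pandas-only sketch of the column split above, plus a check of which `_na` column names would be expected. It rests on an assumption about the library rather than anything confirmed in this post: that FillMissing adds a `<col>_na` indicator for each continuous column containing NaN. The helper names `split_columns` and `expected_na_columns`, the `max_card` parameter, and the toy frames are all hypothetical and used only for illustration. Unlike the loop above, this version skips `dep_var` entirely, so the target can never end up in `cat_vars`.

```python
import pandas as pd

def split_columns(df, dep_var, max_card=50):
    """Split feature columns into categorical and continuous by unique-value count."""
    cat_vars, cont_vars = [], []
    for col in df.columns:
        if col == dep_var:
            continue  # never treat the target as a feature
        if df[col].nunique() < max_card:  # nunique() ignores NaN by default
            cat_vars.append(col)
        else:
            cont_vars.append(col)
    return cat_vars, cont_vars

def expected_na_columns(df, cont_vars):
    """Assumed indicator names: one `<col>_na` per continuous column that has NaN."""
    return [f"{col}_na" for col in cont_vars if df[col].isna().any()]

# Toy stand-in for both_raw: two frames with different columns, stacked,
# so each frame's unique columns are NaN for the other frame's rows.
a = pd.DataFrame({"y": [1.0, 2.0, 3.0], "armour": [10.0, 20.0, 30.0]})
b = pd.DataFrame({"y": [4.0, 5.0, 6.0], "evasion": [0.1, 0.2, 0.3]})
both_toy = pd.concat([a, b], ignore_index=True, sort=False)

# max_card is lowered only because the toy data is tiny.
cat_vars, cont_vars = split_columns(both_toy, dep_var="y", max_card=2)
na_cols = expected_na_columns(both_toy, cont_vars)
print(cont_vars, na_cols)
```

If that assumption holds, the column named in the KeyError would be one of these indicators, and one thing that might be worth trying is listing FillMissing first in `procs` so the indicator columns exist before Categorify looks for them. That suggestion is untested here.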