Need help understanding error: the label [#% increased Armour_na] is not in the [columns]

knesgood · February 21, 2019, 7:28pm

I’m working with a data set built by combining two data frames before feeding it into the data block API. Because I’m combining the two, most columns have a sizable number of NaN values. When I try to run this:

dep_var = 'y'
cat_vars = []
cont_vars = []

for i in list(both_raw.columns):
    if both_raw[i].nunique() < 50:
        cat_vars.append(i)
    else:
        if i != dep_var:
            cont_vars.append(i)
        
procs = [Normalize, Categorify, FillMissing]

data = (TabularList.from_df(both_raw, cat_names=cat_vars, cont_names=cont_vars, procs=procs)
                   .random_split_by_pct(valid_pct=.2, seed=456)
                   .label_from_df(col=dep_var, label_cls=FloatList)
                   .databunch())

I get the following error:

KeyError: 'the label [#% increased Armour_na] is not in the [columns]'

This is a column that the procs are adding, so I’m not sure how to resolve it. Let me know what else would be helpful to post here.
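
For context on where a `_na` column comes from, here is a minimal pandas sketch of what a FillMissing-style step appears to do: for each continuous column containing NaN, it adds a boolean `<name>_na` indicator column and fills the gaps with the median. The function `fill_missing_sketch` and the toy frame are hypothetical and only illustrate the naming; this is not fastai's actual implementation, and it does not by itself explain the KeyError.

```python
import numpy as np
import pandas as pd

def fill_missing_sketch(df, cont_names):
    """Hypothetical stand-in for a FillMissing-style step (not fastai's code)."""
    df = df.copy()  # leave the caller's frame untouched
    added = []      # names of the indicator columns created here
    for name in cont_names:
        if df[name].isnull().any():
            na_col = name + '_na'
            # Record which rows were missing before filling them
            df[na_col] = df[name].isnull()
            # Fill the gaps with the column median (NaN is skipped by default)
            df[name] = df[name].fillna(df[name].median())
            added.append(na_col)
    return df, added

# Toy data: one continuous column with a missing value, plus the target
toy = pd.DataFrame({'#% increased Armour': [10.0, np.nan, 30.0],
                    'y': [1.0, 2.0, 3.0]})
filled, added = fill_missing_sketch(toy, ['#% increased Armour'])
print(added)
print(list(filled.columns))
```

The point of the sketch is that the `_na` column only exists after this step has run on a frame, so anything that looks the column up by name before then, or on a frame that never went through the step, would not find it.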

thelastprime · August 7, 2019, 9:18pm

I’m having the same issue. Did you ever find a fix?

knesgood · August 28, 2019, 2:46pm

Nope, but to be fair, I haven’t worked on that project in a while.