I’m trying to train a model on tabular data and I’m running into a couple of issues whose root cause I can’t pin down. I’ve simplified my dataset to only three columns:
Label, Histogram and General
Label is the dependent variable, Histogram is an array of 256 floats and General is an array of 10 floats.
data = (TabularList.from_df(data, cat_names=[], cont_names=cont_vars, procs=procs)
        .split_by_rand_pct(0.2, seed=42)
        .label_from_df('label')
       )
I get the following error before I can create a DataBunch:
“AssertionError: Cannot normalize ‘histogram’ column as it isn’t numerical.
Are you sure it doesn’t belong in the categorical set of columns?”
Is this because it’s an array? If so, what is the best way to handle this feature array? Does every number require its own column?
Thank you for the response! Do you know if this was a design decision in the fastai library? For instance, would a library like TensorFlow be able to handle an array as a column value? This could just be my misunderstanding of how the libraries handle different data types.
You could write your own custom item list that takes the data in the form you want. The library is very versatile; look at the ItemList tutorial in the docs at docs.fast.ai
Would you suggest creating a custom item list that flattens this array into one number? Or something that is able to handle the array of 256 elements? Thanks again for the clarification!
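For anyone who lands here with the same error, here is a rough sketch of the "every number gets its own column" option asked about above, done in plain pandas before the data goes to fastai. It assumes the arrays are stored as one NumPy array per cell, which gives the column an object dtype, and that is most likely why the numeric check fails. The toy DataFrame, the helper name expand_array_column and the generated column names (histogram_0, histogram_1, ...) are made up for illustration and are not part of fastai.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Toy stand-in for the real data (random values, illustration only):
# 4 rows, each with a 256-float histogram and a 10-float "general" array.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "label": [0, 1, 0, 1],
    "histogram": [rng.random(256) for _ in range(4)],
    "general": [rng.random(10) for _ in range(4)],
})

# A column whose cells are arrays has dtype `object`, so pandas does not
# treat it as numeric.
print(df["histogram"].dtype, is_numeric_dtype(df["histogram"]))


def expand_array_column(frame, col):
    """Hypothetical helper: one float column per element of frame[col]."""
    values = np.stack(frame[col].tolist())  # shape: (n_rows, array_length)
    names = [f"{col}_{i}" for i in range(values.shape[1])]
    return pd.DataFrame(values, columns=names, index=frame.index)


# Replace the two array columns with 256 + 10 scalar columns.
flat = pd.concat(
    [
        df[["label"]],
        expand_array_column(df, "histogram"),
        expand_array_column(df, "general"),
    ],
    axis=1,
)

# Every column except the label is now a plain float column.
cont_vars = [c for c in flat.columns if c != "label"]
print(flat.shape, len(cont_vars))
```

The resulting flat and cont_vars could then be passed to TabularList.from_df the same way as in the first post. Whether 266 separate continuous columns is the best representation for a histogram is a separate question; a custom item list that keeps the array intact is the other route mentioned above.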