Structured data and only categorical variables


(Philipp Schwarz) #1

Dear all,

I have just started using the fastai library, so I apologize for asking a possibly incredibly stupid question.

I am using the Mercari dataset from Kaggle and attempting to predict price based solely on categorical variables. I want to use: cat_vars = ['item_condition_id', 'category_name', 'brand_name', 'shipping']. I am starting from Jeremy's notebook for the Rossmann dataset and adapting it as required.
I get stuck at the following code snippet:

df, y, nas, mapper = proc_df(train_samp,
                             y_fld = 'price',
                             do_scale=True)

It fails with the error:
ValueError: need at least one array to concatenate

It seems that the function expects at least one continuous input variable to which the embedding matrix can be concatenated. Is that right?
Am I overlooking something? Should I make a throwaway helper or can we change the function?
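
For illustration, the same error message can be reproduced outside fastai with a minimal sketch. It assumes the scaling step collects one array per numeric column and concatenates them. I have not traced this through the fastai source, so the link to proc_df is an assumption. The helpers scale_numeric and scale_numeric_safe and the toy data are hypothetical and are not part of fastai.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Toy frame standing in for the Mercari sample: every predictor is categorical.
df = pd.DataFrame({
    "item_condition_id": pd.Categorical([1, 3, 2]),
    "brand_name": pd.Categorical(["Nike", "Apple", "Nike"]),
})

# Categorical columns are not numeric dtypes, so this list ends up empty.
numeric_cols = [c for c in df.columns if is_numeric_dtype(df[c])]


def scale_numeric(frame, cols):
    """Hypothetical helper (not fastai): standardize each column, then stack them."""
    scaled = [
        ((frame[c] - frame[c].mean()) / frame[c].std()).to_numpy().reshape(-1, 1)
        for c in cols
    ]
    # With an empty list, NumPy has nothing to concatenate and raises ValueError.
    return np.concatenate(scaled, axis=1)


def scale_numeric_safe(frame, cols):
    """Hypothetical guard: return a zero-column array when nothing is numeric."""
    if not cols:
        return np.empty((len(frame), 0))
    return scale_numeric(frame, cols)


try:
    scale_numeric(df, numeric_cols)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(numeric_cols)
print(error_message)
print(scale_numeric_safe(df, numeric_cols).shape)
```

If this is indeed the cause, calling proc_df with do_scale=False might sidestep the problem, since there would be nothing to scale anyway, but I have not verified that.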

I greatly appreciate your help.