Problem with cont_cat_split() in Google Colab

sylvaint · January 11, 2021, 12:37am

If you just want to run the notebook for now:

Change saleWeek from the pandas nullable UInt32 dtype to NumPy's uint32:
df = df.astype({'saleWeek' : 'uint32'})
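
To see what that conversion does in isolation, here is a minimal sketch on a toy DataFrame. The values are made up and the frame is only a stand-in for the notebook's df; it assumes the column has no missing values.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the notebook's DataFrame (values are made up).
df = pd.DataFrame({'saleWeek': pd.array([1, 2, 53], dtype='UInt32')})
before = df['saleWeek'].dtype.name  # pandas nullable extension dtype

# Converting to plain uint32 assumes there are no missing values;
# fill or drop pd.NA entries first if the column has any.
df = df.astype({'saleWeek': 'uint32'})
after = df['saleWeek'].dtype.name

# After the conversion the column is an ordinary NumPy integer dtype,
# which is what the np.issubdtype checks in cont_cat_split expect.
is_numpy_int = np.issubdtype(df['saleWeek'].dtype, np.integer)
print(before, '->', after, '| NumPy integer:', is_numpy_int)
```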

Modify cont_cat_split so that it detects ProductSize, which is a CategoricalDtype, by checking the dtype's name property:

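import numpy as np
from fastcore.foundation import L  # normally already in scope in the fastai notebook

def cont_cat_split(df, max_card=20, dep_var=None):
    "Return the names of the continuous and categorical columns of `df`."
    cont_names, cat_names = [], []
    for label in df:
        if label in L(dep_var): continue

        # mod to detect ProductSize type properly: handle pandas categorical
        # columns by dtype name before the np.issubdtype checks below
        if df[label].dtype.name == 'category':
            cat_names.append(label)
            continue

        # high-cardinality integer columns and all float columns are continuous
        if ((np.issubdtype(df[label].dtype, np.integer) and
             df[label].unique().shape[0] > max_card) or
            np.issubdtype(df[label].dtype, np.floating)):
            cont_names.append(label)
        else: cat_names.append(label)
    return cont_names, cat_names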
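
A quick sanity check of the idea on a toy DataFrame. To keep the snippet free of the fastcore dependency it uses a hypothetical stand-in, split_sketch, that follows the same logic but replaces L(dep_var) with a plain list. The column values are made up; only the dtypes matter.

```python
import numpy as np
import pandas as pd

def split_sketch(df, max_card=20, dep_var=None):
    # Hypothetical stand-in for the modified cont_cat_split, without fastcore's L.
    skip = [] if dep_var is None else [dep_var] if isinstance(dep_var, str) else list(dep_var)
    cont_names, cat_names = [], []
    for label in df:
        if label in skip: continue
        dtype = df[label].dtype
        # Categorical columns are caught by name before any np.issubdtype call.
        if dtype.name == 'category':
            cat_names.append(label)
            continue
        if ((np.issubdtype(dtype, np.integer) and
             df[label].unique().shape[0] > max_card) or
            np.issubdtype(dtype, np.floating)):
            cont_names.append(label)
        else:
            cat_names.append(label)
    return cont_names, cat_names

toy = pd.DataFrame({
    'SalePrice': [10.5, 11.0, 9.8],                      # dependent variable, skipped
    'ProductSize': pd.Categorical(['Small', 'Large', 'Small'],
                                  categories=['Small', 'Medium', 'Large'],
                                  ordered=True),         # CategoricalDtype
    'saleWeek': np.array([1, 2, 3], dtype='uint32'),     # already plain uint32
    'Weight': [1.5, 2.5, 3.5],                           # float column
    'Enclosure': ['EROPS', 'OROPS', 'EROPS'],            # object (string) column
})

# max_card=2 so the three distinct saleWeek values count as high cardinality.
cont, cat = split_sketch(toy, max_card=2, dep_var='SalePrice')
print('continuous: ', cont)
print('categorical:', cat)
```

With max_card lowered to 2, saleWeek lands with the continuous columns, while ProductSize is routed to the categorical list by the dtype-name check.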