Tabular MultiCategory (not one-hot encoded) fails

punkt2 · January 13, 2021, 8:56am

I can’t get the example code for the not one-hot encoded multi-label categories task from the fastai Tabular documentation page to run. Does anyone know why it fails, or have a working example?

Link to the section in the documentation:

Example code:

from fastai.tabular.all import *

def _mock_multi_label(df):
    targ = []
    for row in df.itertuples():
        labels = []
        if row.salary == '>=50k': labels.append('>50k')
        if row.sex == ' Male':   labels.append('male')
        if row.race == ' White': labels.append('white')
        targ.append(' '.join(labels))
    df['target'] = np.array(targ)
    return df

path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
df_main,df_test = df.iloc[:10000].copy(),df.iloc[10000:].copy()
df_main = _mock_multi_label(df_main)

@MultiCategorize
def encodes(self, to:Tabular): 
    #to.transform(to.y_names, partial(_apply_cats, {n: self.vocab for n in to.y_names}, 0))
    return to
  
@MultiCategorize
def decodes(self, to:Tabular): 
    #to.transform(to.y_names, partial(_decode_cats, {n: self.vocab for n in to.y_names}))
    return to

cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [Categorify, FillMissing, Normalize]
splits = RandomSplitter()(range_of(df_main))

to = TabularPandas(df_main, procs, cat_names, cont_names, y_names="target", y_block=MultiCategoryBlock(), splits=splits)
dls = to.dataloaders()
dls.show_batch()

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-90634fcc3c9e> in <module>()
----> 1 dls.show_batch()

13 frames
/usr/local/lib/python3.6/dist-packages/fastai/torch_core.py in tensor(x, *rest, **kwargs)
    127            else torch.tensor(x, **kwargs) if isinstance(x, (tuple,list))
    128            else _array2tensor(x) if isinstance(x, ndarray)
--> 129            else as_tensor(x.values, **kwargs) if isinstance(x, (pd.Series, pd.DataFrame))
    130            else as_tensor(x, **kwargs) if hasattr(x, '__array__') or is_iter(x)
    131            else _array2tensor(array(x), **kwargs))

TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
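
A note on the traceback: the message says an array of dtype `numpy.object_` reached the tensor conversion. One plausible reading, not confirmed anywhere in this thread, is that the `target` column still holds the space-joined label strings, because the patched `encodes` above returns `to` unchanged. The sketch below shows, in plain Python, the kind of encoding step that appears to be missing. The helper `multi_hot` and the sample strings are hypothetical, introduced only for illustration, and are not part of fastai.

```python
def multi_hot(targets, sep=" "):
    """Turn delimiter-joined label strings into 0/1 rows plus the vocab used."""
    # Sorted vocab of every label seen; empty strings contribute no labels.
    vocab = sorted({lab for t in targets for lab in t.split(sep) if lab})
    index = {lab: i for i, lab in enumerate(vocab)}
    rows = []
    for t in targets:
        row = [0] * len(vocab)
        for lab in t.split(sep):
            if lab:  # "".split(" ") yields [""], which is not a label
                row[index[lab]] = 1
        rows.append(row)
    return vocab, rows


# Hypothetical target strings of the kind _mock_multi_label produces.
targets = ["male white", "white", "", "male"]
vocab, rows = multi_hot(targets)
print(vocab)
print(rows)
```

Each row is now purely numeric, so it can be converted to a tensor. If the cause is as assumed, a possible workaround is to write such 0/1 columns back into the DataFrame (pandas' `Series.str.get_dummies(sep=' ')` does something similar) and then follow the one-hot encoded variant of the documentation example instead.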

gautam_e · August 22, 2021, 1:21pm

I have the same problem. Did you find a solution?!