Tabular Issue - TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed

Hi All,
I'm trying to use TabularDataBunch.from_df to build a DataBunch for tabular_learner.
When I run this code:

tfms = [FillMissing, Categorify]
train_df, valid_df = train_test_split(df, test_size=0.33)
dataB = TabularDataBunch.from_df(path, df, train_df, valid_df, dep_var, dl_tfms=tfms, cat_names=cat_names)

I get this peculiar error:

Traceback (most recent call last):
File “/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py”, line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File “”, line 1, in
dataB = TabularDataBunch.from_df(path, df, train_df, valid_df, dep_var, dl_tfms=tfms, cat_names=cat_names)
File “/home/…local/lib/python3.6/site-packages/fastai/tabular/data.py”, line 94, in from_df
cont_names = ifnone(cont_names, list(set(df)-set(cat_names)-{dep_var}))
File “/home/…/.local/lib/python3.6/site-packages/pandas/core/generic.py”, line 1886, in __hash__
" hashed".format(self.__class__.__name__)
TypeError: ‘DataFrame’ objects are mutable, thus they cannot be hashed

Anyone ever skated around this issue?
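For anyone reading the traceback: the failing line builds the set {dep_var}, so whatever ends up bound to dep_var gets hashed. A rough sketch of that step, assuming it behaves like the set arithmetic shown on line 94 of the traceback. FrameLike and infer_cont_names are hypothetical stand-ins (not pandas or fastai code), and the column names other than Sex are made up:

```python
class FrameLike:
    """Hypothetical stand-in for a DataFrame: mutable, so it refuses to be hashed."""

    def __hash__(self):
        raise TypeError(
            f"{type(self).__name__!r} objects are mutable, thus they cannot be hashed"
        )


def infer_cont_names(columns, cat_names, dep_var):
    """Imitates set(df) - set(cat_names) - {dep_var} from the traceback."""
    # Building {dep_var} hashes dep_var, which is where a frame-like object fails.
    return sorted(set(columns) - set(cat_names) - {dep_var})


columns = ["Sex", "Age", "target"]  # illustrative column names

# dep_var as a column name (a string) hashes fine.
ok = infer_cont_names(columns, ["Sex"], "target")

# dep_var accidentally bound to a frame-like object raises TypeError.
try:
    infer_cont_names(columns, ["Sex"], FrameLike())
    hash_error = None
except TypeError as err:
    hash_error = str(err)

print(ok)
print(hash_error)
```

If this matches what the library does, the error says nothing about the data itself, only that a DataFrame arrived where a column name was expected.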

At first I thought it had something to do with the pre-processing arguments, so I tried passing procs=procs instead of dl_tfms=tfms. That gave me a completely different error:

Traceback (most recent call last):
File “/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py”, line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File “”, line 1, in
dataB = TabularDataBunch.from_df(path, df, train_df, valid_df, dep_var, procs=procs, cat_names=cat_names)
TypeError: from_df() got multiple values for argument ‘procs’

Hi.
As for your first issue, it looks like you’ve mispositioned the arguments. As the documentation shows, the 3rd argument of from_df is dep_var, and you put train_df there:

data = TabularDataBunch.from_df(path, df, dep_var, valid_idx=valid_idx, procs=procs, cat_names=cat_names)

After bumping into this type of error many times (as well as some changes in function APIs), I’ve started to use named rather than positional parameters as often as possible :slight_smile:

As for the second issue (apart from error above) – can you please show us how your variable procs is initialised?
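Both errors from the first post are consistent with positional mis-binding, and that can be checked without fastai. Here is a minimal sketch using a hypothetical stub, from_df_stub, whose leading parameters follow the order path, df, dep_var, valid_idx, procs, cat_names. That order is an assumption based on the call above and the fastai v1 docs, so check it against the installed version:

```python
import inspect


def from_df_stub(path, df, dep_var, valid_idx, procs=None, cat_names=None):
    """Hypothetical stub; only the parameter order matters here."""


sig = inspect.signature(from_df_stub)

# Five positional arguments, as in the original call (strings stand in for objects).
bound = sig.bind("path", "df", "train_df", "valid_df", "dep_var", cat_names=["Sex"])
# dep_var receives train_df, valid_idx receives valid_df, procs receives dep_var.
print(dict(bound.arguments))

# Adding procs=... as a keyword now collides with the fifth positional argument.
try:
    sig.bind("path", "df", "train_df", "valid_df", "dep_var",
             procs=["FillMissing"], cat_names=["Sex"])
    error_message = None
except TypeError as err:
    error_message = str(err)

print(error_message)
```

Under that assumption, the fifth positional argument already fills procs, so the second error would come from the call itself rather than from how procs was initialised.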


Hi Pavel,
Thanks so much for taking a look at my issue. I was following an online tutorial that used a different form of the fast.ai tabular arguments. Nonetheless, I changed the order of the args and still get an error.

procs = [FillMissing, Categorify, Normalize]

valid_idx = range(len(df)-2000, len(df))

dataB = TabularDataBunch.from_df(path,
df,
dep_var,
valid_idx=valid_idx,
procs=procs,
cat_names=cat_names)

This is then followed by the error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-28-c12ef293045d>", line 6, in <module>
    cat_names=cat_names)
  File "/home/*/.local/lib/python3.6/site-packages/fastai/tabular/data.py", line 98, in from_df
    src = src.label_from_df(cols=dep_var) if classes is None else src.label_from_df(cols=dep_var, classes=classes)
  File "/home/*/.local/lib/python3.6/site-packages/fastai/data_block.py", line 472, in _inner
    self.train = ft(*args, from_item_lists=True, **kwargs)
  File "/home/*/.local/lib/python3.6/site-packages/fastai/data_block.py", line 283, in label_from_df
    return self._label_from_list(_maybe_squeeze(labels), label_cls=label_cls, **kwargs)
  File "/home/*/.local/lib/python3.6/site-packages/fastai/data_block.py", line 271, in _label_from_list
    label_cls = self.get_label_cls(labels, label_cls=label_cls, **kwargs)
  File "/home/*/.local/lib/python3.6/site-packages/fastai/data_block.py", line 260, in get_label_cls
    it = index_row(labels,0)
  File "/home/*/.local/lib/python3.6/site-packages/fastai/core.py", line 276, in index_row
    return a[idxs]
IndexError: index 0 is out of bounds for axis 0 with size 0

It seems that regardless of how I call TabularDataBunch, there is an issue with the index.

My dataframe is fairly large as far as features go.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1604 entries, 0 to 1603
Columns: 325 entries, Sex to HINT_1YR
dtypes: float64(299), int64(6), object(20)
memory usage: 4.0+ MB

I guess the question is, do I have to somehow ‘fix’ the index of the pandas dataframe?

The target variable is a float64. The features are a mix of objects, int64 and floats.

So, I’ve hit my head some more on this.

Seems I can get it to work using this command:

# Test creation
test = TabularList.from_df(df.iloc[800:1000].copy(), path=path, cat_names=cat_names)

# Create Databunch
data = (TabularList.from_df(df,
path=path,
cat_names=cat_names,
procs=procs)
.split_by_idx(list(range(800,1000)))
.label_from_df(cols=dep_var)
.add_test(test)
.databunch())

But, still, I run into this error when I use this command:

valid_idx = range(len(df)-2000, len(df))
data = TabularDataBunch.from_df(path,
df,
dep_var,
valid_idx=valid_idx,
procs=procs,
cat_names=cat_names)

Subsequent error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in
    data = TabularDataBunch.from_df(path, df, dep_var, valid_idx=valid_idx, procs=procs, cat_names=cat_names)
  File "/home/*/.local/lib/python3.6/site-packages/fastai/tabular/data.py", line 98, in from_df
    src = src.label_from_df(cols=dep_var) if classes is None else src.label_from_df(cols=dep_var, classes=classes)
  File "/home/*/.local/lib/python3.6/site-packages/fastai/data_block.py", line 472, in _inner
    self.train = ft(*args, from_item_lists=True, **kwargs)
  File "/home/*/.local/lib/python3.6/site-packages/fastai/data_block.py", line 283, in label_from_df
    return self._label_from_list(_maybe_squeeze(labels), label_cls=label_cls, **kwargs)
  File "/home/*/.local/lib/python3.6/site-packages/fastai/data_block.py", line 271, in _label_from_list
    label_cls = self.get_label_cls(labels, label_cls=label_cls, **kwargs)
  File "/home/*/.local/lib/python3.6/site-packages/fastai/data_block.py", line 260, in get_label_cls
    it = index_row(labels,0)
  File "/home/*/.local/lib/python3.6/site-packages/fastai/core.py", line 276, in index_row
    return a[idxs]
IndexError: index 0 is out of bounds for axis 0 with size 0

My dataframe has 1604 entries.

I tried changing the value in valid_idx between 1604 and the 2000 used in the documentation's example, and got all sorts of different errors as well.
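One possible explanation, offered as a guess rather than a confirmed diagnosis: with 1604 rows, range(len(df)-2000, len(df)) is range(-396, 1604), which already contains every row index from 0 to 1603. If the training set is "every row not in valid_idx", nothing is left to train on, and that would fit the "size 0" in the IndexError. The sketch below is plain Python; split_indices is a hypothetical helper imitating that assumed behaviour, not fastai's own code, and the validation size of 200 is just an illustrative choice:

```python
def split_indices(n_rows, valid_idx):
    """Assumed split rule: training rows are all rows not listed in valid_idx."""
    valid = set(valid_idx)
    train = [i for i in range(n_rows) if i not in valid]
    kept_valid = [i for i in valid_idx if 0 <= i < n_rows]
    return train, kept_valid


n_rows = 1604  # number of entries reported for the dataframe

# The value from the documentation example, applied to a smaller dataframe.
bad_valid_idx = range(n_rows - 2000, n_rows)  # range(-396, 1604)
bad_train, _ = split_indices(n_rows, bad_valid_idx)

# A validation size smaller than the dataframe leaves rows for training.
n_valid = 200
good_valid_idx = range(n_rows - n_valid, n_rows)  # range(1404, 1604)
good_train, good_valid = split_indices(n_rows, good_valid_idx)

print(len(bad_train), len(good_train), len(good_valid))
# prints: 0 1404 200
```

That would also explain why the TabularList version with split_by_idx(list(range(800,1000))) works: it holds out only 200 of the 1604 rows.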