TabularPandas: TypeError: unhashable type: 'L' (fastai v2)

I am trying to use the tabular data example from the docs and I am getting a “TypeError: unhashable type: ‘L’”. I found this on GitHub but it did not help me.

Can anyone here help?

I am in a jupyter notebook using python3, fastai v2 and fastcore version 1.0.0

My code:

from fastai.tabular.all import *  # provides untar_data, URLs, pd, TabularPandas, etc.

def load_data():
    path = untar_data(URLs.ADULT_SAMPLE)
    df = pd.read_csv(path/'adult.csv')
    # Hold out a random 20% of the rows for validation
    splits = RandomSplitter(valid_pct=0.2)(range_of(df))
    to = TabularPandas(df, procs=[Categorify, FillMissing, Normalize],
                   cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race'],
                   cont_names = ['age', 'fnlwgt', 'education-num'],
                   y_names='salary',
                   splits=splits)

    dls = to.dataloaders(bs=64)

    return dls

Stacktrace:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-30-81b156b45779> in <module>
----> 1 train_model(0.001, None, (0.95, 0.85, 0.95), 1)

<ipython-input-13-7934d407f75b> in train_model(lr, wd, moms, epochs)
      7 
      8     model = create_model()
----> 9     dls = load_data()
     10 
     11     since = time.time()

<ipython-input-29-f92cb6bd91ec> in load_data()
      7                    cont_names = ['age', 'fnlwgt', 'education-num'],
      8                    y_names='salary',
----> 9                    splits=splits)
     10 
     11     dls = to.dataloaders(bs=64)

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/fastai/tabular/core.py in __init__(self, df, procs, cat_names, cont_names, y_names, y_block, splits, do_setup, device, inplace, reduce_memory)
    153         if y_block is None and self.y_names:
    154             # Make ys categorical if they're not numeric
--> 155             ys = df[self.y_names]
    156             if len(ys.select_dtypes(include='number').columns)!=len(ys.columns): y_block = CategoryBlock()
    157             else: y_block = RegressionBlock()

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2686             return self._getitem_multilevel(key)
   2687         else:
-> 2688             return self._getitem_column(key)
   2689 
   2690     def _getitem_column(self, key):

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2693         # get column
   2694         if self.columns.is_unique:
-> 2695             return self._get_item_cache(key)
   2696 
   2697         # duplicate columns & possible reduce dimensionality

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   2485         """Return the cached item, item represents a label indexer."""
   2486         cache = self._item_cache
-> 2487         res = cache.get(item)
   2488         if res is None:
   2489             values = self._data.get(item)

TypeError: unhashable type: 'L'

@ptats In general I cannot reproduce this; your function works fine for me. I believe your pandas version cannot select columns with an L (fastcore's custom class extending Python lists). It does not recognise L as a list-like object, so it treats it as a single column label and calls _getitem_column, which then tries to hash it. If this is the case, it was fixed quite some time ago (https://github.com/pandas-dev/pandas/pull/21313), but your stack trace looks like it comes from a version before that change. Can you post your pandas.__version__?

Also, maybe you can make sure you are using the latest fastai2 (/anaconda/envs/azureml_py36/lib/python3.6/site-packages/fastai/).
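
To illustrate the mechanism described above, here is a minimal sketch that uses only the standard library. `FakeL` and `lookup` are hypothetical stand-ins invented for this example; they are not fastcore's `L` or pandas internals. The sketch only assumes that the key object defines `__eq__` without `__hash__`, which makes it unhashable, and that the lookup goes through a dict-backed cache, as the `cache.get(item)` frame in the traceback suggests.

```python
# FakeL is a hypothetical stand-in for a list-like class; it is NOT fastcore's L.
class FakeL:
    """List-like wrapper that defines __eq__ but not __hash__, so it is unhashable."""
    def __init__(self, items):
        self.items = list(items)

    def __iter__(self):
        return iter(self.items)

    def __len__(self):
        return len(self.items)

    def __eq__(self, other):
        return list(self) == list(other)


def lookup(cache, key):
    """Hypothetical helper: treat `key` as a single label in a dict-backed cache."""
    return cache.get(key)


cache = {"salary": [0, 1, 1]}
key = FakeL(["salary"])

# Using the whole list-like object as one label forces it to be hashed.
try:
    lookup(cache, key)
    error = None
except TypeError as exc:
    error = str(exc)

print(error)

# Treating the object as a collection of labels avoids hashing it.
columns = [lookup(cache, name) for name in key]
print(columns)
```

A library that first checks whether the key is list-like takes the second path, which would be consistent with the error disappearing after a pandas upgrade.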

I found the issue. It was the pandas version. Thanks @micstan!

Just in case anyone else finds this useful: I am trying to use Azure’s machine learning services. Simply installing fastai v2 is not enough; you also need to set the pandas version to 1.0.5.
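
Since the fix here came down to package versions, a quick way to report them in one go may help when posting to a thread like this. The sketch below is one possible approach, not an official fastai utility: `installed_versions` is a hypothetical helper name, and it assumes Python 3.8 or newer, where `importlib.metadata` is part of the standard library.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names as published on PyPI
report = installed_versions(["fastai", "fastcore", "pandas"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Running this inside the notebook kernel (rather than in a separate shell) reports the versions that the failing code actually sees.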

Hey there, I’m running Pandas 1.0.5 and experiencing this issue still.

Try updating your pandas and fastcore versions to the latest.

Hey! I have, it’s a brand new cluster on Databricks using everything new as of today.

I’m using TextBlock, not Tabular. I’ve noticed the issue occurs when I try to use the tokenizer in TextBlock.from_df. If I tokenize my text beforehand it works, but if I don’t, it raises the L error.

Reverting to FastAI 1 for now

I have the same issue, both with pandas 1.0.5 and with latest.

Same here

I have exactly the same error for TextBlocks, have you found any solution?

I’m experiencing this same error.

Data loaders code:

dls_lm = DataBlock(blocks=TextBlock.from_df('text', is_lm=True), 
                   splitter=RandomSplitter(0.1))

dls_lm.dataloaders(df, bs=128, seq_len=80)

Here is my stack trace:

/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-40-7a565d05bc8f> in <module>()
----> 1 dls_lm.dataloaders(df, bs=128, seq_len=80)

12 frames
/usr/local/lib/python3.6/dist-packages/fastai/text/data.py in <listcomp>(.0)
     46             self.o2i = defaultdict(int, {v:k for k,v in enumerate(self.vocab) if v != 'xxfake'})
     47 
---> 48     def encodes(self, o): return TensorText(tensor([self.o2i  [o_] for o_ in o]))
     49     def decodes(self, o): return L(self.vocab[o_] for o_ in o)
     50 

TypeError: unhashable type: 'L'

Here are my versions of fastai, fastcore, and pandas:

Fast.ai version: 2.2.5
Fastcore version: 1.3.19
Pandas version: 1.1.5

Any ideas on the issue?


EDIT: I was able to get this working using the following code. I think I was using it wrong or in an unintended way; perhaps this is a documentation issue. Note: I had to manually create a boolean is_valid column to tell ColSplitter how to create the validation set.

dls_lm = DataBlock(blocks=TextBlock.from_df('text', is_lm=True),
                   get_x=ColReader('text'),
                   splitter=ColSplitter()).dataloaders(df, bs=128, seq_len=80)
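
The post does not show how the is_valid column was built, so the following is only a sketch of one way to do it, assuming a random 10% hold-out like the RandomSplitter(0.1) used in the earlier attempt. `make_is_valid_flags` is a hypothetical helper, written with the standard library only so it runs without pandas; the commented last line shows how it would be assigned to the poster's `df`.

```python
import random


def make_is_valid_flags(n_rows, valid_pct=0.1, seed=42):
    """Return a list of booleans with int(n_rows * valid_pct) entries set to True."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    n_valid = int(n_rows * valid_pct)
    valid_idxs = set(rng.sample(range(n_rows), n_valid))
    return [i in valid_idxs for i in range(n_rows)]


flags = make_is_valid_flags(1000, valid_pct=0.1)
print(sum(flags), "validation rows out of", len(flags))

# With a DataFrame `df`, as in the post, the column would be created with:
# df['is_valid'] = make_is_valid_flags(len(df))
```

ColSplitter() then puts the rows where the column is True into the validation set and the rest into the training set.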