TabularPandas: TypeError: unhashable type: 'L' (fastai v2)

ptats · August 24, 2020, 6:34am

I am trying to use the tabular data example from the docs and I am getting a “TypeError: unhashable type: ‘L’”. I found this on GitHub, but it did not help me.

Can anyone here help?

I am in a Jupyter notebook using Python 3, fastai v2, and fastcore version 1.0.0.

My code:

from fastai.tabular.all import *  # provides untar_data, URLs, pd, TabularPandas, etc.

def load_data():
    # Download the ADULT_SAMPLE dataset and read it into a DataFrame
    path = untar_data(URLs.ADULT_SAMPLE)
    df = pd.read_csv(path/'adult.csv')
    # Hold out a random 20% of the rows for validation
    splits = RandomSplitter(valid_pct=0.2)(range_of(df))
    to = TabularPandas(df, procs=[Categorify, FillMissing, Normalize],
                       cat_names=['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race'],
                       cont_names=['age', 'fnlwgt', 'education-num'],
                       y_names='salary',
                       splits=splits)

    dls = to.dataloaders(bs=64)

    return dls

Stacktrace:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-30-81b156b45779> in <module>
----> 1 train_model(0.001, None, (0.95, 0.85, 0.95), 1)

<ipython-input-13-7934d407f75b> in train_model(lr, wd, moms, epochs)
      7 
      8     model = create_model()
----> 9     dls = load_data()
     10 
     11     since = time.time()

<ipython-input-29-f92cb6bd91ec> in load_data()
      7                    cont_names = ['age', 'fnlwgt', 'education-num'],
      8                    y_names='salary',
----> 9                    splits=splits)
     10 
     11     dls = to.dataloaders(bs=64)

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/fastai/tabular/core.py in __init__(self, df, procs, cat_names, cont_names, y_names, y_block, splits, do_setup, device, inplace, reduce_memory)
    153         if y_block is None and self.y_names:
    154             # Make ys categorical if they're not numeric
--> 155             ys = df[self.y_names]
    156             if len(ys.select_dtypes(include='number').columns)!=len(ys.columns): y_block = CategoryBlock()
    157             else: y_block = RegressionBlock()

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2686             return self._getitem_multilevel(key)
   2687         else:
-> 2688             return self._getitem_column(key)
   2689 
   2690     def _getitem_column(self, key):

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2693         # get column
   2694         if self.columns.is_unique:
-> 2695             return self._get_item_cache(key)
   2696 
   2697         # duplicate columns & possible reduce dimensionality

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   2485         """Return the cached item, item represents a label indexer."""
   2486         cache = self._item_cache
-> 2487         res = cache.get(item)
   2488         if res is None:
   2489             values = self._data.get(item)

TypeError: unhashable type: 'L'

micstan · August 24, 2020, 11:59am

@ptats I cannot reproduce this; your function works fine for me. I believe that pandas cannot select columns using an L (fastcore's custom class extending Python lists). It does not recognise it as a list-like object and falls through to _getitem_column. If this is the case, it was already fixed quite some time ago (https://github.com/pandas-dev/pandas/pull/21313), but your stack trace looks like it comes from a version before this change. What does pandas.__version__ give you?

Also, make sure you are using the latest fastai2 (/anaconda/envs/azureml_py36/lib/python3.6/site-packages/fastai/).
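
For anyone reading later, here is a minimal, pure-Python sketch of the failure mode described above. The `L` class below is a hypothetical stand-in (a bare list subclass), not fastcore's real `L`, and the `cache` dict only plays the role of the pandas item cache seen in the stack trace. It shows why a dict-style lookup with an unhashable key raises this TypeError, and that passing plain strings avoids it.

```python
# Hypothetical stand-in for fastcore's L; the real class is more elaborate.
class L(list):
    pass

# Plays the role of the per-column cache that pandas consults in the traceback.
cache = {"salary": [">=50k", "<50k"]}
key = L(["salary"])

try:
    cache.get(key)  # dict lookups hash the key first, and lists are unhashable
except TypeError as err:
    message = str(err)
    print(message)

# Workaround sketch: look up each plain-string name instead of the wrapper.
columns = [cache.get(name) for name in list(key)]
print(columns)
```

A newer pandas treats such an object as list-like before it ever reaches the cache lookup, which is why upgrading can make the error disappear.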

ptats · August 25, 2020, 12:35am

I found the issue. It was the pandas version. Thanks @micstan!

ptats · August 25, 2020, 1:47am

Just in case anyone else finds this useful: I am trying to use Azure’s machine learning services. Simply installing fastai v2 is not enough; you also need to set the pandas version to 1.0.5.
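
A quick way to see which versions an environment actually has is sketched below. It assumes Python 3.8 or newer (for `importlib.metadata`); `installed_versions` is a hypothetical helper name, not part of fastai or pandas.

```python
from importlib import metadata


def installed_versions(packages=("fastai", "fastcore", "pandas")):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

If pandas turns out to be older than expected, pinning it with something like `pip install "pandas==1.0.5"` in the same environment is one way to apply the fix described above.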

hemm · September 3, 2020, 12:56pm

Hey there, I’m running pandas 1.0.5 and still experiencing this issue.

ptats · September 3, 2020, 1:26pm

Try updating your pandas and fastcore versions to the latest.

hemm · September 3, 2020, 1:30pm

Hey! I have, it’s a brand new cluster on Databricks using everything new as of today.

I’m using TextBlock, not Tabular. I’ve noticed the issue occurs when I try to use the tokenizer in TextBlock.from_df. If I tokenize my text beforehand it works, but if I don’t, it raises the L error.

Reverting to fastai v1 for now.

adamh · September 9, 2020, 12:35pm

I have the same issue, both with pandas 1.0.5 and with latest.

armatav · October 16, 2020, 8:10pm

Same here

meysa · December 4, 2020, 12:41am

I have exactly the same error for TextBlocks, have you found any solution?

robertritz · February 17, 2021, 1:25pm

I’m experiencing this same error.

Data loaders code:

dls_lm = DataBlock(blocks=TextBlock.from_df('text', is_lm=True), 
                   splitter=RandomSplitter(0.1))

dls_lm.dataloaders(df, bs=128, seq_len=80)

Here is my stack trace:

/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-40-7a565d05bc8f> in <module>()
----> 1 dls_lm.dataloaders(df, bs=128, seq_len=80)

12 frames
/usr/local/lib/python3.6/dist-packages/fastai/text/data.py in <listcomp>(.0)
     46             self.o2i = defaultdict(int, {v:k for k,v in enumerate(self.vocab) if v != 'xxfake'})
     47 
---> 48     def encodes(self, o): return TensorText(tensor([self.o2i  [o_] for o_ in o]))
     49     def decodes(self, o): return L(self.vocab[o_] for o_ in o)
     50 

TypeError: unhashable type: 'L'

Here are my versions of fastai, fastcore, and pandas:

Fast.ai version: 2.2.5
Fastcore version: 1.3.19
Pandas version: 1.1.5

Any ideas on the issue?

EDIT: I was able to get this working using the following code. I think I was using it wrong or in an unintended way; perhaps this is a documentation issue. Note: I had to manually create a boolean is_valid column to tell ColSplitter which rows belong to the validation set.

dls_lm = DataBlock(blocks=TextBlock.from_df('text', is_lm=True),
                   get_x=ColReader('text'),
                   splitter=ColSplitter()).dataloaders(df, bs=128, seq_len=80)
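
The traceback above fails inside `self.o2i[o_] for o_ in o`, which means at least one element `o_` was itself a list-like `L` rather than a string token. The sketch below reproduces that shape in plain Python. It is an illustration under assumptions, not fastai's actual code: `vocab`, `numericalize`, and `row` are hypothetical names, and the idea that the whole row (instead of just the text column) reaches the numericalizer when `get_x` is omitted is my reading of the fix, not something confirmed in this thread.

```python
from collections import defaultdict

# Toy vocabulary; unknown tokens map to index 0 via defaultdict(int).
vocab = ["xxunk", "xxpad", "hello", "world"]
o2i = defaultdict(int, {tok: idx for idx, tok in enumerate(vocab)})


def numericalize(tokens):
    # Same shape as the failing line: one dict lookup per element.
    return [o2i[tok] for tok in tokens]


tokens = ["hello", "world", "unseen"]
ids = numericalize(tokens)  # flat list of string tokens: works
print(ids)

# Assumed failure case: a whole row (token list plus is_valid flag) is passed.
row = [tokens, True]
try:
    numericalize(row)
except TypeError as err:
    row_error = str(err)
    print(row_error)
```

Under that reading, `get_x=ColReader('text')` helps because it hands only the text column's value to the pipeline, so every element the lookup sees is a hashable string.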