AttributeError: 'str' object has no attribute 'paragraphs'

aardra · December 2, 2020, 10:55am

Hello. I am trying to build a language model from blog posts, and I am calling the Datasets API to create the data:
dsets = Datasets(df_p, [tfms],dl_type=LMDataLoader)

df_p is a pandas dataframe that contains blogs. When we print:
print(df_p.head())
Result is:

0    ['12 years ago, the smartphone was in its infancy. Neither Airbnb nor Uber, Lyft or TaskRabbit yet existed. Saying the phrase ‘sharing economy’ usually resulted in raised eyebrows: was that some kind of hippie bartering, or a new take on philanthropy?', 'What’s happened since then is nothing short of extraordinary: today there are tens of thousands of self-professed sharing economy platforms worldwide, many featuring blistering valuations, wobbly IPOs, and fierce debates about the sharing economy’s benefits, pitfalls and potential. I’ve been at the fifty-yard line of this transformation, a...

Name: paragraphs, dtype: object

I am getting an AttributeError:

AttributeError                            Traceback (most recent call last)
<ipython-input-28-bab5db2d3986> in <module>
----> 1 dsets = Datasets(df_p, [tfms],dl_type=LMDataLoader)

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/core.py in __init__(self, items, tfms, tls, n_inp, dl_type, **kwargs)
    308     def __init__(self, items=None, tfms=None, tls=None, n_inp=None, dl_type=None, **kwargs):
    309         super().__init__(dl_type=dl_type)
--> 310         self.tls = L(tls if tls else [TfmdLists(items, t, **kwargs) for t in L(ifnone(tfms,[None]))])
    311         self.n_inp = ifnone(n_inp, max(1, len(self.tls)-1))
    312 

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/core.py in <listcomp>(.0)
    308     def __init__(self, items=None, tfms=None, tls=None, n_inp=None, dl_type=None, **kwargs):
    309         super().__init__(dl_type=dl_type)
--> 310         self.tls = L(tls if tls else [TfmdLists(items, t, **kwargs) for t in L(ifnone(tfms,[None]))])
    311         self.n_inp = ifnone(n_inp, max(1, len(self.tls)-1))
    312 

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastcore/foundation.py in __call__(cls, x, *args, **kwargs)
    161     def __call__(cls, x=None, *args, **kwargs):
    162         if not args and not kwargs and x is not None and isinstance(x,cls): return x
--> 163         return super().__call__(x, *args, **kwargs)
    164 
    165 # Cell

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/core.py in __init__(self, items, tfms, use_list, do_setup, split_idx, train_setup, splits, types, verbose, dl_type)
    234         if do_setup:
    235             pv(f"Setting up {self.tfms}", verbose)
--> 236             self.setup(train_setup=train_setup)
    237 
    238     def _new(self, items, split_idx=None, **kwargs):

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/core.py in setup(self, train_setup)
    250 
    251     def setup(self, train_setup=True):
--> 252         self.tfms.setup(self, train_setup)
    253         if len(self) != 0:
    254             x = super().__getitem__(0) if self.splits is None else super().__getitem__(self.splits[0])[0]

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastcore/transform.py in setup(self, items, train_setup)
    190         tfms = self.fs[:]
    191         self.fs.clear()
--> 192         for t in tfms: self.add(t,items, train_setup)
    193 
    194     def add(self,t, items=None, train_setup=False):

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastcore/transform.py in add(self, t, items, train_setup)
    193 
    194     def add(self,t, items=None, train_setup=False):
--> 195         t.setup(items, train_setup)
    196         self.fs.append(t)
    197 

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastcore/transform.py in setup(self, items, train_setup)
     77     def setup(self, items=None, train_setup=False):
     78         train_setup = train_setup if self.train_setup is None else self.train_setup
---> 79         return self.setups(getattr(items, 'train', items) if train_setup else items)
     80 
     81     def _call(self, fn, x, split_idx=None, **kwargs):

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastcore/dispatch.py in __call__(self, *args, **kwargs)
    115         elif self.inst is not None: f = MethodType(f, self.inst)
    116         elif self.owner is not None: f = MethodType(f, self.owner)
--> 117         return f(*args, **kwargs)
    118 
    119     def __get__(self, inst, owner):

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/text/data.py in setups(self, dsets)
     40         if dsets is None: return
     41         if self.vocab is None:
---> 42             count = dsets.counter if getattr(dsets, 'counter', None) is not None else Counter(p for o in dsets for p in o)
     43             if self.special_toks is None and hasattr(dsets, 'special_toks'):
     44                 self.special_toks = dsets.special_toks

/opt/conda/envs/fastai/lib/python3.8/collections/__init__.py in __init__(self, iterable, **kwds)
    550         '''
    551         super(Counter, self).__init__()
--> 552         self.update(iterable, **kwds)
    553 
    554     def __missing__(self, key):

/opt/conda/envs/fastai/lib/python3.8/collections/__init__.py in update(self, iterable, **kwds)
    635                     super(Counter, self).update(iterable) # fast path when counter is empty
    636             else:
--> 637                 _count_elements(self, iterable)
    638         if kwds:
    639             self.update(kwds)

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/text/data.py in <genexpr>(.0)
     40         if dsets is None: return
     41         if self.vocab is None:
---> 42             count = dsets.counter if getattr(dsets, 'counter', None) is not None else Counter(p for o in dsets for p in o)
     43             if self.special_toks is None and hasattr(dsets, 'special_toks'):
     44                 self.special_toks = dsets.special_toks

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/core.py in <genexpr>(.0)
    242     def _after_item(self, o): return self.tfms(o)
    243     def __repr__(self): return f"{self.__class__.__name__}: {self.items}\ntfms - {self.tfms.fs}"
--> 244     def __iter__(self): return (self[i] for i in range(len(self)))
    245     def show(self, o, **kwargs): return self.tfms.show(o, **kwargs)
    246     def decode(self, o, **kwargs): return self.tfms.decode(o, **kwargs)

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/core.py in __getitem__(self, idx)
    278         res = super().__getitem__(idx)
    279         if self._after_item is None: return res
--> 280         return self._after_item(res) if is_indexer(idx) else res.map(self._after_item)
    281 
    282 # Cell

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/core.py in _after_item(self, o)
    240         return super()._new(items, tfms=self.tfms, do_setup=False, types=self.types, split_idx=split_idx, **kwargs)
    241     def subset(self, i): return self._new(self._get(self.splits[i]), split_idx=i)
--> 242     def _after_item(self, o): return self.tfms(o)
    243     def __repr__(self): return f"{self.__class__.__name__}: {self.items}\ntfms - {self.tfms.fs}"
    244     def __iter__(self): return (self[i] for i in range(len(self)))

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastcore/transform.py in __call__(self, o)
    196         self.fs.append(t)
    197 
--> 198     def __call__(self, o): return compose_tfms(o, tfms=self.fs, split_idx=self.split_idx)
    199     def __repr__(self): return f"Pipeline: {' -> '.join([f.name for f in self.fs if f.name != 'noop'])}"
    200     def __getitem__(self,i): return self.fs[i]

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastcore/transform.py in compose_tfms(x, tfms, is_enc, reverse, **kwargs)
    148     for f in tfms:
    149         if not is_enc: f = f.decode
--> 150         x = f(x, **kwargs)
    151     return x
    152 

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastcore/transform.py in __call__(self, x, **kwargs)
     71     @property
     72     def name(self): return getattr(self, '_name', _get_name(self))
---> 73     def __call__(self, x, **kwargs): return self._call('encodes', x, **kwargs)
     74     def decode  (self, x, **kwargs): return self._call('decodes', x, **kwargs)
     75     def __repr__(self): return f'{self.name}:\nencodes: {self.encodes}decodes: {self.decodes}'

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastcore/transform.py in _call(self, fn, x, split_idx, **kwargs)
     81     def _call(self, fn, x, split_idx=None, **kwargs):
     82         if split_idx!=self.split_idx and self.split_idx is not None: return x
---> 83         return self._do_call(getattr(self, fn), x, **kwargs)
     84 
     85     def _do_call(self, f, x, **kwargs):

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastcore/transform.py in _do_call(self, f, x, **kwargs)
     87             if f is None: return x
     88             ret = f.returns_none(x) if hasattr(f,'returns_none') else None
---> 89             return retain_type(f(x, **kwargs), x, ret)
     90         res = tuple(self._do_call(f, x_, **kwargs) for x_ in x)
     91         return retain_type(res, x)

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastcore/dispatch.py in __call__(self, *args, **kwargs)
    115         elif self.inst is not None: f = MethodType(f, self.inst)
    116         elif self.owner is not None: f = MethodType(f, self.owner)
--> 117         return f(*args, **kwargs)
    118 
    119     def __get__(self, inst, owner):

AttributeError: 'str' object has no attribute 'paragraphs'

I was trying to follow the wikitext tutorial here: https://docs.fast.ai/tutorial.wikitext.html
which also uses a similar dataset from IMDB.

stefan-ai · December 2, 2020, 11:45am

I personally found it easier to use this format for creating your LM dataloaders from a df:

dblock_lm = DataBlock(blocks=(TextBlock.from_df('text', seq_len=sl, is_lm=True)),
                      get_x=ColReader('text'),
                      splitter=RandomSplitter(0.2))
dls_lm = dblock_lm.dataloaders(df, bs=bs, seq_len=sl)

Note that in this example the column containing your texts in the df needs to be named “text”.

aardra · December 2, 2020, 11:54am

stefan-ai:

dblock_lm = DataBlock(blocks=(TextBlock.from_df('text', seq_len=sl, is_lm=True)),
                      get_x=ColReader('text'),
                      splitter=RandomSplitter(0.2))
dls_lm = dblock_lm.dataloaders(df, bs=bs, seq_len=sl)

That returns this error:

KeyError                                  Traceback (most recent call last)
/opt/conda/envs/fastai/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2888             try:
-> 2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'is_valid'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-16-8f53bfe6ae4a> in <module>
----> 1 dls_lm = startup_lm.dataloaders(df_all, bs=bs, seq_len=sl)

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/block.py in dataloaders(self, source, path, verbose, **kwargs)
    111 
    112     def dataloaders(self, source, path='.', verbose=False, **kwargs):
--> 113         dsets = self.datasets(source)
    114         kwargs = {**self.dls_kwargs, **kwargs, 'verbose': verbose}
    115         return dsets.dataloaders(path=path, after_item=self.item_tfms, after_batch=self.batch_tfms, **kwargs)

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/block.py in datasets(self, source, verbose)
    106         self.source = source                     ; pv(f"Collecting items from {source}", verbose)
    107         items = (self.get_items or noop)(source) ; pv(f"Found {len(items)} items", verbose)
--> 108         splits = (self.splitter or RandomSplitter())(items)
    109         pv(f"{len(splits)} datasets of sizes {','.join([str(len(s)) for s in splits])}", verbose)
    110         return Datasets(items, tfms=self._combine_type_tfms(), splits=splits, dl_type=self.dl_type, n_inp=self.n_inp, verbose=verbose)

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/transforms.py in _inner(o)
    148     def _inner(o):
    149         assert isinstance(o, pd.DataFrame), "ColSplitter only works when your items are a pandas DataFrame"
--> 150         valid_idx = (o.iloc[:,col] if isinstance(col, int) else o[col]).values.astype('bool')
    151         return IndexSplitter(mask2idxs(valid_idx))(o)
    152     return _inner

/opt/conda/envs/fastai/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2897             if self.columns.nlevels > 1:
   2898                 return self._getitem_multilevel(key)
-> 2899             indexer = self.columns.get_loc(key)
   2900             if is_integer(indexer):
   2901                 indexer = [indexer]

/opt/conda/envs/fastai/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:
-> 2891                 raise KeyError(key) from err
   2892 
   2893         if tolerance is not None:

KeyError: 'is_valid'

stefan-ai · December 2, 2020, 12:29pm

Could you post your DataBlock code and dataframe columns?

aardra · December 2, 2020, 12:54pm

Here you go: Thanks in advance!
df_train.columns =['date','title','subtitle','claps','responses','author_url','story_url','reading_time (mins)','number_sections','section_titles','number_paragraphs','paragraphs']

df_train['new_paragraphs'] = df_train.paragraphs.apply(lambda x: str(x).strip('[]'))
df_test['new_paragraphs'] = df_test.paragraphs.apply(lambda x: str(x).strip('[]'))

df_all = pd.concat([df_train,df_test])
startup_lm = DataBlock(blocks=TextBlock.from_df('new_paragraphs',is_lm=True),get_x=ColReader('new_paragraphs'),splitter=ColSplitter())
bs,sl = 104,72
dls_lm = startup_lm.dataloaders(df_all, bs=bs, seq_len=sl)
KeyError: 'is_valid'

stefan-ai · December 2, 2020, 1:49pm

When you use ColSplitter, you need to include a column in the dataframe that indicates if a record is part of the training or validation set. By default, fastai assumes there is a column named is_valid that is either True or False. In your case, fastai throws an error because it cannot find that column.

Try this before you use pd.concat:

df_train['is_valid'] = False
df_test['is_valid'] = True
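
To make the role of the is_valid column concrete, here is a minimal pandas-only sketch. The two tiny dataframes are made-up stand-ins for the real df_train and df_test, and the mask line only mirrors the expression visible in the ColSplitter traceback above (o[col].values.astype('bool')); it is an illustration, not fastai's full implementation.

```python
import pandas as pd

# Hypothetical stand-ins for the real train/test dataframes.
df_train = pd.DataFrame({"paragraphs": ["first blog", "second blog", "third blog"]})
df_test = pd.DataFrame({"paragraphs": ["held-out blog"]})

# Flag each row before concatenating, as suggested above.
df_train["is_valid"] = False
df_test["is_valid"] = True

# ignore_index keeps the row labels unique after concatenation (optional).
df_all = pd.concat([df_train, df_test], ignore_index=True)

# Same kind of boolean mask the traceback shows ColSplitter building.
valid_mask = df_all["is_valid"].values.astype("bool")
train_idx = [i for i, v in enumerate(valid_mask) if not v]
valid_idx = [i for i, v in enumerate(valid_mask) if v]
print(train_idx, valid_idx)  # → [0, 1, 2] [3]
```

Without the is_valid column, the df_all["is_valid"] lookup is exactly what raises KeyError: 'is_valid'.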

aardra · December 2, 2020, 2:44pm

Thanks again for the solution. It seems to work at first: the data loads up to 100%, and then it fails.
After running this:
dls_lm = startup_lm.dataloaders(df_all, bs=bs, seq_len=sl)

I get this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-10-8f53bfe6ae4a> in <module>
----> 1 dls_lm = startup_lm.dataloaders(df_all, bs=bs, seq_len=sl)

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/block.py in dataloaders(self, source, path, verbose, **kwargs)
    111 
    112     def dataloaders(self, source, path='.', verbose=False, **kwargs):
--> 113         dsets = self.datasets(source)
    114         kwargs = {**self.dls_kwargs, **kwargs, 'verbose': verbose}
    115         return dsets.dataloaders(path=path, after_item=self.item_tfms, after_batch=self.batch_tfms, **kwargs)

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/block.py in datasets(self, source, verbose)
    108         splits = (self.splitter or RandomSplitter())(items)
    109         pv(f"{len(splits)} datasets of sizes {','.join([str(len(s)) for s in splits])}", verbose)
--> 110         return Datasets(items, tfms=self._combine_type_tfms(), splits=splits, dl_type=self.dl_type, n_inp=self.n_inp, verbose=verbose)
    111 
    112     def dataloaders(self, source, path='.', verbose=False, **kwargs):

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/core.py in __init__(self, items, tfms, tls, n_inp, dl_type, **kwargs)
    308     def __init__(self, items=None, tfms=None, tls=None, n_inp=None, dl_type=None, **kwargs):
    309         super().__init__(dl_type=dl_type)
--> 310         self.tls = L(tls if tls else [TfmdLists(items, t, **kwargs) for t in L(ifnone(tfms,[None]))])
    311         self.n_inp = ifnone(n_inp, max(1, len(self.tls)-1))
    312 

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/core.py in <listcomp>(.0)
    308     def __init__(self, items=None, tfms=None, tls=None, n_inp=None, dl_type=None, **kwargs):
    309         super().__init__(dl_type=dl_type)
--> 310         self.tls = L(tls if tls else [TfmdLists(items, t, **kwargs) for t in L(ifnone(tfms,[None]))])
    311         self.n_inp = ifnone(n_inp, max(1, len(self.tls)-1))
    312 

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastcore/foundation.py in __call__(cls, x, *args, **kwargs)
    161     def __call__(cls, x=None, *args, **kwargs):
    162         if not args and not kwargs and x is not None and isinstance(x,cls): return x
--> 163         return super().__call__(x, *args, **kwargs)
    164 
    165 # Cell

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/core.py in __init__(self, items, tfms, use_list, do_setup, split_idx, train_setup, splits, types, verbose, dl_type)
    234         if do_setup:
    235             pv(f"Setting up {self.tfms}", verbose)
--> 236             self.setup(train_setup=train_setup)
    237 
    238     def _new(self, items, split_idx=None, **kwargs):

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/core.py in setup(self, train_setup)
    256             for f in self.tfms.fs:
    257                 self.types.append(getattr(f, 'input_types', type(x)))
--> 258                 x = f(x)
    259             self.types.append(type(x))
    260         types = L(t if is_listy(t) else [t] for t in self.types).concat().unique()

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/transforms.py in __call__(self, o, **kwargs)
    198 
    199     def __call__(self, o, **kwargs):
--> 200         if len(self.cols) == 1: return self._do_one(o, self.cols[0])
    201         return L(self._do_one(o, c) for c in self.cols)
    202 

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/transforms.py in _do_one(self, r, c)
    192 
    193     def _do_one(self, r, c):
--> 194         o = r[c] if isinstance(c, int) else r[c] if c=='name' else getattr(r, c)
    195         if len(self.pref)==0 and len(self.suff)==0 and self.label_delim is None: return o
    196         if self.label_delim is None: return f'{self.pref}{o}{self.suff}'

/opt/conda/envs/fastai/lib/python3.8/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5128             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5129                 return self[name]
-> 5130             return object.__getattribute__(self, name)
   5131 
   5132     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'paragraphs'

stefan-ai · December 2, 2020, 2:52pm

Isn’t there some naming inconsistency? Because in your code above your column is called new_paragraphs, while this error message mentions paragraphs.

aardra · December 3, 2020, 4:14am

So I changed it all back to 'paragraphs'. I had only switched to new_paragraphs because I thought the error came from the text being in a list, whereas in the wikitext tutorial it is not a list. But the error is the same regardless.

This is the code I used to create new_paragraphs:
df_train['new_paragraphs'] = df_train.paragraphs.apply(lambda x: str(x).strip('[]'))
df_test['new_paragraphs'] = df_test.paragraphs.apply(lambda x: str(x).strip('[]'))

stefan-ai · December 3, 2020, 8:19am

Not sure what’s still causing the error. I guess there is something wrong with your dataframe, but it’s hard to tell without seeing the dataframe and your exact code that produces the error.

aardra · December 7, 2020, 5:29am

Hi. I have attached two screenshots related to the dataframe in question.
The first screenshot is the output of the command.

df_train.head()

The second one is the output of the command:
df_train['paragraphs']

In between, I thought the error had to do with the text in df_train['paragraphs'] being in lists, so I removed the lists by running these commands:
df_train['new_paragraphs'] = df_train.paragraphs.apply(lambda x: str(x).strip('[]'))
df_test['new_paragraphs'] = df_test.paragraphs.apply(lambda x: str(x).strip('[]'))

The output of df_train['new_paragraphs'] is the third screenshot.

Any hints would be very helpful. Thanks again
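
A side note on the list-removal step: str(x).strip('[]') only drops the outer brackets, so the quote characters and the ", " separators between paragraphs stay in the text. Below is a hedged sketch of an alternative. It assumes each cell is either a real Python list of paragraph strings or the string representation of such a list (for example after a CSV round trip); the helper name paragraphs_to_text is made up for illustration.

```python
import ast

def paragraphs_to_text(cell):
    """Join a list of paragraphs (or its string repr) into one plain string."""
    if isinstance(cell, str):
        try:
            parsed = ast.literal_eval(cell)
        except (ValueError, SyntaxError, TypeError):
            return cell  # not a Python literal: treat it as plain text
        if not isinstance(parsed, (list, tuple)):
            return cell  # a literal, but not a list of paragraphs
        cell = parsed
    # Separate paragraphs with a blank line.
    return "\n\n".join(str(p) for p in cell)

raw = ["First paragraph.", "Second [with brackets]"]
stripped = str(raw).strip('[]')   # still contains the quotes and ", "
joined = paragraphs_to_text(raw)  # plain text, one paragraph per block
print(stripped)
print(joined)
```

Applied to the dataframe, this would look like df_train.paragraphs.apply(paragraphs_to_text).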

muellerzr · December 7, 2020, 6:28am

Your issue (which should show up in the docs too) is that TextBlock has a parameter called res_col_name. That column is where your tokenized paragraphs go. By default it will export them to a column called "text", so your get_x should be ColReader('text').
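
A rough sketch of what this means in practice. The first part uses only pandas and a made-up tokenized row to imitate the getattr lookup that ColReader performs in the traceback above; the exact columns fastai produces may differ. The second part wraps the corrected DataBlock in a hypothetical helper (build_dls is not a fastai function) so that nothing here needs fastai unless the helper is actually called. In some later fastai releases the parameter appears as tok_text_col instead of res_col_name, so check the signature in your installed version.

```python
import pandas as pd

# Hypothetical row after tokenization: the raw 'paragraphs' column has been
# replaced by the tokenized output column, named 'text' by default.
tokenized_row = pd.Series({"text": ["xxbos", "hello", "world"], "text_length": 3})

def read_col(row, col):
    # Same kind of lookup as in the ColReader traceback: getattr(r, c)
    return getattr(row, col)

caught = None
try:
    read_col(tokenized_row, "paragraphs")  # the raw column no longer exists
except AttributeError as err:
    caught = err

tokens = read_col(tokenized_row, "text")   # reading the output column works

def build_dls(df_all, bs=104, sl=72):
    """Corrected pipeline; needs fastai and an 'is_valid' column in df_all."""
    from fastai.text.all import DataBlock, TextBlock, ColReader, ColSplitter
    block = DataBlock(
        blocks=TextBlock.from_df("paragraphs", is_lm=True),  # raw text column
        get_x=ColReader("text"),                             # tokenized column
        splitter=ColSplitter(),
    )
    return block.dataloaders(df_all, bs=bs, seq_len=sl)
```

So the column name passed to TextBlock.from_df is the raw text column, while the name passed to ColReader is the column the tokenizer writes to.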