NLP for single-word inputs like names (indexer error)

burninator · January 6, 2021, 3:27am

Hi, I’ve been using code taken almost verbatim from the documentation here: https://docs.fast.ai/tutorial.datablock.html#Text.

I’m using a CSV file in which each line has a word or short phrase, followed by a country, and then is_valid=true (I don’t have any invalid data… is this column required? My apologies, I’m new to AI). fast.ai loads the CSV and the preview looks fine (see screenshots). Then I get an IndexError on the dataloaders call:

IndexError: single positional indexer is out-of-bounds

Here are some screenshots and a stack trace. I’ve tried a few different sequence lengths and batch sizes (none of the names is longer than 100 characters, for sure). Is there anything else I can try to get the dataloaders call to accept the data?

Thank you for your help!

/opt/conda/envs/fastai/lib/python3.8/site-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
return array(a, dtype, copy=False, order=order)

IndexError Traceback (most recent call last)
in
5 splitter=ColSplitter())
6
----> 7 dls = imdb_lm.dataloaders(df, bs=10, seq_len=100)
8 dls.show_batch(max_n=6)
9 """

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/block.py in dataloaders(self, source, path, verbose, **kwargs)
111
112 def dataloaders(self, source, path='.', verbose=False, **kwargs):
--> 113 dsets = self.datasets(source, verbose=verbose)
114 kwargs = {**self.dls_kwargs, **kwargs, 'verbose': verbose}
115 return dsets.dataloaders(path=path, after_item=self.item_tfms, after_batch=self.batch_tfms, **kwargs)

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/block.py in datasets(self, source, verbose)
108 splits = (self.splitter or RandomSplitter())(items)
109 pv(f"{len(splits)} datasets of sizes {','.join([str(len(s)) for s in splits])}", verbose)
--> 110 return Datasets(items, tfms=self._combine_type_tfms(), splits=splits, dl_type=self.dl_type, n_inp=self.n_inp, verbose=verbose)
111
112 def dataloaders(self, source, path='.', verbose=False, **kwargs):

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/core.py in __init__(self, items, tfms, tls, n_inp, dl_type, **kwargs)
308 def __init__(self, items=None, tfms=None, tls=None, n_inp=None, dl_type=None, **kwargs):
309 super().__init__(dl_type=dl_type)
--> 310 self.tls = L(tls if tls else [TfmdLists(items, t, **kwargs) for t in L(ifnone(tfms,[None]))])
311 self.n_inp = ifnone(n_inp, max(1, len(self.tls)-1))
312

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/core.py in <listcomp>(.0)
308 def __init__(self, items=None, tfms=None, tls=None, n_inp=None, dl_type=None, **kwargs):
309 super().__init__(dl_type=dl_type)
--> 310 self.tls = L(tls if tls else [TfmdLists(items, t, **kwargs) for t in L(ifnone(tfms,[None]))])
311 self.n_inp = ifnone(n_inp, max(1, len(self.tls)-1))
312

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastcore/foundation.py in __call__(cls, x, *args, **kwargs)
95 def __call__(cls, x=None, *args, **kwargs):
96 if not args and not kwargs and x is not None and isinstance(x,cls): return x
---> 97 return super().__call__(x, *args, **kwargs)
98
99 # Cell

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/core.py in __init__(self, items, tfms, use_list, do_setup, split_idx, train_setup, splits, types, verbose, dl_type)
234 if do_setup:
235 pv(f"Setting up {self.tfms}", verbose)
--> 236 self.setup(train_setup=train_setup)
237
238 def _new(self, items, split_idx=None, **kwargs):

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/core.py in setup(self, train_setup)
252 self.tfms.setup(self, train_setup)
253 if len(self) != 0:
--> 254 x = super().__getitem__(0) if self.splits is None else super().__getitem__(self.splits[0])[0]
255 self.types = []
256 for f in self.tfms.fs:

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastcore/foundation.py in __getitem__(self, idx)
109 def _xtra(self): return None
110 def _new(self, items, *args, **kwargs): return type(self)(items, *args, use_list=None, **kwargs)
--> 111 def __getitem__(self, idx): return self._get(idx) if is_indexer(idx) else L(self._get(idx), use_list=None)
112 def copy(self): return self._new(self.items.copy())
113

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastcore/foundation.py in _get(self, i)
113
114 def _get(self, i):
--> 115 if is_indexer(i) or isinstance(i,slice): return getattr(self.items,'iloc',self.items)[i]
116 i = mask2idxs(i)
117 return (self.items.iloc[list(i)] if hasattr(self.items,'iloc')

/opt/conda/envs/fastai/lib/python3.8/site-packages/pandas/core/indexing.py in __getitem__(self, key)
877
878 maybe_callable = com.apply_if_callable(key, self.obj)
--> 879 return self._getitem_axis(maybe_callable, axis=axis)
880
881 def _is_scalar_access(self, key: Tuple):

/opt/conda/envs/fastai/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
1494
1495 # validate the location
-> 1496 self._validate_integer(key, axis)
1497
1498 return self.obj._ixs(key, axis=axis)

/opt/conda/envs/fastai/lib/python3.8/site-packages/pandas/core/indexing.py in _validate_integer(self, key, axis)
1435 len_axis = len(self.obj._get_axis(axis))
1436 if key >= len_axis or key < -len_axis:
-> 1437 raise IndexError("single positional indexer is out-of-bounds")
1438
1439 # -------------------------------------------------------------------

IndexError: single positional indexer is out-of-bounds

darek.kleczek · January 6, 2021, 7:03am

Hello!
The is_valid column indicates whether a particular row belongs to the validation set (vs. the training set). So if you mark all texts as validation, then zero texts are in training, and that may be what is leading to your error…
Side note: judging from the dataset snapshot in your screenshots, the default fastai text model is unlikely to work well on this data. I’d recommend watching the first few lectures on course.fast.ai to understand the possible solutions.
Cheers, Darek
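
For readers who want to see the failure in isolation, here is a minimal sketch using pandas only (not fastai). It assumes a toy table with made-up values and hypothetical column names `text` and `label`; the boolean filtering only approximates what a column-based splitter does and is not fastai's actual implementation. The traceback above fails at the line that fetches the first training item (`self.splits[0]`), which is consistent with an empty training set.

```python
import pandas as pd

# Toy data standing in for the CSV described above (values are made up).
df = pd.DataFrame({
    "text": ["Smith", "Tanaka", "Garcia"],
    "label": ["UK", "Japan", "Spain"],
    "is_valid": [True, True, True],  # every row is marked as validation
})

# Approximation of a column-based split: is_valid == False goes to training.
train = df[~df["is_valid"]]
valid = df[df["is_valid"]]
print(len(train), len(valid))

# Asking for the first row of the empty training frame raises an IndexError.
try:
    train.iloc[0]
    error_message = None
except IndexError as e:
    error_message = str(e)
print(error_message)
```

With every row flagged as validation, the training frame is empty, so any positional lookup of its first row fails.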

burninator · January 11, 2021, 1:15am

Ah, that’s what it was. I set 20% of my data to be validation data and it was processed without the error. But you’re right: even though it loads now, I don’t know how well the default text model will be able to learn from it. I’ll do a little research. Thanks for your help!
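
One way to flag roughly 20% of the rows as validation, as described above, is sketched below with pandas. This is an assumption about how the column could be filled, not necessarily what was done here; the toy data, the column names `text` and `label`, and the fixed `random_state` are all illustrative.

```python
import pandas as pd

# Toy data; in practice this would come from pd.read_csv on the real file.
df = pd.DataFrame({
    "text": [f"name{i}" for i in range(10)],
    "label": ["A", "B"] * 5,
})

# Start with every row in training, then flag a random 20% as validation.
df["is_valid"] = False
valid_idx = df.sample(frac=0.2, random_state=42).index
df.loc[valid_idx, "is_valid"] = True

# Sanity check: both splits should be non-empty before building dataloaders.
n_valid = int(df["is_valid"].sum())
n_train = len(df) - n_valid
print(n_train, n_valid)
```

The traceback also shows fastai falling back to `RandomSplitter()` when no splitter is given, so dropping the is_valid column and relying on a random split may be another option.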