TextList from CSV -- IndexError: index 0 is out of bounds for axis 0 with size 0

aflip · January 4, 2020, 7:43pm

Hi,

I am doing the NLP videos and trying to make my own text classifier.

The test and valid data are in a single CSV that I am trying to load with the code below, which throws the error pasted further down.

Folder structure: the CSV is in a folder called lm-data. This folder also contains a number of other CSVs that I am not using, and other data such as the data bunch I created earlier.

    data_clas = (TextList.from_csv(path, 'ClassifierDataset_Trialrun_split.csv', cols='Text', vocab=data_lm.vocab)
                       .split_from_df(col='is_valid')
                       .label_from_df(cols='Specialty')
                       .databunch(bs=42))

The error is:

    <ipython-input-125-253446505280> in <module>
      2 data_clas = (TextList.from_csv(path, 'ClassifierDataset_Trialrun_split.csv', cols='Text', vocab=data_lm.vocab)
      3                    .split_from_df(col='is_valid')
    ----> 4                    .label_from_df(cols='Specialty')
      5                    .databunch(bs=42))

    ~/anaconda3/lib/python3.7/site-packages/fastai/data_block.py in _inner(*args, **kwargs)
    473         assert isinstance(fv, Callable)
    474         def _inner(*args, **kwargs):
    --> 475             self.train = ft(*args, from_item_lists=True, **kwargs)
    476             assert isinstance(self.train, LabelList)
    477             kwargs['label_cls'] = self.train.y.__class__

    ~/anaconda3/lib/python3.7/site-packages/fastai/data_block.py in label_from_df(self, cols, label_cls, **kwargs)
    284             new_kwargs,label_cls = dict(one_hot=True, classes= cols),MultiCategoryList
    285             kwargs = {**new_kwargs, **kwargs}
    --> 286         return self._label_from_list(_maybe_squeeze(labels), label_cls=label_cls, **kwargs)
    287 
    288     def label_const(self, const:Any=0, label_cls:Callable=None, **kwargs)->'LabelList':

    ~/anaconda3/lib/python3.7/site-packages/fastai/data_block.py in _label_from_list(self, labels, label_cls, from_item_lists, **kwargs)
    272             raise Exception("Your data isn't split, if you don't want a validation set, please use `split_none`.")
    273         labels = array(labels, dtype=object)
    --> 274         label_cls = self.get_label_cls(labels, label_cls=label_cls, **kwargs)
    275         y = label_cls(labels, path=self.path, **kwargs)
    276         res = self._label_list(x=self, y=y)

    ~/anaconda3/lib/python3.7/site-packages/fastai/data_block.py in get_label_cls(self, labels, label_cls, label_delim, **kwargs)
    261         if self.label_cls is not None:          return self.label_cls
    262         if label_delim is not None:             return MultiCategoryList
    --> 263         it = index_row(labels,0)
    264         if isinstance(it, (float, np.float32)): return FloatList
    265         if isinstance(try_int(it), (str, Integral)):  return CategoryList

    ~/anaconda3/lib/python3.7/site-packages/fastai/core.py in index_row(a, idxs)
    274         if isinstance(res,(pd.DataFrame,pd.Series)): return res.copy()
    275         return res
    --> 276     return a[idxs]
    277 
    278 def func_args(func)->bool:

    IndexError: index 0 is out of bounds for axis 0 with size 0

=== Environment ===

    platform      : Linux-4.15.0-1056-aws-x86_64-with-debian-buster-sid
    distro        : #58-Ubuntu SMP Tue Nov 26 15:14:34 UTC 2019
    conda env     : base
    python        : /home/ubuntu/anaconda3/bin/python
    sys.path      : /home/ubuntu
    /home/ubuntu/anaconda3/lib/python37.zip
    /home/ubuntu/anaconda3/lib/python3.7
    /home/ubuntu/anaconda3/lib/python3.7/lib-dynload

    /home/ubuntu/.local/lib/python3.7/site-packages
    /home/ubuntu/anaconda3/lib/python3.7/site-packages
    /home/ubuntu/anaconda3/lib/python3.7/site-packages/IPython/extensions
    /home/ubuntu/.ipython

From searching the forum, I gather that this error usually points to a file path problem, but I am certain that the file paths are all OK.

I have tried making a new folder with just this csv in it, and that too returns this error.
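
The traceback fails at `self.train = ft(...)` while indexing an empty labels array, which suggests the training split may have ended up with no rows rather than the file being missing. The following is only a rough sketch of how that could be checked with the standard library. The helper name `summarize_split` and the sample rows are hypothetical, and the column names are taken from the code above. It reports any expected columns that are absent and counts the raw values found in the `is_valid` column.

```python
import csv
import io
from collections import Counter

def summarize_split(csv_file, text_col="Text", label_col="Specialty",
                    valid_col="is_valid"):
    """Return (missing column names, count of each raw value in valid_col)."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    # Columns the loading code expects but the header does not contain
    missing = [c for c in (text_col, label_col, valid_col) if c not in header]
    counts = Counter()
    if valid_col in header:
        # Count the values exactly as written in the file, before any parsing
        counts = Counter(row[valid_col] for row in reader)
    return missing, dict(counts)

# Hypothetical stand-in for the real CSV: every row is marked as validation
sample = io.StringIO(
    "Text,Specialty,is_valid\n"
    "chest pain,Cardiology,True\n"
    "itchy rash,Dermatology,True\n"
)
missing, counts = summarize_split(sample)
print("missing columns:", missing)
print("is_valid values:", counts)
```

For the real file, the same function could be called on `open(path/'ClassifierDataset_Trialrun_split.csv', newline='')`. If the counts show only one distinct value, or values that are not plain True/False, an empty training set would be a plausible explanation for the error.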

All help is deeply appreciated. I am a healthcare professional and new to Python and programming, so basic troubleshooting is where I often get stuck.

Thank you

aflip