Hi,
I am doing the NLP videos and trying to make my own text classifier.
The test and validation data are in a single CSV, which I am trying to load with the code below; it throws the error pasted further down.
Folder structure: the CSV is in a folder called lm-data.
This folder also contains several other CSVs that I am not using, as well as other data such as the data bunch I created earlier.
data_clas = (TextList.from_csv(path, 'ClassifierDataset_Trialrun_split.csv', cols='Text', vocab=data_lm.vocab)
             .split_from_df(col='is_valid')
             .label_from_df(cols='Specialty')
             .databunch(bs=42))
The error is:
<ipython-input-125-253446505280> in <module>
2 data_clas = (TextList.from_csv(path, 'ClassifierDataset_Trialrun_split.csv', cols='Text', vocab=data_lm.vocab)
3 .split_from_df(col='is_valid')
----> 4 .label_from_df(cols='Specialty')
5 .databunch(bs=42))
~/anaconda3/lib/python3.7/site-packages/fastai/data_block.py in _inner(*args, **kwargs)
473 assert isinstance(fv, Callable)
474 def _inner(*args, **kwargs):
--> 475 self.train = ft(*args, from_item_lists=True, **kwargs)
476 assert isinstance(self.train, LabelList)
477 kwargs['label_cls'] = self.train.y.__class__
~/anaconda3/lib/python3.7/site-packages/fastai/data_block.py in label_from_df(self, cols, label_cls, **kwargs)
284 new_kwargs,label_cls = dict(one_hot=True, classes= cols),MultiCategoryList
285 kwargs = {**new_kwargs, **kwargs}
--> 286 return self._label_from_list(_maybe_squeeze(labels), label_cls=label_cls, **kwargs)
287
288 def label_const(self, const:Any=0, label_cls:Callable=None, **kwargs)->'LabelList':
~/anaconda3/lib/python3.7/site-packages/fastai/data_block.py in _label_from_list(self, labels, label_cls, from_item_lists, **kwargs)
272 raise Exception("Your data isn't split, if you don't want a validation set, please use `split_none`.")
273 labels = array(labels, dtype=object)
--> 274 label_cls = self.get_label_cls(labels, label_cls=label_cls, **kwargs)
275 y = label_cls(labels, path=self.path, **kwargs)
276 res = self._label_list(x=self, y=y)
~/anaconda3/lib/python3.7/site-packages/fastai/data_block.py in get_label_cls(self, labels, label_cls, label_delim, **kwargs)
261 if self.label_cls is not None: return self.label_cls
262 if label_delim is not None: return MultiCategoryList
--> 263 it = index_row(labels,0)
264 if isinstance(it, (float, np.float32)): return FloatList
265 if isinstance(try_int(it), (str, Integral)): return CategoryList
~/anaconda3/lib/python3.7/site-packages/fastai/core.py in index_row(a, idxs)
274 if isinstance(res,(pd.DataFrame,pd.Series)): return res.copy()
275 return res
--> 276 return a[idxs]
277
278 def func_args(func)->bool:
IndexError: index 0 is out of bounds for axis 0 with size 0
=== Environment ===
platform : Linux-4.15.0-1056-aws-x86_64-with-debian-buster-sid
distro : #58-Ubuntu SMP Tue Nov 26 15:14:34 UTC 2019
conda env : base
python : /home/ubuntu/anaconda3/bin/python
sys.path : /home/ubuntu
/home/ubuntu/anaconda3/lib/python37.zip
/home/ubuntu/anaconda3/lib/python3.7
/home/ubuntu/anaconda3/lib/python3.7/lib-dynload
/home/ubuntu/.local/lib/python3.7/site-packages
/home/ubuntu/anaconda3/lib/python3.7/site-packages
/home/ubuntu/anaconda3/lib/python3.7/site-packages/IPython/extensions
/home/ubuntu/.ipython
From searching the forum, I gather that this error usually points to a file path problem, but I am certain the file paths are all correct.
I have also tried making a new folder with only this CSV in it, and that returns the same error.
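The traceback fails at `self.train = ...`, on an array of size 0, which suggests the training portion had no labels when `label_from_df` ran. A way to check the CSV itself, independent of fastai, is sketched below. It uses only the standard library; `summarize_split_column` is a made-up helper name, and the column names `Text`, `Specialty` and `is_valid` are taken from the code above.

```python
import csv
from collections import Counter

def summarize_split_column(csv_path, valid_col="is_valid",
                           required_cols=("Text", "Specialty")):
    """Report row count, missing columns, and raw values of the split column.

    Hypothetical helper for troubleshooting; not part of fastai.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        rows = list(reader)

    # Columns the data block call expects but the header does not contain.
    missing = [c for c in (valid_col, *required_cols) if c not in header]

    # Count each distinct raw string in the split column (empty if column absent).
    if valid_col in header:
        counts = Counter(row[valid_col] for row in rows)
    else:
        counts = Counter()

    return {"n_rows": len(rows),
            "missing_columns": missing,
            "valid_values": dict(counts)}

# Example call, assuming `path` is the same folder used in the code above:
# print(summarize_split_column(path/'ClassifierDataset_Trialrun_split.csv'))
```

If `valid_values` shows the same value on every row, or `missing_columns` is not empty, one side of the split may be ending up empty. That is only a guess at the cause, not something the traceback proves.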
All help is deeply appreciated. I am new to Python and programming (I am a healthcare professional), so basic troubleshooting is where I often get stuck.
Thank you
aflip