I was reading the text processing documentation and trying to run the toy example it provides (IMDB).
The dataset is correctly downloaded, untarred, and displayed as a pandas DataFrame.
But when I try to instantiate the DataBunch for the language model, it looks for
train.csv in the same directory where the sample was untarred, and no
train.csv is present:
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv')
FileNotFoundError: File b'/home/poko/.fastai/data/imdb_sample/train.csv' does not exist
Only
texts.csv actually exists in that directory.
Note that I’m following the tutorial step by step here.
Should we split texts.csv into train.csv and valid.csv ourselves?
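In case it helps, here is a minimal sketch of the manual split I have in mind, assuming the file names (train.csv, valid.csv), the 80/20 split ratio, and the label/text column layout of imdb_sample; a toy DataFrame stands in for the real texts.csv:

```python
import pandas as pd

def split_train_valid(df, valid_frac=0.2, seed=42):
    """Randomly hold out valid_frac of the rows as a validation set."""
    valid = df.sample(frac=valid_frac, random_state=seed)
    train = df.drop(valid.index)
    return train, valid

# Toy stand-in for texts.csv (label/text columns, as in imdb_sample)
df = pd.DataFrame({
    "label": ["negative", "positive", "negative", "positive", "negative"],
    "text": ["bad movie", "great film", "dull plot", "loved it", "boring"],
})

train, valid = split_train_valid(df)

# Then write the two files next to texts.csv, e.g.:
# train.to_csv(path/"train.csv", index=False)
# valid.to_csv(path/"valid.csv", index=False)
```

But I would rather not do this by hand if the library is supposed to handle the split itself.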