Hi,
I’m trying to use fastai’s text data module to load my text data for an NLP project. The data is organized into a train folder and a validation folder, each with one subfolder per class (twenty classes in total), and each class subfolder contains close to 1k text files. When I call from_folder to create a TextClasDataBunch object, I get the following error:
ValueError: Invalid file path or buffer object type: <class 'NoneType'>
For reference, here is the line of code where the problem occurs:
databunch = TextClasDataBunch.from_folder(path=path, valid='valid', train='train', tokenizer=data_tokenizer, shuffle=True)
Here, data_tokenizer is a Tokenizer object whose tokenization function is SpacyTokenizer().
And here is the complete stack trace:
File "/Users/anprahlad/.pyenv/versions/venv/lib/python3.6/site-packages/fastai/text/data.py", line 345, in from_folder
classes=txt_kwargs.pop('classes', None), **txt_kwargs)
File "/Users/anprahlad/.pyenv/versions/venv/lib/python3.6/site-packages/fastai/text/data.py", line 199, in from_folder
return cls(folder, tokenizer, name=name, classes=classes, **kwargs)
File "/Users/anprahlad/.pyenv/versions/venv/lib/python3.6/site-packages/fastai/text/data.py", line 37, in __init__
if not self.check_toks(): self.tokenize()
File "/Users/anprahlad/.pyenv/versions/venv/lib/python3.6/site-packages/fastai/text/data.py", line 82, in tokenize
curr_len = get_chunk_length(self.df) if (self.create_mtd == TextMtd.DF) else get_chunk_length(self.csv_file, self.chunksize)
File "/Users/anprahlad/.pyenv/versions/venv/lib/python3.6/site-packages/fastai/core.py", line 125, in get_chunk_length
else: dfs = pd.read_csv(data, header=None, chunksize=chunksize)
File "/Users/anprahlad/.pyenv/versions/venv/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Users/anprahlad/.pyenv/versions/venv/lib/python3.6/site-packages/pandas/io/parsers.py", line 424, in _read
filepath_or_buffer, encoding, compression)
File "/Users/anprahlad/.pyenv/versions/venv/lib/python3.6/site-packages/pandas/io/common.py", line 218, in get_filepath_or_buffer
raise ValueError(msg.format(_type=type(filepath_or_buffer)))
ValueError: Invalid file path or buffer object type: <class 'NoneType'>
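To rule out a problem with the directory layout itself, a quick check along these lines could be run before building the databunch. This is only a sketch using the Python standard library: summarize_layout is a hypothetical helper (not part of fastai), and the "*.txt" pattern and the "train"/"valid" folder names are assumptions about how the files are stored.

```python
from pathlib import Path


def summarize_layout(root, splits=("train", "valid"), pattern="*.txt"):
    """Count files per class for a root/<split>/<class>/<file> layout.

    Hypothetical helper for illustration; not part of fastai.
    Returns {split_name: {class_name: file_count}}.
    """
    root = Path(root)
    summary = {}
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            # Fail early with a clear message if a split folder is missing.
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        counts = {}
        # Each immediate subfolder of the split is treated as one class.
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            counts[class_dir.name] = sum(1 for _ in class_dir.glob(pattern))
        summary[split] = counts
    return summary
```

Calling summarize_layout(path) with the same path passed to from_folder should list twenty classes under each split, with roughly 1k files per class. If that matches, the layout is probably not the cause, and the None that reaches pd.read_csv in the trace above would have to come from somewhere else.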
Has anyone else using the fastai text data library run into this issue? If so, how did you get past it? Any advice would be greatly appreciated!