Lesson 4: Error running LanguageModelData.from_text_files


The last of the following lines gave an error while running the notebook:

TEXT = data.Field(lower=True, tokenize=spacy_tok)
FILES = dict(train=TRN_PATH, validation=VAL_PATH, test=VAL_PATH)
md = LanguageModelData.from_text_files(PATH, TEXT, **FILES, bs=bs, bptt=bptt, min_freq=10)

The error was
TypeError Traceback (most recent call last)
in ()
1 FILES = dict(train=TRN_PATH, validation=VAL_PATH, test=VAL_PATH)
----> 2 md = LanguageModelData.from_text_files(PATH, TEXT, **FILES, bs=bs, bptt=bptt, min_freq=10)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/fastai/nlp.py in from_text_files(cls, path, field, train, validation, test, bs, bptt, **kwargs)
309 """
310 trn_ds, val_ds, test_ds = ConcatTextDataset.splits(
--> 311 path, text_field=field, train=train, validation=validation, test=test)
312 return cls(path, field, trn_ds, val_ds, test_ds, bs, bptt, **kwargs)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torchtext/data/dataset.py in splits(cls, path, root, train, validation, test, **kwargs)
74 path = cls.download(root)
75 train_data = None if train is None else cls(
---> 76 os.path.join(path, train), **kwargs)
77 val_data = None if validation is None else cls(
78 os.path.join(path, validation), **kwargs)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/fastai/nlp.py in __init__(self, path, text_field, newline_eos, encoding, **kwargs)
173 else: paths=[path]
174 for p in paths:
--> 175 for line in open(p, encoding=encoding): text += text_field.preprocess(line)
176 if newline_eos: text.append('')

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torchtext/data/field.py in preprocess(self, x)
168 x = self.tokenize(x.rstrip('\n'))
169 if self.lower:
--> 170 x = Pipeline(six.text_type.lower)(x)
171 if self.preprocessing is not None:
172 return self.preprocessing(x)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torchtext/data/pipeline.py in __call__(self, x, *args)
35 """
36 for pipe in self.pipes:
---> 37 x = pipe.call(x, *args)
38 return x

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torchtext/data/pipeline.py in call(self, x, *args)
51 if isinstance(x, list):
52 return [self.convert_token(tok, *args) for tok in x]
---> 53 return self.convert_token(x, *args)
55 def add_before(self, pipeline):

TypeError: descriptor 'lower' requires a 'str' object but received a 'spacy.tokens.doc.Doc'

Thank you,

Hi! I solved it by changing

TEXT = data.Field(lower=True, tokenize=spacy_tok)

to

TEXT = data.Field(lower=True, tokenize="spacy")
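My guess at why this works, going only by the traceback: the tokenizer was returning a spaCy Doc, and the lowercasing step only knows how to handle a str or a list of str. As far as I can tell, the built-in "spacy" tokenizer returns a plain list of token strings, so lower=True can be applied token by token. Here is a small sketch of that mechanism that runs without spaCy or torchtext. FakeToken, FakeDoc, lower_pipeline, doc_tokenizer and list_tokenizer are made-up stand-ins for illustration, not the real library code.

```python
# Stand-ins for spaCy's Token/Doc: iterable of tokens, but neither str nor list.
class FakeToken:
    def __init__(self, text):
        self.text = text


class FakeDoc:
    def __init__(self, text):
        self.tokens = [FakeToken(t) for t in text.split()]

    def __iter__(self):
        return iter(self.tokens)


def lower_pipeline(x):
    # Mirrors the branch shown in the traceback: a list is lowered
    # token by token, anything else is handed to str.lower directly.
    if isinstance(x, list):
        return [str.lower(tok) for tok in x]
    return str.lower(x)


def doc_tokenizer(text):
    # Hypothetical tokenizer that returns a Doc-like object.
    return FakeDoc(text)


def list_tokenizer(text):
    # Hypothetical wrapper: turn the Doc-like object into a list of strings.
    return [tok.text for tok in doc_tokenizer(text)]


def fails_with_type_error(tokenizer, text):
    # True if lowercasing the tokenizer's output raises TypeError.
    try:
        lower_pipeline(tokenizer(text))
    except TypeError:
        return True
    return False


print(fails_with_type_error(doc_tokenizer, "Hello World"))
print(lower_pipeline(list_tokenizer("Hello World")))
```

If you want to keep your own tokenizer instead of passing "spacy", the same idea should apply: make sure the function you give to tokenize= returns a list of strings rather than a Doc.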


I hope it helps


Thanks a lot. It works.