Lesson 4: Error running LanguageModelData.from_text_files


(Mayug Maniparambil) #1

Hey,

The following code gave an error when I ran the notebook:

TEXT = data.Field(lower=True, tokenize=spacy_tok)
FILES = dict(train=TRN_PATH, validation=VAL_PATH, test=VAL_PATH)
md = LanguageModelData.from_text_files(PATH, TEXT, **FILES, bs=bs, bptt=bptt, min_freq=10)
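The call relies on `**FILES` (as in the traceback's copy of the line), which unpacks the `FILES` dict into the `train`, `validation` and `test` keyword arguments of `from_text_files`. Extra keywords such as `min_freq` fall into `**kwargs`, which the traceback shows being forwarded to the class constructor. Below is a minimal, dependency-free sketch. The function is a hypothetical stand-in that only copies the parameter list shown in the traceback, and the paths and `bs`/`bptt` values are made up for illustration:

```python
def from_text_files(path, field, train, validation, test, bs, bptt, **kwargs):
    # Hypothetical stand-in with the same parameters as fastai's
    # LanguageModelData.from_text_files in the traceback; it only echoes its arguments.
    return dict(path=path, field=field, train=train, validation=validation,
                test=test, bs=bs, bptt=bptt, **kwargs)

# Hypothetical paths, for illustration only.
FILES = dict(train="trn/", validation="val/", test="val/")

# **FILES expands to train="trn/", validation="val/", test="val/".
a = from_text_files("data/", "TEXT", **FILES, bs=64, bptt=70, min_freq=10)
b = from_text_files("data/", "TEXT", train="trn/", validation="val/",
                    test="val/", bs=64, bptt=70, min_freq=10)

print(a == b)  # True: both calls pass exactly the same arguments
```

Keyword arguments after `**FILES` are allowed in Python 3.5 and later, which covers the Python 3.6 environment in the traceback.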

The error was:
TypeError Traceback (most recent call last)
in ()
1 FILES = dict(train=TRN_PATH, validation=VAL_PATH, test=VAL_PATH)
----> 2 md = LanguageModelData.from_text_files(PATH, TEXT, **FILES, bs=bs, bptt=bptt, min_freq=10)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/fastai/nlp.py in from_text_files(cls, path, field, train, validation, test, bs, bptt, **kwargs)
309 """
310 trn_ds, val_ds, test_ds = ConcatTextDataset.splits(
--> 311 path, text_field=field, train=train, validation=validation, test=test)
312 return cls(path, field, trn_ds, val_ds, test_ds, bs, bptt, **kwargs)
313

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torchtext/data/dataset.py in splits(cls, path, root, train, validation, test, **kwargs)
74 path = cls.download(root)
75 train_data = None if train is None else cls(
---> 76 os.path.join(path, train), **kwargs)
77 val_data = None if validation is None else cls(
78 os.path.join(path, validation), **kwargs)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/fastai/nlp.py in __init__(self, path, text_field, newline_eos, encoding, **kwargs)
173 else: paths=[path]
174 for p in paths:
--> 175 for line in open(p, encoding=encoding): text += text_field.preprocess(line)
176 if newline_eos: text.append('')
177

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torchtext/data/field.py in preprocess(self, x)
168 x = self.tokenize(x.rstrip('\n'))
169 if self.lower:
--> 170 x = Pipeline(six.text_type.lower)(x)
171 if self.preprocessing is not None:
172 return self.preprocessing(x)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torchtext/data/pipeline.py in __call__(self, x, *args)
35 """
36 for pipe in self.pipes:
---> 37 x = pipe.call(x, *args)
38 return x
39

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torchtext/data/pipeline.py in call(self, x, *args)
51 if isinstance(x, list):
52 return [self.convert_token(tok, *args) for tok in x]
---> 53 return self.convert_token(x, *args)
54
55 def add_before(self, pipeline):

TypeError: descriptor 'lower' requires a 'str' object but received a 'spacy.tokens.doc.Doc'

Thank you,
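The last frames of the traceback show why this fails. `Field.preprocess` tokenizes each line and then lowercases it through torchtext's `Pipeline`. `Pipeline.call` only maps over the input token by token when it is a `list`. The tokenizer here returned a spaCy `Doc`, which is iterable but is not a `list`, so the whole `Doc` reaches `str.lower` (`six.text_type` is `str` on Python 3), and Python raises a `TypeError`. The sketch below is a simplified, dependency-free re-creation of that logic. `FakeDoc`, `pipeline_call` and `preprocess` are hypothetical names, not torchtext's API:

```python
class FakeDoc:
    """Hypothetical stand-in for spacy.tokens.doc.Doc: iterable, but not a list."""

    def __init__(self, words):
        self.words = words

    def __iter__(self):
        return iter(self.words)


def pipeline_call(convert_token, x):
    # Same branch as torchtext's Pipeline.call in the traceback:
    # a list is converted token by token, anything else is converted whole.
    if isinstance(x, list):
        return [convert_token(tok) for tok in x]
    return convert_token(x)


def preprocess(text, tokenize, lower=True):
    # Simplified Field.preprocess: tokenize the line, then lowercase via the pipeline.
    x = tokenize(text.rstrip("\n"))
    if lower:
        x = pipeline_call(str.lower, x)
    return x


def list_tok(s):
    return s.split()  # returns a list of str, so lowercasing works per token


def doc_tok(s):
    return FakeDoc(s.split())  # returns a Doc-like object, so str.lower fails


print(preprocess("Hello World\n", list_tok))  # ['hello', 'world']
try:
    preprocess("Hello World\n", doc_tok)
except TypeError as err:
    print("TypeError:", err)
```

The exact wording of the `TypeError` message differs between Python versions, but the cause is the same: `str.lower` is called on something that is not a `str`.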


(Francisco SB) #2

Hi! I solved it by changing

TEXT = data.Field(lower=True, tokenize=spacy_tok)

to

TEXT = data.Field(lower=True, tokenize="spacy")

http://forums.fast.ai/t/name-spacy-tok-is-not-defined/14534/6

I hope it helps
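Passing `tokenize="spacy"` works because torchtext then builds its own spaCy-based tokenizer, which returns a list of token strings rather than a `Doc`. To keep a custom `spacy_tok` instead, one option is to wrap it so that it returns `[tok.text for tok in doc]`. The traceback shows that the original `spacy_tok` returned a spaCy `Doc`. The sketch below assumes that `spacy_tok` is a loaded spaCy pipeline. `make_list_tokenizer` and `fake_nlp` are hypothetical helpers, and `fake_nlp` only stands in for a real spaCy pipeline so that the example runs without spaCy installed:

```python
from types import SimpleNamespace


def make_list_tokenizer(nlp):
    """Wrap a spaCy-style callable so that it returns a plain list of strings."""
    def tokenize(text):
        # Iterating a spaCy Doc yields Token objects; .text is each token's string.
        return [tok.text for tok in nlp(text)]
    return tokenize


def fake_nlp(text):
    # Hypothetical stand-in for a spaCy pipeline: splits on whitespace and
    # returns objects exposing .text, like spaCy Token objects.
    return [SimpleNamespace(text=w) for w in text.split()]


tok = make_list_tokenizer(fake_nlp)
print(tok("Hello World"))  # ['Hello', 'World']
```

With spaCy installed, you would pass something like `tokenize=make_list_tokenizer(nlp)` to `data.Field`, where `nlp` is a loaded spaCy model. The model name to load depends on your spaCy version.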


(Mayug Maniparambil) #3

Thanks a lot. It works.