I was getting an error when building my LanguageModelData, and I found the solution. My files were in the folders test and train, but the filenames did not end in “.txt”. Once I added the extension, the problem went away. Also, ignore the fact that train, validation and test all point to “train/” in the traceback below; that was part of the debugging process. In reality I had validation and test both pointed at “test/”. Hopefully this helps somebody else who hits the same issue.
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-47-8ee4d2efe232> in <module>()
----> 1 md = LanguageModelData(PATH, TEXT, train="train/", validation="train/", test="train/")
~/fastaip1v2/fastai/courses/dl1/fastai/nlp.py in __init__(self, path, field, train, validation, test, bs, bptt, **kwargs)
196 self.nt = len(field.vocab)
197 self.trn_dl,self.val_dl,self.test_dl = [LanguageModelLoader(ds, bs, bptt) for ds in
--> 198 (self.trn_ds,self.val_ds,self.test_ds)]
199
200 def get_model(self, opt_fn, emb_sz, n_hid, n_layers, **kwargs):
~/fastaip1v2/fastai/courses/dl1/fastai/nlp.py in <listcomp>(.0)
195 self.pad_idx = field.vocab.stoi[field.pad_token]
196 self.nt = len(field.vocab)
--> 197 self.trn_dl,self.val_dl,self.test_dl = [LanguageModelLoader(ds, bs, bptt) for ds in
198 (self.trn_ds,self.val_ds,self.test_ds)]
199
~/fastaip1v2/fastai/courses/dl1/fastai/nlp.py in __init__(self, ds, bs, bptt)
132 text = sum([o.text for o in ds], [])
133 fld = ds.fields['text']
--> 134 nums = fld.numericalize([text])
135 self.data = self.batchify(nums)
136 self.i,self.iter = 0,0
~/anaconda3/envs/fastai/lib/python3.6/site-packages/torchtext/data/field.py in numericalize(self, arr, device, train)
296 arr = self.postprocessing(arr, None, train)
297
--> 298 arr = self.tensor_type(arr)
299 if self.sequential and not self.batch_first:
300 arr.t_()
RuntimeError: given sequence has an invalid size of dimension 2: 0
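The "invalid size of dimension 2: 0" message suggests that no text was actually loaded, so the tensor being built was empty. If you want to rule that out before constructing LanguageModelData, here is a minimal sketch of a pre-flight check. It uses only the standard library and does not call fastai or torchtext. The helper names `check_text_folder` and `count_tokens` are made up for illustration, and the idea that the loader only picks up files with an extension is my assumption based on the fix above, not something I confirmed in the source.

```python
from pathlib import Path


def check_text_folder(folder, suffix=".txt"):
    """Hypothetical helper: split the files in `folder` into those that
    end in `suffix` and those that do not. A missing folder yields two
    empty lists."""
    folder = Path(folder)
    if not folder.is_dir():
        return [], []
    files = sorted(p for p in folder.iterdir() if p.is_file())
    matching = [p for p in files if p.suffix == suffix]
    other = [p for p in files if p.suffix != suffix]
    return matching, other


def count_tokens(paths, encoding="utf-8"):
    """Hypothetical helper: rough whitespace token count over `paths`.
    A total of 0 means there is nothing to numericalize."""
    return sum(len(p.read_text(encoding=encoding).split()) for p in paths)


# Example usage (assumes PATH is the same root you pass to LanguageModelData):
# for sub in ("train", "test"):
#     matching, other = check_text_folder(Path(PATH) / sub)
#     print(sub, len(matching), "txt files,", count_tokens(matching), "tokens")
#     if other:
#         print("  files without .txt:", [p.name for p in other])
```

If a folder reports zero ".txt" files or zero tokens, renaming the files to end in ".txt" (as I did) is the first thing to try.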