I have created a custom data set of Reddit submissions. While the language model data was being built, one of the files caused the error shown below: “given sequence has an invalid size of dimension 2: 0”. Although I have checked that the text in each of my files is at least 35 characters long, it's possible that a file still has a problem, such as containing just one word (a URL, for example).
I base this idea on finding the following in PyTorch's Tensor.cpp on GitHub:
THPUtils_assert(length > 0, "given sequence has an invalid size of "
"dimension %" PRId64 ": %" PRId64, (int64_t)sizes.size(), (int64_t)length);
I will check my data set, but having hit this error I would love to be able to debug it. Can anyone tell me how? Is there a way to debug a traceback in a Jupyter notebook?
And can anyone, perhaps @jeremy, confirm what would cause this error?
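For the data set check, here is a minimal sketch of what I have in mind. It assumes each split (TRN_PATH, VAL_PATH) is a directory of plain-text files, and it uses whitespace splitting as a rough stand-in for the real tokenizer, so the counts are only approximate. The names `count_tokens` and `report` are hypothetical helpers, not part of fastai or torchtext. Since line 132 in the traceback below concatenates the tokens of every example in a split before numericalizing, a split whose total comes out as 0 looks to me like a more likely trigger than a single short file, but that is a guess.

```python
from pathlib import Path


def count_tokens(split_dir, tokenize=str.split):
    """Return {file path: token count} for every file under split_dir.

    `tokenize` defaults to whitespace splitting, which only approximates
    whatever tokenizer the TEXT field actually uses.
    """
    counts = {}
    for p in sorted(Path(split_dir).rglob("*")):
        if p.is_file():
            text = p.read_text(encoding="utf-8", errors="replace")
            counts[str(p)] = len(tokenize(text))
    return counts


def report(split_dir, min_tokens=1):
    """Return (total tokens in the split, files with fewer than min_tokens)."""
    counts = count_tokens(split_dir)
    total = sum(counts.values())
    short = [path for path, n in counts.items() if n < min_tokens]
    return total, short
```

Calling `report(TRN_PATH)` and `report(VAL_PATH)` would then show whether either split is empty overall, and raising `min_tokens` to 2 would also list the one-word files.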
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-13-0e98c1d5fc20> in <module>()
1 FILES = dict(train=TRN_PATH, validation=VAL_PATH, test=VAL_PATH)
----> 2 md = LanguageModelData.from_text_files(PATH, TEXT, **FILES, bs=bs, bptt=bptt, min_freq=10)
~/fastai/courses/dl1/fastai/nlp.py in from_text_files(cls, path, field, train, validation, test, bs, bptt, **kwargs)
241 path, text_field=field, train=train, validation=validation, test=test)
242
--> 243 return cls(path, field, trn_ds, val_ds, test_ds, bs, bptt, **kwargs)
244
245
~/fastai/courses/dl1/fastai/nlp.py in __init__(self, path, field, trn_ds, val_ds, test_ds, bs, bptt, **kwargs)
220
221 self.trn_dl, self.val_dl, self.test_dl = [ LanguageModelLoader(ds, bs, bptt)
--> 222 for ds in (self.trn_ds, self.val_ds, self.test_ds) ]
223
224 def get_model(self, opt_fn, emb_sz, n_hid, n_layers, **kwargs):
~/fastai/courses/dl1/fastai/nlp.py in <listcomp>(.0)
220
221 self.trn_dl, self.val_dl, self.test_dl = [ LanguageModelLoader(ds, bs, bptt)
--> 222 for ds in (self.trn_ds, self.val_ds, self.test_ds) ]
223
224 def get_model(self, opt_fn, emb_sz, n_hid, n_layers, **kwargs):
~/fastai/courses/dl1/fastai/nlp.py in __init__(self, ds, bs, bptt)
132 text = sum([o.text for o in ds], [])
133 fld = ds.fields['text']
--> 134 nums = fld.numericalize([text])
135 self.data = self.batchify(nums)
136 self.i,self.iter = 0,0
~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torchtext/data/field.py in numericalize(self, arr, device, train)
296 arr = self.postprocessing(arr, None, train)
297
--> 298 arr = self.tensor_type(arr)
299 if self.sequential and not self.batch_first:
300 arr.t_()
RuntimeError: given sequence has an invalid size of dimension 2: 0
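On the debugging question, one option I am aware of is IPython's `%debug` magic: run in a new cell right after the exception, it opens an interactive post-mortem debugger at the failing frame. A non-interactive alternative is sketched below. It catches the exception, walks the traceback to the innermost Python frame, and returns that frame's local variables. The helper `innermost_frame_locals` and the stand-in `numericalize_demo` are hypothetical names for illustration only; the real call to wrap would be the `LanguageModelData.from_text_files(...)` line from the traceback.

```python
def innermost_frame_locals(exc):
    """Return (function name, locals) of the deepest frame in exc's traceback."""
    tb = exc.__traceback__
    while tb.tb_next is not None:
        tb = tb.tb_next
    frame = tb.tb_frame
    return frame.f_code.co_name, dict(frame.f_locals)


def numericalize_demo(arr):
    # Stand-in for the failing call: raises when the inner sequence is empty.
    if len(arr[0]) == 0:
        raise RuntimeError("empty sequence")
    return arr


try:
    numericalize_demo([[]])
except RuntimeError as e:
    name, local_vars = innermost_frame_locals(e)
    print(name, local_vars)
```

Applied to the real traceback, this should land in `numericalize`, where inspecting `arr` would show whether the token list really is empty.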