Given sequence has an invalid size of dimension 2: 0

I was getting the error below while building my LanguageModelData, and I found the solution. I had my files in the folders test and train, but the filenames didn’t end in “.txt”. Once I added the extension, the problem went away. Also, ignore the fact that all of the arguments below point to “train/”; that was part of the debugging process. In reality I had validation and test both pointed at “test/”. Hopefully this helps somebody else who hits the same issue.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-47-8ee4d2efe232> in <module>()
----> 1 md = LanguageModelData(PATH, TEXT, train="train/", validation="train/", test="train/")

~/fastaip1v2/fastai/courses/dl1/fastai/nlp.py in __init__(self, path, field, train, validation, test, bs, bptt, **kwargs)
    196         self.nt = len(field.vocab)
    197         self.trn_dl,self.val_dl,self.test_dl = [LanguageModelLoader(ds, bs, bptt) for ds in
--> 198                                                (self.trn_ds,self.val_ds,self.test_ds)]
    199 
    200     def get_model(self, opt_fn, emb_sz, n_hid, n_layers, **kwargs):

~/fastaip1v2/fastai/courses/dl1/fastai/nlp.py in <listcomp>(.0)
    195         self.pad_idx = field.vocab.stoi[field.pad_token]
    196         self.nt = len(field.vocab)
--> 197         self.trn_dl,self.val_dl,self.test_dl = [LanguageModelLoader(ds, bs, bptt) for ds in
    198                                                (self.trn_ds,self.val_ds,self.test_ds)]
    199 

~/fastaip1v2/fastai/courses/dl1/fastai/nlp.py in __init__(self, ds, bs, bptt)
    132         text = sum([o.text for o in ds], [])
    133         fld = ds.fields['text']
--> 134         nums = fld.numericalize([text])
    135         self.data = self.batchify(nums)
    136         self.i,self.iter = 0,0

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torchtext/data/field.py in numericalize(self, arr, device, train)
    296                 arr = self.postprocessing(arr, None, train)
    297 
--> 298         arr = self.tensor_type(arr)
    299         if self.sequential and not self.batch_first:
    300             arr.t_()

RuntimeError: given sequence has an invalid size of dimension 2: 0

I’m also running into this, but in my case it’s because I haven’t figured out where to get the data from.

Right now I’m just running through the notebooks to see if they run and fixing obvious problems, so maybe this’ll be clearer when I fully get to this point in the lessons!

Hi @KevinB
I am having the same problem. Where did you specify the file name as “.txt”?

Regards
Zubair

@zubair1.shah and @KevinB - did you ever figure out where to specify the file name?

It’s been a while since I had this issue. I think the problem was that I was using files named “file1”, “file2”, and “file3” instead of “file1.txt”, “file2.txt”, and “file3.txt”.
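
For anyone who wants to check their own folders, here is a minimal sketch of the fix described above. It assumes the cause is simply that the data files have no extension, so the loader ends up with no text to numericalize. The helper names `find_files_without_extension` and `add_txt_extension` are made up for illustration and are not part of fastai or torchtext.

```python
from pathlib import Path


def find_files_without_extension(folder):
    """Return sorted names of regular, non-hidden files in `folder` with no extension."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix == "" and not p.name.startswith(".")
    )


def add_txt_extension(folder):
    """Rename extension-less files in `folder` to end in .txt and return the new names."""
    renamed = []
    for name in find_files_without_extension(folder):
        src = Path(folder) / name
        dst = src.with_name(name + ".txt")
        if dst.exists():
            # Do not overwrite an existing file; leave this one for manual review.
            continue
        src.rename(dst)
        renamed.append(dst.name)
    return renamed
```

Running `find_files_without_extension` on the train and test folders first shows what would be renamed before `add_txt_extension` changes anything on disk.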

The problem may have something to do with Kevin using the nlp.py module back in the day. That has since been deprecated in favor of the text.py module.

So make sure you are using the text package and also reference lesson 10 as the more definitive notebook … and let us know if you get things worked out.

Thanks! I started playing with the new text package and it seems to have resolved the issue.