Runtime error while doing text classification

I have successfully trained a language model on my dataset. Now, when I try to do text classification, I get the following error when I run model.fit:

RuntimeError: given sequence has an invalid size of dimension 2: 0

My data is in the appropriate format; I followed the lang_model-arxiv.ipynb notebook to format it. My annotated dataset for text classification is very small, though (400 annotated texts in total), and I wonder whether that small amount of data is causing the problem.

I have seen on this forum that other people have run into this error as well, but it seems no one has yet explained why it happens or how to avoid it.

The error trace is as follows:

RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>()
      1 m3.freeze_to(-1)
----> 2 m3.fit(lrs/2, 1, metrics=[accuracy])
      3 m3.unfreeze()
      4 m3.fit(lrs, 1, metrics=[accuracy], cycle_len=1)

~/fastai/courses/dl1/fastai/learner.py in fit(self, lrs, n_cycle, wds, **kwargs)
    190         self.sched = None
    191         layer_opt = self.get_layer_opt(lrs, wds)
--> 192         self.fit_gen(self.model, self.data, layer_opt, n_cycle, **kwargs)
    193
    194     def lr_find(self, start_lr=1e-5, end_lr=10, wds=None):

~/fastai/courses/dl1/fastai/learner.py in fit_gen(self, model, data, layer_opt, n_cycle, cycle_len, cycle_mult, cycle_save_name, metrics, callbacks, use_wd_sched, **kwargs)
    137         n_epoch = sum_geom(cycle_len if cycle_len else 1, cycle_mult, n_cycle)
    138         fit(model, data, n_epoch, layer_opt.opt, self.crit,
--> 139             metrics=metrics, callbacks=callbacks, reg_fn=self.reg_fn, clip=self.clip, **kwargs)
    140
    141     def get_layer_groups(self): return self.models.get_layer_groups()

~/fastai/courses/dl1/fastai/model.py in fit(model, data, epochs, opt, crit, metrics, callbacks, **kwargs)
     92             if stop: return
     93
---> 94         vals = validate(stepper, data.val_dl, metrics)
     95         print(np.round([epoch, debias_loss] + vals, 6))
     96         stop=False

~/fastai/courses/dl1/fastai/model.py in validate(stepper, dl, metrics)
    104     loss,res = [],[]
    105     stepper.reset(False)
--> 106     for (*x,y) in iter(dl):
    107         preds,l = stepper.evaluate(VV(x), VV(y))
    108         loss.append(to_np(l))

~/fastai/courses/dl1/fastai/dataset.py in __next__(self)
    238         if self.i>=len(self.dl): raise StopIteration
    239         self.i+=1
--> 240         return next(self.it)
    241
    242     @property

~/fastai/courses/dl1/fastai/nlp.py in __iter__(self)
    335         it = iter(self.src)
    336         for i in range(len(self)):
--> 337             b = next(it)
    338             yield getattr(b, self.x_fld), getattr(b, self.y_fld)
    339

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torchtext/data/iterator.py in __iter__(self)
    176                     minibatch.sort(key=self.sort_key, reverse=True)
    177                 yield Batch(minibatch, self.dataset, self.device,
--> 178                             self.train)
    179             if not self.repeat:
    180                 raise StopIteration

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torchtext/data/batch.py in __init__(self, data, dataset, device, train)
     20             if field is not None:
     21                 batch = [x.__dict__[name] for x in data]
---> 22                 setattr(self, name, field.process(batch, device=device, train=train))
     23
     24     @classmethod

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torchtext/data/field.py in process(self, batch, device, train)
    182         """
    183         padded = self.pad(batch)
--> 184         tensor = self.numericalize(padded, device=device, train=train)
    185         return tensor
    186

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torchtext/data/field.py in numericalize(self, arr, device, train)
    296             arr = self.postprocessing(arr, None, train)
    297
--> 298         arr = self.tensor_type(arr)
    299         if self.sequential and not self.batch_first:
    300             arr.t_()

RuntimeError: given sequence has an invalid size of dimension 2: 0

When I set the batch size to 128, it runs without error (although it gives bad results), but when I set the batch size to 64 or smaller, I get that error. Any ideas?

Is your batch size larger than the number of items in your validation set? I’ve seen an error (I think this one?) that happened because my batch size was bigger than my validation set, so I just increased the number of images in my validation set to be larger than the batch size. A quick check along those lines is sketched below.
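For reference, here is a minimal sanity check along those lines. This is a sketch, not the fastai API: val_texts and bs are placeholder names for your validation texts and batch size, and a batched iterator can yield a short (or empty) final batch when the dataset size is not an exact multiple of the batch size.

# Placeholder names, not fastai API: val_texts is your list of validation
# texts, bs is the batch size you pass to the model data object.
val_texts = ["some text", "another text"]  # stand-in for your validation texts
bs = 64

# Size of the final batch the iterator would produce.
last_batch = len(val_texts) % bs or bs
print(f"validation items: {len(val_texts)}, batch size: {bs}, last batch: {last_batch}")

if len(val_texts) < bs:
    print("batch size exceeds the validation set -- shrink bs or add more validation items")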


The annotated part of my dataset that I want to perform text classification on is quite small (400 texts), but the error only shows up when I decrease the batch size: with a batch size of 128 it doesn’t appear, but at 64 or smaller it does. @jeremy

You might also get this error when you numericalize unknown vocabulary. I encountered this issue as well; try retraining your language model with the same vocabulary, or filter out texts that contain only "unseen" words.
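If you want to check for that case, here is a rough sketch of the filtering idea. The names vocab and texts are placeholders: in practice the vocabulary would come from whatever you used to train the language model, and the tokenization here is a naive whitespace split rather than the real tokenizer.

# Sketch: drop texts whose tokens are all missing from the language-model
# vocabulary, since they would numericalize to an empty/unknown-only sequence.
vocab = {"the", "model", "learns"}        # stand-in for your LM vocabulary
texts = ["the model learns", "zzz qqq"]   # stand-in for your annotated texts

def has_known_token(text):
    # True if at least one whitespace-split token is in the vocabulary.
    return any(tok in vocab for tok in text.split())

kept = [t for t in texts if has_known_token(t)]
print(f"kept {len(kept)} of {len(texts)} texts")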