RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 1 and 0 in dimension 1

(Dien Hoa TRUONG) #1

I am trying to create data with PointsItemList for a hand-tracking problem. The code seems correct, since data.show_batch shows exactly what I want:

data = (PointsItemList.from_df(df, path=path, folder='train', cols=['frame'], suffix='.png')
        .random_split_by_pct()
        .label_from_df(cols=['loc'])
        .transform(get_transforms(), tfm_y=True, size=(120,160))
        .databunch().normalize()
       )

However, learn.lr_find() or learn.fit() randomly fails with the error below:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-24-4dfb24161c57> in <module>()
----> 1 learn.fit_one_cycle(1)

~/fastai/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, wd, callbacks, **kwargs)
     20     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor,
     21                                         pct_start=pct_start, **kwargs))
---> 22     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
     23 
     24 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, **kwargs:Any):

~/fastai/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    164         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    165         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 166             callbacks=self.callbacks+callbacks)
    167 
    168     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~/fastai/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     92     except Exception as e:
     93         exception = e
---> 94         raise e
     95     finally: cb_handler.on_train_end(exception)
     96 

~/fastai/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     80             cb_handler.on_epoch_begin()
     81 
---> 82             for xb,yb in progress_bar(data.train_dl, parent=pbar):
     83                 xb, yb = cb_handler.on_batch_begin(xb, yb)
     84                 loss = loss_batch(model, xb, yb, loss_func, opt, cb_handler)

~/anaconda3/lib/python3.6/site-packages/fastprogress/fastprogress.py in __iter__(self)
     63         self.update(0)
     64         try:
---> 65             for i,o in enumerate(self._gen):
     66                 yield o
     67                 if self.auto_update: self.update(i+1)

~/fastai/fastai/basic_data.py in __iter__(self)
     68     def __iter__(self):
     69         "Process and returns items from `DataLoader`."
---> 70         for b in self.dl:
     71             #y = b[1][0] if is_listy(b[1]) else b[1] # XXX: Why is this line here?
     72             yield self.proc_batch(b)

~/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
    337         if self.rcvd_idx in self.reorder_dict:
    338             batch = self.reorder_dict.pop(self.rcvd_idx)
--> 339             return self._process_next_batch(batch)
    340 
    341         if self.batches_outstanding == 0:

~/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _process_next_batch(self, batch)
    372         self._put_indices()
    373         if isinstance(batch, ExceptionWrapper):
--> 374             raise batch.exc_type(batch.exc_msg)
    375         return batch
    376 

RuntimeError: Traceback (most recent call last):
  File "/home/hoatruong/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 114, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/hoatruong/fastai/fastai/torch_core.py", line 105, in data_collate
    return torch.utils.data.dataloader.default_collate(to_data(batch))
  File "/home/hoatruong/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 198, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/home/hoatruong/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 198, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/home/hoatruong/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 175, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 1 and 0 in dimension 1 at /opt/conda/conda-bld/pytorch-nightly_1538905146867/work/aten/src/TH/generic/THTensorMoreMath.cpp:1317

I was thinking that the shapes of the elements in the data were inconsistent, but they are not.

I found that the error might come from the data loader when it grabs a new batch. But it happens quite randomly: the point at which I get the error is not the same each time.

I hope someone can help me clarify this problem. Thank you so much in advance.


(Dien Hoa TRUONG) #2

I found that the torch.Size of the labels is sometimes [1, 2] and sometimes [0, 2]. I don’t think it comes from my data, because when I removed the offending rows the problem appeared again when I created a new data bunch.


#3

Hi there,
The problem comes from the fact that sometimes, your data augmentation will throw the point out of the image. There are three ways of dealing with it:

  • lowering your data augmentation params
  • writing a collate function that will pad your points when they’re empty to make them the right size
  • using ImagePoints with remove_out=False to keep the points even if they’re out (and making the model guess from the part of the picture it can see).
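A collate function for the second option might look like this. This is only a sketch, not fastai’s actual implementation: the name pad_points_collate and the (-1, -1) pad value are my own choices, and your loss function would have to ignore the padded point.

```python
import torch

# Hypothetical pad value marking "every point fell outside the image";
# the loss function must be written to ignore it.
PAD_POINT = torch.full((1, 2), -1.0)

def pad_points_collate(batch):
    """Collate (image, points) pairs, padding empty targets.

    Each target normally has shape [1, 2] (one point, y/x), but after
    aggressive augmentation it can be [0, 2], which breaks torch.stack.
    """
    xs, ys = zip(*batch)
    ys = [y if y.size(0) > 0 else PAD_POINT for y in ys]
    return torch.stack(xs, 0), torch.stack(ys, 0)
```

You could then pass it through to the DataLoader, e.g. via the collate_fn argument when you build your databunch.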

(Dien Hoa TRUONG) #4

Thank you so much for your help. I made it work by lowering the data augmentation params; I will try to implement your other suggestions too.

Happy holidays!!!


#5

I am a beginner fastai user, so pardon my trivial question. I am trying to use unet_learner to create image segmentations of different sewer conditions, as shown in my gist here.

I have the same error, but I am not sure if my case is related; the full error is:

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 375 and 376 in dimension 2 at /opt/conda/conda-bld/pytorch_1544176307774/work/aten/src/THC/generic/THCTensorMath.cu:83

I am confused about which tensor sizes it is referring to: is it between the mask and the original image, or the plotted segmentations? Better yet, how would I go about troubleshooting this on my own?

Hope this makes sense.


(RobG) #6

This is probably because your image sizes are odd, i.e. 375×500. You need to make sure they are even, or better still a multiple of 32. Unfortunately, unet_learner doesn’t throw a useful warning.
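The constraint comes from the U-Net encoder halving the spatial dimensions several times (five halvings for a typical resnet backbone, and 2^5 = 32); odd sizes pick up off-by-one mismatches when the decoder upsamples and concatenates the skip connections. A small helper to round sizes up (my own sketch, not a fastai function):

```python
def round_up(dim, base=32):
    # Round an image dimension up to the nearest multiple of `base`.
    return ((dim + base - 1) // base) * base

# 375x500 becomes 384x512, which survives five halvings cleanly.
size = (round_up(375), round_up(500))
```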


#7

Hi

I changed size=128, but now I am running into another problem: I keep seeing RuntimeError: CUDA error: device-side assert triggered. Restarting the kernel a few times did not help. Could it be a problem with the metrics?


(RobG) #8

Possibly size needs to be a tuple? You’re resizing your 750×1000 images to 375×500. I’d just set it to 384×512 and see if that works. I’m a lazy programmer: change size = src_size//2 to size = src_size*.512.


#9

Yeah, I get the same error when I change to size=128 or size = src_size*.512 as you proposed.

What else could be the reason for the CUDA error? I’m not sure how to troubleshoot this :frowning:


(Evan) #10

I am also getting a similar error. From reading through the forums, I saw some suggestion that it could be due to the mask tensor containing 0 and 255 values instead of 0 and 1 (which is the case in my example): Using fastai for Segmentation, receiving a CUDA device-side assertion error

Although the suggestion to set div=True doesn’t help me much, as I’m not sure where to change it. The advice here didn’t work: ImageMask.data created by open_mask returns all zeros as SegmentationItemList does not have attribute set_attr()
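To confirm whether a mask is affected, you can inspect its raw values before training. This is a sketch using numpy; check_mask_labels is my own helper name, not part of fastai:

```python
import numpy as np

def check_mask_labels(mask, n_classes):
    """Return the label values that would trip the GPU assert.

    The cross-entropy loss asserts 0 <= target < n_classes on the GPU;
    a mask holding 255 with only two classes triggers
    'device-side assert triggered'.
    """
    vals = np.unique(np.asarray(mask))
    return [int(v) for v in vals if v < 0 or v >= n_classes]

# A binary mask saved as {0, 255} instead of {0, 1}:
bad = check_mask_labels(np.array([[0, 255], [0, 0]]), n_classes=2)
```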


#11

The current method is to subclass SegmentationLabelList and override its open function (to return open_mask(fn, div=True)). Then pass your new class via label_cls when you label your data using the data block API.
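Concretely, that could look like the following sketch against the fastai v1 data block API. The names SegLabelListBinary, path_img, get_y_fn and codes are placeholders for your own class, image folder, labelling function and class list; it is not verified end-to-end:

```python
from fastai.vision import *

class SegLabelListBinary(SegmentationLabelList):
    # Open masks with div=True so {0, 255} values become {0, 1}
    def open(self, fn):
        return open_mask(fn, div=True)

# Pass the subclass via label_cls at labelling time:
src = (SegmentationItemList.from_folder(path_img)
       .random_split_by_pct()
       .label_from_func(get_y_fn, classes=codes,
                        label_cls=SegLabelListBinary))
```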


(Evan) #12

Thanks. Sorry to be asking something that is possibly obvious, but where exactly do you pass label_cls? I can’t find any examples. I’ve read through the documentation and I’m not sure if what I’m doing is working: I can get the data to show batches of images, but any time I try the learning rate finder I get the same CUDA assertion error, so I’m assuming I’m still doing something wrong. I’m passing it in the SegmentationItemList, e.g.

src = (SegmentationItemList.from_folder(PATH_PNG, label_cls = SegmentationLabelList2)…


#13

It’s when you label your data that you want to pass label_cls (i.e. in the label_from_* function).


#14

I’m so sorry, I still don’t understand this …

I first define the path to my images, split them and label them using a function.

src = (SegmentationItemList.from_folder(path_img)
       .random_split_by_pct()
       .label_from_func(get_y_fn, classes=codes))

src gives

LabelLists;

Train: LabelList
y: SegmentationLabelList (495 items)
[ImageSegment (1, 3024, 4032), ImageSegment (1, 750, 1000), ImageSegment (1, 780, 1040), ImageSegment (1, 768, 1024), ImageSegment (1, 3024, 4032)]…
Path: /home/jupyter/.fastai/data/Longkang/images
x: SegmentationItemList (495 items)
[Image (3, 3024, 4032), Image (3, 750, 1000), Image (3, 780, 1040), Image (3, 768, 1024), Image (3, 3024, 4032)]…
Path: /home/jupyter/.fastai/data/Longkang/images;

Valid: LabelList
y: SegmentationLabelList (123 items)
[ImageSegment (1, 768, 1024), ImageSegment (1, 3024, 4032), ImageSegment (1, 3024, 4032), ImageSegment (1, 750, 1000), ImageSegment (1, 3024, 4032)]…
Path: /home/jupyter/.fastai/data/Longkang/images
x: SegmentationItemList (123 items)
[Image (3, 768, 1024), Image (3, 3024, 4032), Image (3, 3024, 4032), Image (3, 750, 1000), Image (3, 3024, 4032)]…
Path: /home/jupyter/.fastai/data/Longkang/images;

Test: None

In my datablock API I run

data = (src.transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))

data.show_batch works, with the mask overlay, but the learning part fails with the CUDA error.

Isn’t passing classes=codes where I was labelling equivalent to passing label_cls=MySegList? Or do I have to pass label_cls again (somehow?) in the data block API?

A short example would help a lot!


(Thomas) #15

I am exactly there, with the same problem.


(Mohamed Ayman Elshazly) #16

Did you manage to solve it?


#17

Just wanted to mention that this little hint really helped :slight_smile:
