UNet: Size error for a custom dataset

Hi fellow FastAI users !

I’m working on the Penn-Fudan dataset for image segmentation. I’ve created a custom PyTorch dataset because I find that easier than the data block API, at least for now.

Here’s my dataset:

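import glob

import numpy as np
from PIL import Image
from torch.utils.data import Dataset

class SemSegDataset(Dataset): 

    def __init__(self, path): 

        self.path = path 
        # Each sample is an (im{idx}.png, msk{idx}.png) pair, hence the division by 2
        self.nb_elements = len(glob.glob(path + '/*.png')) // 2
        self.c = 3

    def __getitem__(self, idx): 
        im = np.array(Image.open(self.path + '/im{}.png'.format(idx)))
        msk = np.array(Image.open(self.path + '/msk{}.png'.format(idx)))

        # apply_transform (shown later in the thread) runs the transform pipeline
        im_t = self.apply_transform(im)
        msk_t = self.apply_transform(msk, is_x = False)

        # Add a channel dimension to the mask: [H, W] -> [1, H, W]
        return im_t, msk_t.unsqueeze(0) 

    def __len__(self): 
        return self.nb_elements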

I’ve written a few sanity checks that return satisfying results:

s_dataset = SemSegDataset('./DebugDataset')
x,y = s_dataset[0]
print(x.shape, y.shape) # torch.Size([3, 224, 224]) torch.Size([1, 224, 224])

Creating the databunch is fine, but creating the learner raises an error saying that the dummy batch requires an iterable instead of a built-in function:

data = DataBunch.create(s_dataset, s_dataset, bs = 2, num_workers = 1)
learn = unet_learner(data, models.resnet34, metrics=metrics, wd=1e-2)

I tracked the error to the fastai/vision/learner.py file, where I changed the line size = data.train_ds[0][0].size to size = data.train_ds[0][0].size(). With that change, I get a different error:

Expected 4-dimensional input for 4-dimensional weight 64 3 7 7 0, but got 5-dimensional input of size [1, 3, 3, 224, 224]

Do you guys know what’s causing this? Any tips or advice on creating a custom dataset in fastai?

Thanks a lot !


The first error you got comes from the fact that fastai uses a size property for the Image class, so your patch is good and does basically the same thing.
For the second one, it is a bit strange, what does next(iter(data.train_dl))[0].shape yield ? One thing that may be related is that you write self.c=3 while your mask only has one channel (so one class).
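
A quick way to try that kind of check outside fastai is to push a couple of samples through a plain PyTorch DataLoader and look at the collated shapes. This is only a sketch: the random TensorDataset below is a hypothetical stand-in for SemSegDataset, and with a real DataBunch the equivalent call is the next(iter(data.train_dl)) mentioned above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the custom dataset:
# 4 RGB images and 4 one-channel masks with class indices 0..2
images = torch.rand(4, 3, 224, 224)
masks = torch.randint(0, 3, (4, 1, 224, 224))
toy_ds = TensorDataset(images, masks)

# Default collation stacks the samples and adds the batch dimension in front
dl = DataLoader(toy_ds, batch_size=2)
xb, yb = next(iter(dl))

print(xb.shape)  # torch.Size([2, 3, 224, 224])
print(yb.shape)  # torch.Size([2, 1, 224, 224])
```

Anything with more than 4 dimensions at this stage would point to the dataset or the collation rather than to the model.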

As a side note, I was reluctant to use the data block API in the first place but once you start understanding how it works it is actually very well done. For your problem, you could do something like:

data = (SegmentationItemList.
        from_folder('./DebugDataset', extensions=['.png']).
        filter_by_func(lambda fn: 'im' in fn.name).
        split_none().  #use something else like split_by_rand_pct if you want to have a validation set
        label_from_func(lambda fn: fn.parent/fn.name.replace('im', 'msk')).
        databunch(bs=2))  #assumed closing call, the snippet as posted stopped after label_from_func
learn = unet_learner(data, models.resnet34, metrics=metrics, wd=1e-2)

Hey ! Thanks for the smart and detailed response! Appreciate it !
Indeed, the fact that there’s only one channel for labelling might be in conflict with the self.c. I’m gonna check that right now :wink:
Yeah, I suppose it is only a matter of practice to get used to the pipeline. I haven’t had much success with it lately, but I’m planning on tackling this asap!

Thanks again, my friend !


I added a MaskToChannels transform to fix the problem mentioned above. Here’s the code:

class MaskToChannels: 

    def __call__(self, x): 

        x_cp = np.copy(x)
        _max = np.max(x_cp) + 1

        channels = np.zeros((*x_cp.shape, _max))
        for i in range(_max): 
            channels[:,:,i] = np.where(x_cp == i, np.ones_like(x_cp), np.zeros_like(x_cp))

        #  ==================== To see masks ==================== 
        # ==================== ==================== ==================== 
        # f, ax = plt.subplots(1,4)
        # ax[0].matshow(x_cp)
        # for i in range(1,4):
        #     ax[i].matshow(channels[:,:,i-1])

        # plt.pause(0.1)
        # input()

        # ==================== ==================== ==================== 
        return channels

Using this in the transform pipeline for the labels, like this: tfms = [MaskToChannels(), Rescale(), ToTensor()], yields the expected shape [3,224,224].
However, to allow learn = unet_learner(data, models.resnet34, metrics=metrics, wd=1e-2) to complete without error, I had to modify the dummy_batch method in the fastai/callbacks/hooks.py file as follows:

out = one_param(m).new(1, ch_in, *size).requires_grad_(False).uniform_(-1.,1.)

if len(out.shape) > 4: 
    out = one_param(m).new(1,*size).requires_grad_(False).uniform_(-1.,1.)
return out 

Now, learn = unet_learner(data, models.resnet34, metrics=metrics, wd=1e-2) completes successfully, but on calling learn.lr_find(), I get the following error:

fastai/vision/models/unet.py, line 67, in <lambda>
    if imsize != x.shape[-2:]: layers.append(Lambda(lambda x: F.interpolate(x, imsize, mode='nearest')))

 line 2485, in interpolate
    return torch._C._nn.upsample_nearest2d(input, _output_size(2))
RuntimeError: It is expected output_size equals to 2, but got size 3

I tracked the error to the loss_batch method in fastai/basic_train.py and the input image and labels do have the correct shape, so I’m not sure where the error comes from.

I might end up subclassing the fastAI Image ItemList, but I’d really like to know what bothers the pipeline currently. Any idea ?

Thanks a lot :wink:


I think your original problem comes from this line from fastai.vision.learner.unet_learner:

try:    size = data.train_ds[0][0].size
except: size = next(iter(data.train_dl))[0].shape[-2:]

With the fastai API, size has length 2 by design (H × W). Without it, and with the modification you made (which I wrongly said was correct, my bad) to use size = data.train_ds[0][0].size() instead, it has length 3 (C × H × W). A patch could be to replace it with size = data.train_ds[0][0].shape[-2:], but a better idea in my opinion is to use fastai’s Image class, which wraps the tensors, has the size attribute and is designed to work well with unet_learner. So to me, your __getitem__ should return Image(im_t), ImageSegment(msk_t.unsqueeze(0)). After thinking about it a bit more, I also don’t think you need your MaskToChannels class, as long as you pass the right loss function, which in this case is CrossEntropyFlat(axis=1) from fastai. It doesn’t expect a one-hot encoded ground truth mask, so it should work fine even with one channel.
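
To make the size issue concrete, here is a small sketch in plain PyTorch. The torch.zeros calls only imitate the one_param(m).new(1, ch_in, *size) call quoted earlier in the thread; they are not fastai’s code, and ch_in = 3 is an assumption matching an RGB input.

```python
import torch

# One image as the custom dataset returns it: C x H x W
x = torch.zeros(3, 224, 224)
ch_in = 3  # assumed number of input channels (RGB)

size_full = tuple(x.size())    # (3, 224, 224): channels included
size_hw = tuple(x.shape[-2:])  # (224, 224): spatial size only

# The dummy batch is built as (1, ch_in, *size)
bad_dummy = torch.zeros(1, ch_in, *size_full)
good_dummy = torch.zeros(1, ch_in, *size_hw)

print(bad_dummy.shape)   # torch.Size([1, 3, 3, 224, 224])
print(good_dummy.shape)  # torch.Size([1, 3, 224, 224])
```

The 5-dimensional shape in the first print is the same [1, 3, 3, 224, 224] that the convolution complained about.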

  • Make your dataset return an Image and an ImageSegment (note that both must be given float tensors: values between 0 and 1 for the image, and integer values stored as floats for the mask)
  • Undo your size = data.train_ds[0][0].size() change
  • Don’t use your MaskToChannels transforms, it is not needed in theory contrary to what I made you believe
  • Pass loss_func=CrossEntropyFlat(axis=1) to unet_learner (you can alternatively store self.loss_func=CrossEntropyFlat(axis=1) in your dataset class).
  • Also, for your metrics, make sure they take into account that your model outputs a mask with one channel per class (3 channels in your case), while the ground truth mask has one channel. Usually, the solution is to convert the predicted mask using mask.argmax(dim=1) (assuming the mask has shape B × C × H × W)
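
As an illustration of that last point, here is a minimal sketch of a pixel accuracy metric. The name pixel_accuracy is hypothetical (it is not a fastai function), and it assumes predictions of shape B × C × H × W and a target of shape B × 1 × H × W holding class indices.

```python
import torch

def pixel_accuracy(pred, target):
    # pred: B x C x H x W class scores, target: B x 1 x H x W class indices
    pred_classes = pred.argmax(dim=1)  # B x H x W, most likely class per pixel
    target = target.squeeze(1)         # B x H x W, drop the channel dimension
    return (pred_classes == target).float().mean()

# Toy check: the model predicts class 1 everywhere on a 2 x 2 image
pred = torch.zeros(1, 3, 2, 2)
pred[0, 1] = 1.0
target = torch.tensor([[[[1, 1], [1, 0]]]])  # shape 1 x 1 x 2 x 2

acc = pixel_accuracy(pred, target)
print(acc)  # tensor(0.7500)
```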

Thanks to your advice, I’m moving forward, but I’m still facing some pipeline problems. Specifically, a PyTorch module seems to be receiving a tensor with an unexpected shape. The error appears in learn.fit(n), but learn.lr_find() completes successfully.

Here’s what I’m doing:

class SemSegDataset(Dataset): 

    def __init__(self, path): 

        self.path = path 
        self.nb_elements = int(len(glob.glob(path + '/*.png'))/2)
        self.c = 3
        self.loss_func = CrossEntropyFlat(axis = 1)

    def __getitem__(self, idx): 
        im = np.array(pImage.open(self.path + '/im{}.png'.format(idx)))
        msk = np.array(pImage.open(self.path + '/msk{}.png'.format(idx)))

        im_t = self.apply_transform(im)
        msk_t = self.apply_transform(msk, is_x = False)

        return Image(im_t), ImageSegment(msk_t.unsqueeze(0))

    def apply_transform(self, sample, is_x = True): 

        if is_x: 
            tfms = [Rescale(), ToTensor()]
        else: 
            # Masks go through the same pipeline now that MaskToChannels is dropped
            tfms = [Rescale(), ToTensor()]

        for tfm in tfms: 
            sample = tfm(sample)

        return sample 

If the mask tensor has shape [B, 1,224,224] I get the following error:

RuntimeError: invalid argument 3: only batches of spatial targets supported (3D tensors) but got targets of dimension: 4

Strangely, I’m also able to open the mask and the image using fastai’s own methods.
What do you think ?

It’s quite difficult to assess what’s happening here without running some tests. Let’s compare some things, what are the outputs of:

  • learn.data.train_ds[0][1].shape
  • learn.data.one_batch()[1].shape
  • If you can use a debugger and check the shape of the tensor that causes the runtime error, that would be great too. If you use a notebook, you can run %debug in a cell after the error is raised to access the debugger. If you have trouble using it, I can help you navigate it, but I’d need the full stack trace for that.
    What’s really strange is that you successfully ran lr_find, which does call learn.fit. When does the error occur? At the beginning? At the end? Randomly in the middle?

Here are the outputs:

  • Running learn.data.train_ds[0][1].shape returns ([1, 224, 224])
  • Running learn.data.one_batch()[1].shape returns ([3, 224, 224])
  • Further, I tracked the error to the fit method of basic_train.py. Asking for (xb.shape, yb.shape) returns xb shape: ([2, 3, 224, 224]) yb shape: ([2, 1, 224, 224])

However, I made a mistake when I said that I was able to run lr_find successfully. The program actually exits on that line.

That looks ok to me, even though I’m not sure why your label goes from one channel to 3 when using one_batch. That may be causing the problem later but I don’t see why ? Do you have a full stack trace of the error to share ?

Absolutely, I do.

train_ds[0][1].shape is torch.Size([1, 224, 224])
one_batch()[1].shape is torch.Size([3, 224, 224])
 === Running lr finder ===
epoch     train_loss  valid_loss  acc_camvid  time    
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
Traceback (most recent call last):
  File "semantic_fastai.py", line 127, in <module>
  File "/home/mehdi/miniconda3/envs/fai/lib/python3.6/site-packages/fastai/train.py", line 32, in lr_find
    learn.fit(epochs, start_lr, callbacks=[cb], wd=wd)
  File "/home/mehdi/miniconda3/envs/fai/lib/python3.6/site-packages/fastai/basic_train.py", line 212, in fit
    fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
  File "/home/mehdi/miniconda3/envs/fai/lib/python3.6/site-packages/fastai/basic_train.py", line 111, in fit
    loss = loss_batch(learn.model, xb, yb, learn.loss_func, learn.opt, cb_handler)
  File "/home/mehdi/miniconda3/envs/fai/lib/python3.6/site-packages/fastai/basic_train.py", line 39, in loss_batch
    loss = loss_func(out, *yb)
  File "/home/mehdi/miniconda3/envs/fai/lib/python3.6/site-packages/torch/nn/functional.py", line 1827, in nll_loss
    ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: invalid argument 3: only batches of spatial targets supported (3D tensors) but got targets of dimension: 4 at /opt/conda/conda-bld/pytorch_1565287148058/work/aten/src/THNN/generic/SpatialClassNLLCriterion.c:61

Ok so I guess we need to check what are out.shape and yb.shape in

  File "/home/mehdi/miniconda3/envs/fai/lib/python3.6/site-packages/fastai/basic_train.py", line 39, in loss_batch
    loss = loss_func(out, *yb)

Can you do this ? It is highly possible the problem comes from out, which would mean that your model outputs 5D tensors (with batch as first dimension) for some reason. You can by the way check learn.model(learn.data.one_batch()[0]).shape, that should output the same thing as out.shape.

For the following modification:

    out = model(*xb)
    for yy in yb: 
        print(yy.shape)  # shape of each target tensor
    print('Out shape before handler: {}'.format(out.shape))
    out = cb_handler.on_loss_begin(out)
    print('Out shape after handler: {}'.format(out.shape))

I get:

torch.Size([2, 1, 224, 224])
Out shape before handler: torch.Size([2, 3, 224, 224])
Out shape after handler: torch.Size([2, 3, 224, 224])

Also, running learn.model(learn.data.one_batch()[0]).shape returns the following shape: torch.Size([2, 3, 224, 224]) which seems coherent.

Indeed, it looks like it is working completely as intended. At this point I guess we’ll need to check what’s happening in PyTorch. Can you check what input.shape and target.shape are in nn.functional.cross_entropy and nn.functional.nll_loss? That will let us see whether the problem comes from fastai (probably in FlattenedLoss for some reason) or from somewhere in PyTorch.

What’s very strange is that inputs and targets are supposed to be flattened by fastai, therefore having at most 2 dimensions. So there is no particular reason it shouldn’t work imo (I’m using a unet_learner with this loss myself, works perfectly).
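
For reference, the flattening idea can be sketched in plain PyTorch as below. This is only an approximation of what a flattened loss does, not fastai’s actual FlattenedLoss code, and flattened_cross_entropy is a hypothetical name: the class axis is moved last, then both tensors are reshaped so that cross entropy sees a 2D input and a 1D target.

```python
import torch
import torch.nn.functional as F

def flattened_cross_entropy(out, target, axis=1):
    n_classes = out.shape[axis]
    # (B, C, H, W) -> (B, W, H, C) -> (B*W*H, C)
    out_flat = out.transpose(axis, -1).contiguous().view(-1, n_classes)
    # (B, 1, H, W) -> (B, W, H, 1) -> (B*W*H,)
    target_flat = target.transpose(axis, -1).contiguous().view(-1)
    return F.cross_entropy(out_flat, target_flat)

torch.manual_seed(0)
out = torch.randn(2, 3, 8, 8)               # model output, one channel per class
target = torch.randint(0, 3, (2, 1, 8, 8))  # one-channel mask of class indices

loss_flat = flattened_cross_entropy(out, target)
# Reference: the unflattened call needs a 3D target, so the channel is squeezed
loss_ref = F.cross_entropy(out, target.squeeze(1))
print(torch.allclose(loss_flat, loss_ref))
```

With this kind of flattening, a one-channel target of shape [B, 1, H, W] is not a problem, which is why a 4D target reaching nll_loss suggests the flattened loss is not the one being called.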

That might be it !
Running print('Input shape: {} target shape: {}'.format(input.shape, target.shape)) in nn.functional.nll_loss returns Input shape: torch.Size([2, 3, 224, 224]) target shape: torch.Size([2, 1, 224, 224]). Target shape should have 3 channels, right?

Hmm, indeed, I’m used to PyTorch and I’m still getting used to fastai’s internals, which is why I’m getting lost in the pipeline. :wink:

No, the target is supposed to have 1 channel. In theory it holds integer values between 0 and 2 (inclusive) in your case. Is this the case?

It is indeed the case. For the max, min and dtype I get: Max target: 2 Min target: 0 Target dtype: torch.int64

Looks fine to me. I’ll check a few things in my notebook to see if anything is different for me.

Thanks a lot ! Meanwhile, I’ll try to trace the error again, see if I missed something !

So after checking, for me input and target are flattened in nll_loss, which is not the case for you, so there is probably something going wrong somewhere in fastai, as I suspected when I saw the stack trace. Indeed, your trace shows no call between loss = loss_func(out, *yb) and ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index), which is strange: there should be at least 3 intermediate calls. I don’t know why it happens though, so we’ll need to go a bit blindly. Can you try to pass loss_func=CrossEntropyFlat(axis=1) to unet_learner? Maybe the one given in the dataset is not taken into account for some reason.

Nice ! Indeed, it seems giving the loss function through the dataset wasn’t acknowledged by the learner.

print(learn.loss_func) # nll_loss
learn.loss_func = CrossEntropyFlat(axis = 1)
print(learn.loss_func) # FlattenedLoss of CrossEntropyLoss()

The lr finder seems to finish the first batch before returning the following error:

  File "/home/mehdi/miniconda3/envs/fai/lib/python3.6/site-packages/torch/nn/functional.py", line 1824, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed.  at /opt/conda/conda-bld/pytorch_1565287148058/work/aten/src/THNN/generic/ClassNLLCriterion.c:94

Tracing the error indicates that this time the tensors have 2 dimensions, which is progress. :wink:
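
The assertion in that error says that at least one target value falls outside the range 0 to n_classes - 1. A small check along these lines can help locate the offending masks. It is only a sketch: check_mask_range is a hypothetical helper, and the toy arrays stand in for the real mask files.

```python
import numpy as np

def check_mask_range(mask, n_classes):
    # Return the sorted label values in `mask` that fall outside [0, n_classes)
    values = np.unique(mask)
    return [int(v) for v in values if v < 0 or v >= n_classes]

# Toy masks standing in for np.array(Image.open(...)) on the real files
ok_mask = np.array([[0, 1], [2, 1]])
bad_mask = np.array([[0, 1], [3, 255]])

print(check_mask_range(ok_mask, n_classes=3))   # []
print(check_mask_range(bad_mask, n_classes=3))  # [3, 255]
```

Running it over every msk{idx}.png would show whether some masks hold labels beyond the 3 classes declared in self.c.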
