Creating an Image Segmentation Dataset for Multiple Classes

I am using the fastai library to segment images with multiple classes on a personal dataset. Consequently, I have multiple masks per image. What is the best way to proceed? Do I consolidate the masks into a single image where the background is 0 and each subsequent class is assigned an integer (1, 2, 3, etc.)? Or do I extend the SegmentationDataset class to become more like ImageMultiDataset so that it can handle multiple masks? All the segmentation examples I can find for the v1 library use single-class datasets. Thank you in advance!


The first solution is the best if the masks don’t overlap. There is an example with camvid of a segmentation problem with multiple classes.
If one pixel can belong to several classes though, you should have multiple channels in your mask.
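As a rough illustration of the two options above, here is a minimal NumPy sketch. It is not fastai code; the function names `combine_masks` and `stack_masks` and the tiny example masks are hypothetical, and it assumes each per-class mask is an H x W array where nonzero means "class present".

```python
import numpy as np

def combine_masks(binary_masks):
    """Merge per-class binary masks (each H x W) into one H x W label mask.

    0 is background and k is the k-th mask in the list (1-based).
    """
    label_mask = np.zeros(binary_masks[0].shape, dtype=np.uint8)
    for class_id, mask in enumerate(binary_masks, start=1):
        # If masks overlap, later classes overwrite earlier ones
        label_mask[mask > 0] = class_id
    return label_mask

def stack_masks(binary_masks):
    """Alternative for overlapping classes: one 0/1 channel per class (C x H x W)."""
    return np.stack([(m > 0).astype(np.uint8) for m in binary_masks])

# Hypothetical 2 x 3 masks for two non-overlapping classes
first = np.array([[1, 1, 0], [0, 0, 0]])
second = np.array([[0, 0, 0], [0, 1, 1]])

label_mask = combine_masks([first, second])    # single-channel integer mask
channel_mask = stack_masks([first, second])    # multi-channel binary mask
print(label_mask)
print(channel_mask.shape)
```

The single-channel form only works when every pixel has at most one class; the stacked form keeps overlapping classes separate at the cost of a larger target.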


Thank you for your prompt reply! For people who encounter a similar issue, the camvid lesson is linked below.

I chose to focus on implementing a model that can segment one feature first before adding additional channels to handle overlapping features. When I call ImageDataBunch instance functions like show_batch() and normalize() on my data object, I get an error along the lines of AttributeError: 'SegmentationDatasetOneFeature' object has no attribute 'normalize'. Why are the functions being called on the extended Dataset instead of the data bunch class?

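import glob
import torch
from fastai.vision import *

class SegmentationDatasetOneFeature(SegmentationDataset):

    def _get_y(self, i):
        # Check whether a mask named abc.* exists for this item
        image_path = glob.glob(self.y[i] + "/abc.*")
        # If there is no abc mask, return an empty (all-background) one
        if len(image_path) == 0:
            # sz is the image size, assumed to be defined earlier in the notebook;
            # the leading 1 is the channel dimension
            tensor = torch.zeros((1, sz, sz))
            return ImageSegment(tensor)
        return open_mask(image_path[0])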

data = DataBunch.create(train_ds=train_ds, valid_ds=valid_ds, test_ds=None,
                        bs=bs, num_workers=nw, tfms=tfms)

data = data.normalize(imagenet_stats)

AttributeError                            Traceback (most recent call last)
<ipython-input-30-f381a0614dd5> in <module>
----> 1 data = data.normalize(imagenet_stats)

~/anaconda3/envs/fastai-1.0/lib/python3.7/site-packages/fastai/ in __getattr__(self, k)
    114         return cls(*dls, path=path, device=device, tfms=tfms, collate_fn=collate_fn)
--> 116     def __getattr__(self,k:int)->Any: return getattr(self.train_dl, k)
    117     def holdout(self, is_test:bool=False)->DeviceDataLoader:
    118         "Returns correct holdout `Dataset` for test vs validation (`is_test`)."

~/anaconda3/envs/fastai-1.0/lib/python3.7/site-packages/fastai/ in __getattr__(self, k)
     50     def __len__(self)->int: return len(self.dl)
---> 51     def __getattr__(self,k:str)->Any: return getattr(self.dl, k)
     53     @property

~/anaconda3/envs/fastai-1.0/lib/python3.7/site-packages/fastai/ in DataLoader___getattr__(dl, k)
     34         super().__init__(len(classes))
---> 36 def DataLoader___getattr__(dl, k:str)->Any: return getattr(dl.dataset, k)
     37 DataLoader.__getattr__ = DataLoader___getattr__

AttributeError: 'SegmentationDatasetOneFeature' object has no attribute 'normalize'

That’s because DataBunch doesn’t have a normalize function, so the lookup falls through to its train_dl and, from there, to the underlying dataset, which is why the error names your dataset class. You have to use an ImageDataBunch for this.
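The fallback chain visible in the traceback can be reproduced with a few lines of plain Python. This is only a toy sketch of the delegation pattern, not fastai's code; `ToyDataset`, `ToyLoader`, and `ToyBunch` are hypothetical stand-ins.

```python
class ToyDataset:
    """Stand-in for the custom dataset; it defines no normalize method."""

class ToyLoader:
    def __init__(self, dataset):
        self.dataset = dataset

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails on the loader
        return getattr(self.dataset, name)

class ToyBunch:
    def __init__(self, train_dl):
        self.train_dl = train_dl

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails on the bunch
        return getattr(self.train_dl, name)

bunch = ToyBunch(ToyLoader(ToyDataset()))
try:
    bunch.normalize
    message = ""
except AttributeError as err:
    # The error is raised by the innermost object in the chain
    message = str(err)
print(message)
```

Because each wrapper forwards unknown attributes to the object it wraps, the exception is raised at the end of the chain and mentions the dataset class rather than the bunch.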

@Jamie which tool did you use to create the masks for the raw images?

Can you clarify what you mean by ‘raw images?’

raw images are input features without any pre-processing

In case you are looking for an annotation tool, try Labelme.

For a dataset that has multiple masks for a single image, is it also possible (and is it recommended) to create multiple training instances of that single image, each paired with one of its masks, rather than combining all of the masks into a single mask with no overlaps? I guess this would increase training time, but would it improve or degrade accuracy?


Also to add to the above, is there a way in fastai to combine these masks into one image in some sort of preprocessing step?
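Before merging masks in a preprocessing step, it may help to check whether any of them overlap, since that decides whether a single integer mask can represent them. A minimal NumPy sketch, assuming each mask is an H x W array where nonzero means "class present" (the helper name `masks_overlap` is hypothetical, not a fastai function):

```python
import numpy as np

def masks_overlap(binary_masks):
    """Return True if any pixel is claimed by more than one mask."""
    # Count, per pixel, how many masks mark it as present
    coverage = np.sum([(m > 0) for m in binary_masks], axis=0)
    return bool((coverage > 1).any())

# Hypothetical 2 x 2 masks
a = np.array([[1, 0], [0, 0]])
b = np.array([[0, 1], [0, 0]])
c = np.array([[1, 1], [0, 0]])

disjoint = masks_overlap([a, b])      # no pixel shared between a and b
overlapping = masks_overlap([a, c])   # top-left pixel is in both a and c
print(disjoint, overlapping)
```

If the check reports overlap, merging into one integer mask would silently discard one of the classes at the shared pixels.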

How do you recommend I proceed if we have masks with multiple channels (due to overlapping)? I currently have outputs as CHW tensors, but those don’t seem to work nicely with PIL if I use ImageSegment. Should I have multiple ImageSegment labels (1 per channel) for each input? Thank you.

You should use an ImageSegment with CHW channels. That probably requires a custom open method, so you will have to subclass ImageSegment, but then the transforms should be applied properly to the targets.

Does that also mean I have to create custom classes for SegmentationLabelList and SegmentationItemList if I need a CHW ImageSegment?

I have a multiclass image segmentation problem where an individual pixel can belong to any of 4 possible classes. This ends up being convenient as I can actually store all the mask info within an RGBA .png.

I have created a Unet Learner and dataset in which

  • A batch (xb) is of size [2, 3, 350, 525]
  • The labels are of size [2, 4, 350, 525] (1 for positive match, 0 for background)
  • The output of the model is of size [2, 4, 350, 525]
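For readers wondering how an RGBA image maps onto a label of that shape, here is a small NumPy sketch. It is an illustration only: the helper name `rgba_to_chw_mask` is hypothetical, and it assumes the PNG has already been loaded as an H x W x 4 uint8 array in which any nonzero value means the class is present.

```python
import numpy as np

def rgba_to_chw_mask(rgba):
    """Convert an H x W x 4 array into a 4 x H x W float mask of 0s and 1s."""
    if rgba.ndim != 3 or rgba.shape[2] != 4:
        raise ValueError("expected an H x W x 4 array")
    # Move the channel axis first (HWC -> CHW)
    chw = np.transpose(rgba, (2, 0, 1))
    # Assumption: any nonzero value marks a positive pixel
    return (chw > 0).astype(np.float32)

# Hypothetical 350 x 525 image with two partly overlapping regions
rgba = np.zeros((350, 525, 4), dtype=np.uint8)
rgba[10:20, 30:40, 0] = 255   # class stored in the R channel
rgba[15:25, 35:45, 3] = 255   # class stored in the A channel
mask = rgba_to_chw_mask(rgba)
print(mask.shape)
```

Each of the four channels is an independent binary mask, so a pixel can be positive in several channels at once.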

I am having problems training my model and it appears to fail while calculating the loss for a batch.

FlattenedLoss.__call__ is defined below:

Line 242 checks whether or not we are working with 2D input (we are) and resizes from the original [2, 4, 350, 525] to [367500, 4].

However, on line 243 it resizes target from [2, 4, 350, 525] to [1470000,]. This triggers an exception within Torch due to the mismatched sizes.

ValueError: Expected input batch_size (367500) to match target batch_size (1470000).
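The two numbers in the error follow directly from the tensor shapes. The NumPy sketch below only mirrors the reshaping described in the post, not fastai's actual implementation; the variable names are illustrative.

```python
import numpy as np

bs, n_classes, h, w = 2, 4, 350, 525
output = np.zeros((bs, n_classes, h, w), dtype=np.float32)
target = np.zeros((bs, n_classes, h, w), dtype=np.float32)

# Output: move the class axis last, then keep one row of class scores per pixel
flat_output = output.transpose(0, 2, 3, 1).reshape(-1, n_classes)
# Target: flattened completely, one entry per pixel per channel
flat_target = target.reshape(-1)

print(flat_output.shape)  # (367500, 4), since 2 * 350 * 525 = 367500
print(flat_target.shape)  # (1470000,), since 2 * 4 * 350 * 525 = 1470000
```

The flattened target is four times longer than the number of output rows because it carries one value per channel, whereas a cross-entropy style loss expects a single class index per pixel.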

I tried resizing both to [1470000,] but received

~/.local/lib/python3.6/site-packages/torch/nn/ in log_softmax(input, dim, _stacklevel, dtype)
   1348         dim = _get_softmax_dim('log_softmax', input.dim(), _stacklevel)
   1349     if dtype is None:
-> 1350         ret = input.log_softmax(dim)
   1351     else:
   1352         ret = input.log_softmax(dim, dtype=dtype)

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

I tried also resizing both to [367500, 4] but received

~/.local/lib/python3.6/site-packages/torch/nn/ in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   1869                          .format(input.size(0), target.size(0)))
   1870     if dim == 2:
-> 1871         ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
   1872     elif dim == 4:
   1873         ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

RuntimeError: multi-target not supported at /pytorch/aten/src/THCUNN/generic/

At this point I’m mostly blindly guessing at what I should do instead of thinking about it.

It feels like I should:

  1. Take sigmoid of each output pixel in the volume so as to limit each to be between 0 and 1.
  2. Flatten the output and target into (1470000,)
  3. Compute cross-entropy between the output and target.

Does this sound correct? If so, what’s the best way to work this custom functionality into FastAI? Do I pass in a custom loss_func when creating my learner?
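The three steps listed above amount to binary cross-entropy computed on logits. A minimal NumPy sketch of that computation follows; it is not fastai's or PyTorch's implementation, and the function name `bce_with_logits_flat` and the random test data are hypothetical.

```python
import numpy as np

def bce_with_logits_flat(logits, targets):
    """Mean binary cross-entropy over all elements, computed from raw logits."""
    x = logits.reshape(-1).astype(np.float64)    # flatten the model output
    t = targets.reshape(-1).astype(np.float64)   # flatten the 0/1 targets
    # Numerically stable form of -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))]
    per_element = np.maximum(x, 0) - x * t + np.log1p(np.exp(-np.abs(x)))
    return per_element.mean()

# Small hypothetical batch: 2 images, 4 channels, 5 x 7 pixels
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 4, 5, 7))
targets = (rng.random(size=(2, 4, 5, 7)) > 0.5).astype(np.float64)

loss = bce_with_logits_flat(logits, targets)

# Naive version: explicit sigmoid followed by cross-entropy
p = 1.0 / (1.0 + np.exp(-logits))
naive_loss = -(targets * np.log(p) + (1 - targets) * np.log(1 - p)).mean()
print(loss, naive_loss)
```

Folding the sigmoid into the loss avoids taking the log of values that round to 0 or 1, which is the usual reason to prefer a "with logits" loss over applying sigmoid separately.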


You are using the wrong loss function: if you have a multi-label problem, where a pixel can belong to several classes at once, you should use BCEWithLogitsFlat (not sure of the name but you can find the right one easily).


Perfect, this is exactly what I needed to make it all work!

For future visitors, I created my learner as:

learn = unet_learner(data, models.resnet18, loss_func=BCEWithLogitsFlat())

I notice that the last post here is from a few months ago so I’m wondering if perhaps anyone has created a working example of using or creating multi-class data sets for > 4 channels in the meantime?
It would be super helpful to see a working implementation.
Thanks in advance.
