Image segmentation with >255 categories using SegmentationDataLoaders

I have a dataset that contains

(1) Original .jpeg images:

(2) .png masks in RGB format:

(just an example, not the actual mask for this image)

(3) codes file that links the RGB codes to category labels:

code name
33 36 17 water
71 244 159 pear
40 255 17 egg
189 226 118 grapes
46 239 41 butter
125 199 77 bread-white
116 152 13 jam

My question is: how do I use the MaskBlock API to link my codes with the masks? Do I need to create my own MaskBlock definition?

def MaskBlock(codes=None):
    "A `TransformBlock` for segmentation masks, potentially with `codes`"
    return TransformBlock(type_tfms=PILMask.create, item_tfms=AddMaskCodes(codes=codes), batch_tfms=IntToFloatTensor)

and/or define my own AddMaskCodes class?

class AddMaskCodes(Transform):
    "Add the code metadata to a `TensorMask`"
    def __init__(self, codes=None):
        self.codes = codes
        if codes is not None: self.vocab,self.c = codes,len(codes)

    def decodes(self, o:TensorMask):
        if self.codes is not None: o.codes=self.codes
        return o

I believe at the moment it uses the order of the labels and links it to the pixel value (in the case of a single int per pixel). I’ve tried transforming my masks to have single pixel values, but because I have more than 255 categories this won’t work.
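To make the index-to-label link concrete, here is a minimal sketch in plain NumPy (not fastai code). It assumes a single-channel mask whose pixel values are positions in the codes list; the toy mask and the shortened codes list are made up for illustration.

```python
import numpy as np

# Label names in the same order as the rows of the codes file (shortened).
codes = ["water", "pear", "egg", "grapes"]

# Toy single-channel mask: each pixel value is an index into `codes`.
mask = np.array([[0, 0, 2],
                 [3, 1, 2]], dtype=np.uint8)

# Fancy indexing turns every pixel index into its label name.
labels = np.array(codes)[mask]
```

With a uint8 mask like this one, a pixel can only hold values 0 to 255, which is where the category limit comes from.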

Sorry I’m quite new to this and any help would be greatly appreciated!

Can you explain why multiple codes map to the same food item? For instance, it looks like the values 33, 36, and 17 all map to “water”.

Hey Patrick,

The codes are RGB values corresponding to the colour of each pixel. The standard fastai approach (e.g. with CamVid dataset) is to turn these into integer values from 1 to n (n=number of categories) and convert the masks to grayscale.

However, because my dataset has more than 255 categories (the max for a grayscale png pixel value), I don’t think this is an option for me.
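The RGB-to-integer conversion described above could be sketched roughly as follows. This is only an illustration in NumPy, not the fastai implementation: `parse_codes` and `rgb_to_index` are hypothetical helper names, the three rows are copied from the codes file in the question, and the result is stored as uint16 on the assumption that a wider integer type is acceptable downstream.

```python
import numpy as np

# Rows copied from the codes file in the question: "R G B name".
codes_text = """33 36 17 water
71 244 159 pear
40 255 17 egg"""

def parse_codes(text):
    """Return (names, packed colours), one entry per line of the codes file."""
    names, packed = [], []
    for line in text.strip().splitlines():
        r, g, b, name = line.split(maxsplit=3)
        names.append(name)
        # Pack the three channels into one integer so colours compare easily.
        packed.append((int(r) << 16) | (int(g) << 8) | int(b))
    return names, np.array(packed, dtype=np.int64)

def rgb_to_index(mask_rgb, packed, unknown=0):
    """Map an (H, W, 3) uint8 RGB mask to an (H, W) uint16 class-index mask."""
    m = mask_rgb.astype(np.int64)
    flat = (m[..., 0] << 16) | (m[..., 1] << 8) | m[..., 2]
    order = np.argsort(packed)            # sort once so searchsorted can be used
    sorted_packed = packed[order]
    pos = np.clip(np.searchsorted(sorted_packed, flat), 0, len(packed) - 1)
    found = sorted_packed[pos] == flat    # False for colours not in the codes file
    # Class ids run from 1 to n; 0 is reserved for unknown colours.
    return np.where(found, order[pos] + 1, unknown).astype(np.uint16)

names, packed = parse_codes(codes_text)
rgb = np.array([[[33, 36, 17], [40, 255, 17]],
                [[0, 0, 0],    [71, 244, 159]]], dtype=np.uint8)
index_mask = rgb_to_index(rgb, packed)
```

Because ids start at 1 here, a codes list passed on to a library would need an extra entry at position 0 for the unknown/background value. uint16 leaves room for up to 65535 ids.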

To be honest this is above my level but it is an interesting problem you are trying to solve.

I was wondering whether you need to work with uint16 for your type of problem, since the 256-value limit comes from uint8. I found this on Stack Overflow on how to save uint16. I am not sure whether your PILMask has to be adjusted too; I had a look at the source code and it uses uint8. Good luck!
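For what it's worth, a small sketch of the uint16 idea using Pillow and NumPy (this is not the Stack Overflow answer itself, and the in-memory buffer just stands in for a file on disk):

```python
import io

import numpy as np
from PIL import Image

# Toy index mask with class ids above 255, which uint8 could not hold.
mask = np.array([[0, 255],
                 [256, 1000]], dtype=np.uint16)

buf = io.BytesIO()  # stands in for a path such as a .png file on disk
Image.fromarray(mask).save(buf, format="PNG")  # written as a 16-bit grayscale PNG

buf.seek(0)
restored = np.array(Image.open(buf))  # read back; values should match the original
```

Whether the rest of the pipeline keeps the 16-bit values depends on how the mask is opened afterwards, so this only covers the saving step.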

Ah I see. Hmmm… peeking at the codebase a bit, I suspect that MaskBlock might need to be customized first. In particular, it returns a set of transforms where type_tfms = PILMask.create. If you look at how PILMask is defined, you’ll see 'mode' = 'L', which means unsigned 8-bit integer. The first thing I might try is to change that to 'mode' = 'I' and see what happens. 'I' is a signed int32 image. See here.
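To see what the two modes do to ids above 255, here is a rough Pillow-only sketch. It assumes that the mode ends up in Pillow's `convert` when the mask is opened, which I have not verified against the fastai source, so treat it as an illustration of the modes rather than of fastai's behaviour.

```python
import numpy as np
from PIL import Image

# Toy 16-bit index mask with some class ids above 255.
ids = np.array([[3, 255],
                [256, 300]], dtype=np.uint16)
img = Image.fromarray(ids)

as_L = np.array(img.convert("L"))  # 8-bit: ids above 255 cannot survive
as_I = np.array(img.convert("I"))  # 32-bit signed int: ids are kept as they are

print("L:", as_L.dtype, as_L.tolist())
print("I:", as_I.dtype, as_I.tolist())
```

If the 'I' output matches the original ids while the 'L' output does not, that supports trying 'mode' = 'I' first.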

But honestly without having a dataset that’s organized like this and trying to work through the various errors that occur, it’s very hard for a neophyte like myself to be all that helpful.

@Patrick what would the codes be in the case of binary segmentation…

I have the same issue with RGB values mapped to codes. May I know how you solved it?

I am a beginner who has only worked through the first chapter of the course.

I solved a similar problem by creating a transform to convert RGB masks to grayscale (the hard limit is always 255 classes though):

You can take a look at it here.