I have a dataset that contains
(1) Original .jpeg images:
(2) .png masks in RGB format:
(just an example, not the actual mask for this image)
(3) codes file that links the RGB codes to category labels:
```
33 36 17
71 244 159
40 255 17
189 226 118
46 239 41
125 199 77
116 152 13
```
My question is: how do I use the `MaskBlock` API to link my codes with the masks? Do I need to create my own `MaskBlock` definition?
"A `TransformBlock` for segmentation masks, potentially with `codes`"
return TransformBlock(type_tfms=PILMask.create, item_tfms=AddMaskCodes(codes=codes), batch_tfms=IntToFloatTensor)
and/or define my own AddMaskCodes class?
"Add the code metadata to a `TensorMask`"
def __init__(self, codes=None):
self.codes = codes
if codes is not None: self.vocab,self.c = codes,len(codes)
def decodes(self, o:TensorMask):
if self.codes is not None: o.codes=self.codes
I believe that at the moment it uses the order of the labels and links this to the pixel value (in the case of a single int per pixel). I've tried transforming my masks to single-int pixel values, but because I have more than 255 categories, saving with `Image.fromarray(arr).save(path)` won't work.
Sorry I’m quite new to this and any help would be greatly appreciated!
Can you explain why multiple codes map to the same food item? For instance, it looks like the values 33, 36, and 17 all map to “water”.
The codes are RGB values corresponding to the colour of each pixel, so each line is one colour (33 36 17 is a single code, not three). The standard fastai approach (e.g. with the CamVid dataset) is to turn these into integer values from 1 to n (n = number of categories) and convert the masks to grayscale.
However, because my dataset has more than 255 categories (the maximum pixel value in a grayscale PNG), I don't think this is an option for me.
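As a concrete sketch of that integer-mapping approach (plain NumPy; the colours here are just the first three codes from the file above, and a real script would use the full list and real mask files):

```python
import numpy as np

# Each line of the codes file is one RGB triple, so `33 36 17` is a
# single colour. Only the first three codes are used here for brevity.
codes = np.array([[33, 36, 17],
                  [71, 244, 159],
                  [40, 255, 17]], dtype=np.uint8)

# A tiny 2x2 RGB mask standing in for a real .png mask.
rgb_mask = np.array([[[33, 36, 17], [71, 244, 159]],
                     [[40, 255, 17], [33, 36, 17]]], dtype=np.uint8)

# matches[h, w, c] is True where pixel (h, w) equals colour c.
matches = (rgb_mask[:, :, None, :] == codes[None, None, :, :]).all(axis=-1)
idx_mask = matches.argmax(axis=-1)  # (H, W) array of class indices

print(idx_mask.tolist())  # [[0, 1], [2, 0]]
```

The index of each colour in the codes file becomes the class index, which is what fastai's `codes` ordering expects.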
To be honest this is above my level, but it is an interesting problem you are trying to solve. I was thinking you might have to work with `uint16` for your problem, since the limiting factor behind the 256 limit is `uint8`. I found this on Stack Overflow on how to save `uint16`. I am not sure if your `PILMask` has to be adjusted too; I had a look at the source code and it uses `uint8`. Good luck!
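For what it's worth, here is a minimal sketch of that `uint16` route with Pillow (assuming your class indices fit in 16 bits; Pillow infers its 16-bit `'I;16'` mode from a `uint16` array, and PNG can store that losslessly):

```python
import numpy as np
from PIL import Image

# Class indices above 255, which an 8-bit 'L' mask cannot hold.
arr = np.array([[0, 300], [500, 1000]], dtype=np.uint16)

# Pillow picks mode 'I;16' for a uint16 array; PNG saves it as 16-bit gray.
Image.fromarray(arr).save('mask16.png')

# Round-trip: the >255 values survive (Pillow may reopen the file in
# 32-bit 'I' mode, but the pixel values are unchanged).
back = np.array(Image.open('mask16.png'))
print(np.array_equal(back, arr))  # True
```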
Ah I see. Hmmm… peeking at the codebase a bit, I would suspect that the `MaskBlock` class in `fastai.vision.data` might need to be customized first. In particular, it returns a set of transforms where `type_tfms = PILMask.create`. If you go look at how `PILMask` is defined in `fastai.vision.core`, you'll see `'mode': 'L'`, which means unsigned 8-bit integer. The first thing I might actually try is to change that to `'mode': 'I'` and see what happens. `'I'` is a signed int32 image. See here.
But honestly without having a dataset that’s organized like this and trying to work through the various errors that occur, it’s very hard for a neophyte like myself to be all that helpful.
@Patrick what would the codes be in the case of binary segmentation…
I have the same issue with RGB values mapped to codes. May I know how you solved it?
I am a beginner who has only learned from the first chapter of the course.
I solved a similar problem by creating a transform to convert RGB masks to grayscale (the hard limit is still 255 classes, though).
You can take a look at it here.