(just an example, not the actual mask for this image)
(3) codes file that links the RGB codes to category labels:

code           name
33 36 17       water
71 244 159     pear
40 255 17      egg
189 226 118    grapes
46 239 41      butter
125 199 77     bread-white
116 152 13     jam
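In case it's useful, here's a rough sketch of how a codes file like the one above could be parsed into a lookup from RGB tuple to label. It assumes the file starts with a "code"/"name" header pair and then alternates an RGB line with a label line; the function name is my own:

```python
def parse_codes(lines):
    """Pair alternating code/name lines into {(r, g, b): label}.

    Assumes a 'code' / 'name' header pair, then alternating
    RGB lines ("33 36 17") and label lines ("water").
    """
    lines = [l.strip() for l in lines if l.strip()]
    # drop the "code" / "name" header lines if present
    if lines[:2] == ["code", "name"]:
        lines = lines[2:]
    codes = {}
    for rgb_line, name in zip(lines[0::2], lines[1::2]):
        r, g, b = (int(v) for v in rgb_line.split())
        codes[(r, g, b)] = name
    return codes

sample = """code
name
33 36 17
water
71 244 159
pear
"""
print(parse_codes(sample.splitlines()))
# {(33, 36, 17): 'water', (71, 244, 159): 'pear'}
```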
My question is: how do I use the MaskBlock API to link my codes with the masks? Do I need to create my own MaskBlock definition?
def MaskBlock(codes=None):
    "A `TransformBlock` for segmentation masks, potentially with `codes`"
    return TransformBlock(type_tfms=PILMask.create, item_tfms=AddMaskCodes(codes=codes), batch_tfms=IntToFloatTensor)
and/or define my own AddMaskCodes class?
class AddMaskCodes(Transform):
    "Add the code metadata to a `TensorMask`"
    def __init__(self, codes=None):
        self.codes = codes
        if codes is not None: self.vocab,self.c = codes,len(codes)

    def decodes(self, o:TensorMask):
        if self.codes is not None: o.codes=self.codes
        return o
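One possible way around the 8-bit limit is to convert each RGB mask to a 2-D array of integer class indices yourself, before (or instead of) relying on PILMask.create. A minimal numpy sketch of just the RGB-to-index step; the function name and the idea of packing RGB triples into single ints are my own assumptions, not fastai API:

```python
import numpy as np

def rgb_mask_to_index(mask, codes):
    """Convert an (H, W, 3) RGB mask to an (H, W) array of class indices.

    `codes` is an ordered list of (r, g, b) tuples; index i in the output
    corresponds to codes[i], matching fastai's order-based convention.
    """
    # pack each RGB triple into one int so each pixel is a single key
    packed = (mask[..., 0].astype(np.int64) << 16) \
           | (mask[..., 1].astype(np.int64) << 8) \
           | mask[..., 2].astype(np.int64)
    lut = {(r << 16) | (g << 8) | b: i for i, (r, g, b) in enumerate(codes)}
    out = np.zeros(packed.shape, dtype=np.int64)
    for key, idx in lut.items():
        out[packed == key] = idx
    return out

codes = [(33, 36, 17), (71, 244, 159)]                 # water, pear
mask = np.array([[[33, 36, 17], [71, 244, 159]]], dtype=np.uint8)
print(rgb_mask_to_index(mask, codes))                  # [[0 1]]
```

The resulting index array has dtype int64, so any number of categories is fine; the remaining question is how to hand it to the DataBlock, e.g. via a custom type_tfms.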
I believe at the moment it uses the order of the labels and links each one to a pixel value (in the case of a single int per pixel). I've tried transforming my masks to single-integer pixel values, but because I have more than 255 categories, saving with Image.fromarray(arr).save(path) won't work.
Sorry I’m quite new to this and any help would be greatly appreciated!
The codes are RGB values corresponding to the colour of each pixel. The standard fastai approach (e.g. with CamVid dataset) is to turn these into integer values from 1 to n (n=number of categories) and convert the masks to grayscale.
However because my dataset has more than 255 categories (the max for a grayscale png pixel value) I don’t think this is an option for me.
To be honest this is above my level but it is an interesting problem you are trying to solve.
I was thinking you may have to work with uint16 for your problem, since the limiting factor behind the 256-value limit is uint8. I found this on Stack Overflow on how to save uint16. I am not sure whether your PILMask has to be adjusted too; I had a look at the source code and it uses uint8. Good luck!
Ah I see. Hmmm… peeking at the codebase a bit, I would suspect that MaskBlock in fastai.vision.data might need to be customized first. In particular, it returns a set of transforms where type_tfms = PILMask.create. If you look at how PILMask is defined in fastai.vision.core, you'll see 'mode' = 'L', which means unsigned 8-bit integer. The first thing I might try is changing that to 'mode' = 'I' and seeing what happens; 'I' is a signed int32 image. See here.
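The mode-'I' idea above can be checked with plain Pillow, independent of fastai. PNG stores mode 'I' as 16-bit grayscale, so class indices up to 65535 survive a round trip (the file path is arbitrary):

```python
import os, tempfile
import numpy as np
from PIL import Image

# an (H, W) mask with class indices that overflow uint8
arr = np.array([[0, 300], [1000, 65535]], dtype=np.int32)

path = os.path.join(tempfile.mkdtemp(), "mask.png")
Image.fromarray(arr, mode="I").save(path)   # PNG stores mode 'I' as 16-bit

back = np.array(Image.open(path))
print((back == arr).all())                  # True: indices above 255 survived
```

So if PILMask (or a subclass of it) can be made to open masks in mode 'I' rather than 'L', the 255-category ceiling should go away, up to 65535 categories for PNG storage.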
But honestly without having a dataset that’s organized like this and trying to work through the various errors that occur, it’s very hard for a neophyte like myself to be all that helpful.