How does PILMask.create work?

How does PILMask.create generate an m×n mask from an m×n×3 image file? I tried to go through the source code to figure it out, but parsing it was a bit of a challenge for a novice like me. I would really appreciate it if someone could provide a simple explanation. Thanks!

I think the answer is in line 125 of core.py. PILMask opens every image in mode 'L', so an RGB image is simply converted to greyscale, with the luminance stored as the only channel.
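To see what mode 'L' does to a colour-coded label, here is a minimal sketch using plain Pillow and NumPy rather than fastai. It assumes that PILMask.create boils down to `Image.open(fn).convert('L')`. Pillow documents the RGB to L conversion as the ITU-R 601-2 luma transform L = R*299/1000 + G*587/1000 + B*114/1000, so each label colour simply becomes a grey level, not a class index. The helper names `rgb_to_l` and `luma_formula` are made up for illustration.

```python
import numpy as np
from PIL import Image

def rgb_to_l(rgb: np.ndarray) -> np.ndarray:
    """Convert an (m, n, 3) uint8 RGB array to an (m, n) greyscale array using
    Pillow's mode 'L' conversion (assumed to be what PILMask.create relies on)."""
    return np.array(Image.fromarray(rgb).convert("L"))

def luma_formula(rgb: np.ndarray) -> np.ndarray:
    """ITU-R 601-2 luma transform as given in the Pillow docs, computed in float."""
    r, g, b = (rgb[..., i].astype(np.float64) for i in range(3))
    return r * 299 / 1000 + g * 587 / 1000 + b * 114 / 1000

# A tiny 2x2 "colour-coded label": red, green, blue and black pixels
label_rgb = np.array([[[255, 0, 0], [0, 255, 0]],
                      [[0, 0, 255], [0, 0, 0]]], dtype=np.uint8)

mask = rgb_to_l(label_rgb)
print(mask.shape)  # the channel dimension is gone: m x n instead of m x n x 3
print(mask)        # one grey level per colour, not a class index
```

Pillow rounds with integer arithmetic, so its values can differ from the float formula by about one grey level.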

Thank you for your response. Could you also explain how it relates the luminance values to the codes (the segmentation classes)?

Good question. As far as I know, PILMask itself doesn't. I just tried creating a PILMask from a CamVid label PNG:

np.unique(PILMask.create(PATH/'0001TP_009210_L.png'))

>>> array([  0,  22,  34,  38,  45,  57,  60,  90,  97, 113, 116, 121, 128,
       132, 185], dtype=uint8)

So these values are not usable directly as segmentation class indices. Have a look at this notebook by Zach Mueller, particularly the n_codes and get_msk functions. You basically loop through the unique values and map them to consecutive integers starting from 0.
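The remapping idea can be sketched in NumPy as below. This is not the code from the notebook mentioned above; `build_value_map` and `remap_mask` are hypothetical helper names. The map is built from the unique values of all masks in the dataset, so that a given grey level gets the same class index in every image.

```python
import numpy as np

def build_value_map(masks):
    """Collect the sorted unique pixel values over all masks and return them
    together with a 256-entry lookup table mapping each value to 0..k-1."""
    values = np.unique(np.concatenate([np.unique(m) for m in masks]))
    lut = np.full(256, -1, dtype=np.int64)  # -1 marks values never seen
    lut[values] = np.arange(len(values))
    return values, lut

def remap_mask(mask, lut):
    """Replace every uint8 pixel value with its class index via the lookup table."""
    out = lut[mask]
    if (out < 0).any():
        raise ValueError("mask contains a value not present in the value map")
    return out

# Two toy greyscale masks, e.g. as returned by np.array(PILMask.create(fn))
m1 = np.array([[0, 22], [34, 22]], dtype=np.uint8)
m2 = np.array([[185, 0], [0, 34]], dtype=np.uint8)

values, lut = build_value_map([m1, m2])
print(values.tolist())               # [0, 22, 34, 185]
print(remap_mask(m1, lut).tolist())  # [[0, 1], [2, 1]]
```

Because the 'L' conversion is many-to-one, two different label colours could in principle end up with the same grey level; building the map from the original RGB colours instead would avoid that.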

When you use the high-level functions (SegmentationDataLoaders), I think this is done automatically for you.

Some PNG label images are saved in palette mode ('P'), so they also need to be opened in mode 'P' rather than 'L', which is the default used by MaskBlock and PILMask.create. The problem is that when such a mask is opened in 'L' mode, the labels are not in the range [0, n_classes - 1] that fastai expects. For example, open a PASCAL VOC 2012 label with PILMask.create:

fn = "/home/ryf_stu01/fastai/downloads/VOC2012/VOCdevkit/VOC2012/SegmentationClass/2010_001951.png"
np.unique(PILMask.create(fn))

you get:

array([  0, 147, 150, 220], dtype=uint8)

which is no good as a label for segmentation with 21 classes.
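The effect can be reproduced with plain Pillow, without fastai. The sketch below builds a tiny palette ('P') image whose stored pixel values are the class indices 0, 1 and 2; the palette colours are chosen only for illustration. Converting it to 'L' replaces each index with the luminance of its palette colour, while keeping 'P' returns the indices themselves. `open_as` is a hypothetical helper that assumes PILMask.create essentially calls `convert(mode)` with the mode from `_open_args`.

```python
import numpy as np
from PIL import Image

# Palette image whose stored pixel values are the class indices 0, 1 and 2
indices = np.array([[0, 1], [2, 1]], dtype=np.uint8)
img = Image.new("P", (2, 2))
img.putdata(indices.flatten().tolist())
# Illustrative palette: index 0 -> black, 1 -> dark red, 2 -> dark green
img.putpalette([0, 0, 0, 128, 0, 0, 0, 128, 0] + [0] * (768 - 9))

def open_as(im, mode):
    """Roughly what PILMask.create does with _open_args['mode'] (assumed):
    convert the image to that mode and return the pixel array."""
    return np.array(im.convert(mode))

as_p = open_as(img, "P")  # palette indices are kept
as_l = open_as(img, "L")  # indices replaced by the luminance of their colours

print(np.unique(as_p).tolist())  # [0, 1, 2]
print(np.unique(as_l).tolist())  # grey levels of the three palette colours
```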
One short way to fix it is:

from fastai.vision.all import *
class MyPILMask(PILBase): _open_args,_show_args = {'mode':'P'},{'alpha':0.5, 'cmap':'tab20'}

Then run the code again and you will get:

[in]    np.unique(MyPILMask.create(fn))
[out]:  array([  0,  15,  19, 255], dtype=uint8)

This time the results look as expected, because VOC 2012 has 21 classes (indices 0 to 20) and 255 represents void (or empty).
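Then, when creating the DataBlock, just do this:

from fastai.vision.all import *

# Change the open mode before MaskBlock calls PILMask.create,
# so palette ('P') masks keep their class indices
PILMask._open_args = {'mode':'P'}

# codes, get_trainval_fanme and get_label are your own definitions
voc2012 = DataBlock(blocks    = (ImageBlock, MaskBlock(codes=codes)),
                    get_items = get_trainval_fanme,
                    get_y     = get_label,
                    item_tfms = Resize(224),
                    batch_tfms= aug_transforms())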

This works because MaskBlock relies on PILMask.create, and PILMask.create in turn reads _open_args, which holds the open mode, so we change it before it is accessed. Hope this helps :smiley:

Hi all,
I have a dataset of .png images whose pixels are encoded in P mode.
As suggested above in this topic, I tried setting PILMask._open_args = {'mode':'P'} before the DataBlock(...), but it does not seem to work.
In my case, the pixel values returned by np.unique(PILMask.create(fn)) are not encoded correctly.
The original pixel values are [0, 1, 2], but instead I get [0, 1, 255].
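A possible first debugging step is to check how the file is actually stored, independently of fastai. The hypothetical helper `inspect_mask` below uses only Pillow and NumPy: it reports the stored mode, the unique values as stored, and the unique values after an 'L' conversion. That should show whether the 255 is already present in the file or only appears once the file is opened and converted.

```python
import numpy as np
from PIL import Image

def inspect_mask(src):
    """Report how a mask file is stored and which values it yields.

    src can be a path or a file-like object (anything Image.open accepts)."""
    with Image.open(src) as im:
        im.load()
        report = {
            "stored_mode": im.mode,                          # e.g. 'P', 'L', 'RGB'
            "raw_values": np.unique(np.array(im)).tolist(),  # values as stored
            "as_L": np.unique(np.array(im.convert("L"))).tolist(),
        }
    return report
```

For example, `inspect_mask(fn)` on one of your label files: if `stored_mode` is 'P' and `raw_values` is [0, 1, 2], the file itself holds the expected indices.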