Image segmentation - how to merge multiple masks

I’m training a Unet / ResNet model for image segmentation. This is the dataloader I have now:

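from fastai.vision.all import *

def src_get_label(fn):
    # The mask for original/foo.png lives at GT/foo_GT.png
    label_name = src_dataset / "GT" / f"{fn.stem}_GT.png"
    return label_name

# src_dataset, input_image_size and src_batch_size are defined elsewhere
src_datablock = DataBlock(
    blocks=(ImageBlock, MaskBlock),
    get_items=get_image_files,  # assumed: collect the image files under the source path
    get_y=src_get_label,
    item_tfms=Resize(size=input_image_size, method="squish"),
)

src_dataloader = src_datablock.dataloaders(
    src_dataset / "original", path=src_dataset, bs=src_batch_size
)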

In most cases, each image file in the dataset has exactly one mask file. In some cases, an image file will have multiple mask files, each mask highlighting a different object in the image. I can tell by the filenames which masks are assigned to each image - that’s not the problem.

The problem is - how do I merge multiple mask files into a single mask structure in the dataloader? I can identify the mask files just fine, I just don’t know where to begin to merge them.

The masks have a single category, so in principle merging could be as simple as generating the union of the regions of interest. I could do that myself, if I had access to the mask structures as arrays or something - that’s the part I don’t know.

(I’m still pretty new to FastAI.)

You could do this independently of all the data loading. Before creating the DataBlock, loop through all the images, merge the masks belonging to each one, save each result as, say, src_dataset/"GT"/f"{fn.stem}_GT_merged.png", and use those new masks in your src_get_label function.
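A minimal sketch of that offline step, using only NumPy and Pillow. The naming scheme (all masks of original/foo.png match GT/foo_GT*.png), the .png extension of the images, and the function name merge_masks_offline are assumptions for illustration; adjust the glob to your real filenames.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def merge_masks_offline(src_dataset):
    """Write GT/<stem>_GT_merged.png for every image in <src_dataset>/original.

    Hypothetical naming scheme: all masks of original/foo.png match GT/foo_GT*.png
    (e.g. foo_GT.png, foo_GT_2.png).
    """
    src_dataset = Path(src_dataset)
    gt_dir = src_dataset / "GT"
    for fn in sorted((src_dataset / "original").glob("*.png")):
        # Collect this image's masks, skipping merged files from an earlier run
        mask_paths = [p for p in sorted(gt_dir.glob(f"{fn.stem}_GT*.png"))
                      if not p.stem.endswith("_merged")]
        if not mask_paths:
            continue
        # Union of the single-class masks: any non-zero pixel counts as foreground
        merged = None
        for p in mask_paths:
            m = np.array(Image.open(p).convert("L")) > 0
            merged = m if merged is None else (merged | m)
        # Save class indices (0 = background, 1 = object) as an 8-bit 'L' PNG
        Image.fromarray(merged.astype(np.uint8)).save(gt_dir / f"{fn.stem}_GT_merged.png")
```

The merged files store 0/1 rather than 0/255, which is what a two-class segmentation loss expects as class indices; if your existing single masks use 255 for the object, you would want to make them consistent as well.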
If you want to do it on the fly, you could create a custom TransformBlock that mimics the behavior of MaskBlock but adds an extra step that merges the masks before it creates the PILMask. If you opt for the second method, I can look into this further.

You might find some code from the following repo relevant: 00_data-prep.ipynb in bilalcodehub/severstal-steel-defects-detection (main branch) on GitHub. Check the code in the section titled "Generate the labels file and masks for each training image".

Briefly, we converted the run-length encodings (RLE) of the masked label pixels for each image into a single mask at the preprocessing stage. That mask is then fed into the fastai DataBlock system for subsequent modelling.
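For the single-class case in this thread, decoding several RLE strings and OR-ing them into one mask can be sketched as below. This is not the notebook's code; it assumes Kaggle-style RLE ("start length start length ..." with 1-indexed starts) and, as far as I know the Severstal convention, pixels numbered top to bottom, then left to right (column-major). The function names are hypothetical.

```python
import numpy as np


def rle_decode(rle, shape):
    """Decode a 'start length ...' RLE string into a 0/1 uint8 mask of the given (height, width)."""
    h, w = shape
    flat = np.zeros(h * w, dtype=np.uint8)
    nums = np.asarray(rle.split(), dtype=int)
    starts, lengths = nums[0::2] - 1, nums[1::2]  # convert 1-indexed starts to 0-indexed
    for s, n in zip(starts, lengths):
        flat[s:s + n] = 1
    # Column-major order: pixel 1 is (row 0, col 0), pixel 2 is (row 1, col 0), ...
    return flat.reshape((h, w), order="F")


def rles_to_single_mask(rles, shape):
    """Union of several RLE-encoded masks of the same image."""
    mask = np.zeros(shape, dtype=np.uint8)
    for rle in rles:
        mask |= rle_decode(rle, shape)
    return mask


print(rles_to_single_mask(["1 2", "5 1"], (3, 2)).tolist())  # [[1, 0], [1, 1], [0, 0]]
```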

Hope this might help.

I can process mask files offline, or outside of the FastAI code blocks, just fine. That’s pretty easy, and could be done in Python, or via tools like ImageMagick, etc.

My question was more about the cases where various constraints on storage may strongly suggest doing all the mask processing on the fly.

tl;dr: the result of get_y should be a numpy array of your merged masks.

Great :slight_smile: I mentioned TransformBlock in my first post, and it is still worthwhile to look into, but I noticed that it might be simpler to just use get_y.
…since you said you were new to fastai, I’ll quickly go through the (relevant) logic of the DataBlock:

  • you pass a path that holds your images to get_items, which creates a list of image files
  • each of those list items is passed to get_x and get_y
  • the result of get_x is passed to blocks[0] (here: ImageBlock)
  • the result of get_y is passed to blocks[1] (here: MaskBlock)
  • everything else (item and batch transforms) is applied afterwards; I’m not sure about the exact order right now, but we don’t need it here.

Now, if you check the source with MaskBlock?? in a notebook:

def MaskBlock(
    codes:list=None # Vocab labels for segmentation masks
):
    "A `TransformBlock` for segmentation masks, potentially with `codes`"
    return TransformBlock(type_tfms=PILMask.create, item_tfms=AddMaskCodes(codes=codes), batch_tfms=IntToFloatTensor)

you can see that a PILMask is created, which is:

class PILMask(PILBase): _open_args,_show_args = {'mode':'L'},{'alpha':0.5, 'cmap':'tab20'}

which inherits from PILBase:

class PILBase(Image.Image, metaclass=BypassNewMeta):
    ### irrelevant stuff
    @classmethod
    def create(cls, fn:Path|str|Tensor|ndarray|bytes, **kwargs)->None:
        "Open an `Image` from path `fn`"
        if isinstance(fn,TensorImage): fn = fn.permute(1,2,0).type(torch.uint8)
        if isinstance(fn, TensorMask): fn = fn.type(torch.uint8)
        if isinstance(fn,Tensor): fn = fn.numpy()
        if isinstance(fn,ndarray): return cls(Image.fromarray(fn))
        if isinstance(fn,bytes): fn = io.BytesIO(fn)
        return cls(load_image(fn, **merge(cls._open_args, kwargs)))
    ### more irrelevant stuff

Usually you would return the path of the mask file from get_y (as you did in src_get_label) so that MaskBlock loads it, but as you can see you can also hand it a np.array.
So if you can use get_y to derive the combined mask as a np.array, you should be golden.
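To see what the ndarray branch of PILBase.create boils down to, here is a small check with plain NumPy and Pillow (no fastai needed). It assumes a 2-D class-index mask: Image.fromarray turns a uint8 array into a mode 'L' image, the same mode PILMask uses when opening files, whereas a boolean array (e.g. the raw result of a logical OR) becomes a 1-bit mode '1' image instead.

```python
import numpy as np
from PIL import Image

# A 2-D uint8 class-index mask: 0 = background, 1 = object
mask = np.zeros((4, 6), dtype=np.uint8)
mask[1:3, 2:5] = 1

# Mirrors the ndarray branch of PILBase.create: cls(Image.fromarray(fn))
img = Image.fromarray(mask)
print(img.mode, img.size)  # L (6, 4) -- PIL reports size as (width, height)

# A boolean array maps to Pillow's 1-bit mode instead of 'L'
bool_img = Image.fromarray(mask.astype(bool))
print(bool_img.mode)  # 1

# Round-tripping the uint8 image preserves the class indices exactly
assert np.array_equal(np.array(img), mask)
```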

Note: if you use PIL to load the .png masks, open them in mode 'L' (e.g. Image.open(path).convert('L')), and the final np.array should have a dtype of np.uint8 (so return final_mask.astype(np.uint8) or something similar).
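A minimal sketch of such a get_y, assuming the hypothetical naming scheme foo.png → GT/foo_GT.png, GT/foo_GT_2.png, … (the function name get_merged_mask and the glob pattern are illustrations, not fastai API). It returns the union of all matching masks as a 2-D uint8 array of 0/1, which MaskBlock's PILMask.create then wraps via its ndarray branch.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def get_merged_mask(fn, gt_dir):
    """Return the union of all masks belonging to image `fn` as a uint8 0/1 array.

    Hypothetical naming scheme: masks of foo.png are gt_dir/foo_GT*.png.
    """
    mask_paths = sorted(Path(gt_dir).glob(f"{Path(fn).stem}_GT*.png"))
    if not mask_paths:
        raise FileNotFoundError(f"no mask found for {fn}")
    merged = None
    for p in mask_paths:
        # Load in mode 'L'; any non-zero pixel counts as the (single) object class
        m = np.array(Image.open(p).convert("L")) > 0
        merged = m if merged is None else np.logical_or(merged, m)
    # uint8 so that Image.fromarray produces a mode 'L' mask
    return merged.astype(np.uint8)
```

In the DataBlock you would then replace get_y=src_get_label with something like get_y=partial(get_merged_mask, gt_dir=src_dataset / "GT") (partial from functools). A partial of a top-level function stays picklable, which matters if you later call learn.export().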
Hope this helps :wink: