Hi there,
I have a question about segmentation models in fastai. Is it possible to provide data in multi-class multi-label format? Currently, I use MaskBlock(codes=[0, 1, 2, 3]) to read masks. They look like:
0 0 1 2 1
0 1 1 0 2
0 0 3 0 1
So one can use out-of-the-box fastai classes without any issues. However, what if I have overlapping masks? Let’s say, building from the toy example above, we have three overlapping classes:
0 0 1 1 1
0 0 1 1 0
0 0 1 0 0

0 2 2 0 0
0 2 2 0 0
0 0 0 2 0

0 0 0 0 0
0 0 3 3 0
0 3 0 3 0
One possible solution is to combine codes using bit masks and represent overlapping classes as a new class. For example:
c1 = 1 -> replace -> 0b001
c2 = 2 -> replace -> 0b010
c3 = 3 -> replace -> 0b100
c1 = 1
c2 = 2
c3 = 4
c1 and c2 = (0b001 | 0b010) = 0b011 = 3
c1 and c3 = (0b001 | 0b100) = 0b101 = 5
...
In this case, we can encode all these combinations as a new mask with values from 0 to 7. For example, if we have three overlapping masks, then we can represent them as a single array as follows:
0 0 1   2 0 0   4 0 0   6 0 1
0 1 1 | 2 2 0 | 0 4 4 = 2 7 5
1 1 0   2 2 0   0 4 4   3 7 4
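To make the bit-mask idea concrete, here is a minimal NumPy sketch of the encoding above. The function name encode_overlapping_masks and the (N, H, W) input layout are my own assumptions for illustration, not anything fastai provides.

```python
import numpy as np

def encode_overlapping_masks(masks):
    """Combine N binary masks of shape (N, H, W) into one (H, W) array.

    Mask i contributes bit i, so class 1 -> 1, class 2 -> 2, class 3 -> 4,
    and overlapping pixels get the bitwise OR of the classes present.
    (Hypothetical helper, not part of fastai.)
    """
    masks = np.asarray(masks, dtype=np.int64)
    combined = np.zeros(masks.shape[1:], dtype=np.int64)
    for bit, mask in enumerate(masks):
        combined |= (mask > 0).astype(np.int64) << bit
    return combined

# The three masks from the example, written as 0/1 arrays.
c1 = [[0, 0, 1], [0, 1, 1], [1, 1, 0]]
c2 = [[1, 0, 0], [1, 1, 0], [1, 1, 0]]
c3 = [[1, 0, 0], [0, 1, 1], [0, 1, 1]]

combined = encode_overlapping_masks([c1, c2, c3])
print(combined)
# [[6 0 1]
#  [2 7 5]
#  [3 7 4]]
```

The result could then be saved as a regular single-channel mask and read with MaskBlock, with codes covering all values from 0 to 7.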
But in this way, we increase the number of predicted classes (up to 2^N for N overlapping classes), which may (or may not?) lead to decreased performance, depending on how many overlaps are in the data.
So I wonder if there is some (simple) way to provide masks as N-channel images instead of just 2D mask tensors? For now, I am going to replace MaskBlock with some custom transformation that returns something different from PILMask. But maybe there is an easier way to achieve the same result.
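For reference, this is roughly the N-channel target I have in mind: one binary channel per class, shape (N, H, W). The sketch below only shows the array layout by unpacking a bit-encoded mask with NumPy. The name decode_to_channels is hypothetical, and it does not show how to wire such a target into a DataBlock.

```python
import numpy as np

def decode_to_channels(combined, n_classes):
    """Unpack a bit-encoded (H, W) mask into a binary (n_classes, H, W) array.

    Channel i is 1 wherever bit i of the encoded value is set.
    (Hypothetical helper, not part of fastai.)
    """
    combined = np.asarray(combined, dtype=np.int64)
    bits = np.arange(n_classes).reshape(-1, 1, 1)
    return ((combined[None, ...] >> bits) & 1).astype(np.uint8)

encoded = [[6, 0, 1], [2, 7, 5], [3, 7, 4]]
channels = decode_to_channels(encoded, n_classes=3)
print(channels.shape)  # (3, 3, 3)
print(channels[0])
# [[0 0 1]
#  [0 1 1]
#  [1 1 0]]
```

With a target like this, the number of outputs stays at N instead of growing with the number of class combinations. Such targets are commonly trained with a per-channel sigmoid and binary cross-entropy rather than softmax cross-entropy, though I have not checked how well that fits the fastai defaults.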
Update
At the moment, I plan to use a second ImageBlock:
db = DataBlock(blocks=(ImageBlock(...), ImageBlock(...)), ...)
I haven’t tried it yet, but at first glance, it looks like a workable approach.