Generate segmentation mask by function

Hello developers.

Using fastai v1, I would like to construct a DataBunch for image segmentation training. However, the masks will not be read from image files. Instead, each y mask will be constructed by my own function, given the x filename as a parameter.

Please show me the right way to do this using the data block API. I understand how to return a mask filename label given the image filename, but not how to hook into the datablock API to return the mask image itself.

Here’s what I have so far:

src = SegmentationItemList.from_csv(num_workers=0, seed=rseed, csv_name='train_labels.csv', suffix='.tif', path=DATA, folder='train', test='test', ds_tfms=tfms, bs=BATCH_SIZE, size=ORIGIMSIZE)

data = (src
        .random_split_by_pct()
 #       .label_from_func(get_y_fn, classes=codes)
        .databunch(bs=BATCH_SIZE, path=DATA, num_workers=0)
        .normalize(imagenet_stats))

Also, it would help to know where to specify num_workers=0. data.show_batch() seems to use 4 workers regardless of num_workers=0.

Thanks for your help.

You should create your own ItemList for the labels where you will define the get method that returns your mask (instead of trying to use open_mask). Then in your label call, pass label_cls={my_custom_ItemList}.
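As a rough illustration of that pattern, here is a minimal sketch in plain Python. It is not fastai code: PathItemList, GeneratedMaskList and make_mask are hypothetical stand-ins, and the only assumption carried over is that item access goes through an overridable method (get, or open as in the later posts).

```python
class PathItemList:
    """Simplified stand-in for an image item list (not the fastai class)."""

    def __init__(self, items):
        self.items = list(items)

    def open(self, fn):
        # Default behaviour: pretend to read the file at fn.
        return f"pixels read from {fn}"

    def get(self, i):
        # Every item access goes through open(), so overriding open() is enough.
        return self.open(self.items[i])

    def __getitem__(self, i):
        return self.get(i)


def make_mask(fn):
    """Hypothetical mask builder: derives the mask from the filename alone."""
    return f"mask generated for {fn}"


class GeneratedMaskList(PathItemList):
    def open(self, fn):
        # No file is read; the mask comes from a function of the filename.
        return make_mask(fn)


masks = GeneratedMaskList(["train/a.tif", "train/b.tif"])
print(masks[0])
```

The subclass changes only how a single item is produced; indexing and everything built on top of it stay the same.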

Can you give some more direction with this issue? I have put in time tracing with a debugger, but still can’t figure out the correct calls.

Here’s my code and the error:

class MySegItemList(SegmentationItemList):
    def open(self, fn): 
        return super().open(fn) #test for now
    
src = MySegItemList.from_csv(num_workers=0, seed=rseed, csv_name='train_labels.csv', suffix='.tif', path=DATA, folder='train', test='test', ds_tfms=tfms, bs=BATCH_SIZE, size=96)

src
Out:
MySegItemList (220025 items)
[Image (3, 96, 96), Image (3, 96, 96), Image (3, 96, 96), Image (3, 96, 96), Image (3, 96, 96)]...
Path: /home/malcolm/kaggle/HPC/Data

data = (src
        .random_split_by_pct()
        .get_label_cls(None,label_cls=src)
        .databunch(bs=BATCH_SIZE, path=DATA, num_workers=0)
        .normalize(imagenet_stats))

Out:
AssertionError                            Traceback (most recent call last)
<ipython-input-59-05a632162ce9> in <module>
      1 data = (src
      2         .random_split_by_pct()
----> 3         .get_label_cls(None,label_cls=src)
      4         .databunch(bs=BATCH_SIZE, path=DATA, num_workers=0)
      5         .normalize(imagenet_stats))

~/anaconda3/envs/fastaiv3/lib/python3.6/site-packages/fastai/data_block.py in _inner(*args, **kwargs)
    370         def _inner(*args, **kwargs):
    371             self.train = ft(*args, **kwargs)
--> 372             assert isinstance(self.train, LabelList)
    373             self.valid = fv(*args, **kwargs)
    374             self.__class__ = LabelLists

AssertionError: 

Thanks again!

data = (src
        .random_split_by_pct()
        .get_label_cls(None,label_cls=src)
        .databunch(bs=BATCH_SIZE, path=DATA, num_workers=0)
        .normalize(imagenet_stats))

I don’t think get_label_cls is what you want in this block. Each ItemList class has an attribute holding a suggested ItemList subclass to cast its labels to when using the factory functions, and get_label_cls just returns that class (not an instance of it). Moreover, that attribute is overridden by the label_cls argument, so if you call get_label_cls with label_cls=src, it returns src, which is an instance, not a class. Either way, the result is not a LabelList, which is what the assertion in your traceback checks for.
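To make the class-versus-instance point concrete, here is a toy model in plain Python. It is not fastai code: ToyItemList, ToyLabelList and ToySplit are simplified stand-ins that copy only the two behaviours involved (an explicit label_cls is returned untouched, and the split object asserts that the forwarded call produced a labelled list), and default_label_cls is a made-up attribute name.

```python
class ToyLabelList:
    """Stand-in for a labelled list (inputs paired with labels)."""


class ToyItemList:
    """Stand-in for an item list; not the fastai class."""

    default_label_cls = None  # hypothetical: suggested label class

    def get_label_cls(self, labels, label_cls=None):
        # An explicit label_cls argument wins and is returned as given.
        if label_cls is not None:
            return label_cls
        return self.default_label_cls or ToyItemList


class ToySplit:
    """Stand-in for the object returned by a train/valid split."""

    def __init__(self, train, valid):
        self.train, self.valid = train, valid

    def __getattr__(self, name):
        # Forward unknown methods to the train list and insist that the
        # result is a labelled list, like the assert in the traceback.
        method = getattr(self.train, name)

        def _inner(*args, **kwargs):
            result = method(*args, **kwargs)
            assert isinstance(result, ToyLabelList)
            return result

        return _inner


src = ToyItemList()
split = ToySplit(ToyItemList(), ToyItemList())
try:
    split.get_label_cls(None, label_cls=src)
    outcome = "no error"
except AssertionError:
    outcome = "AssertionError"
print(outcome)  # AssertionError
```

In this model the call fails for the same reason as in the traceback: what comes back is the src instance that was passed in, not a labelled list.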

I think what you want on that line is to attach your labels, so something like…

ils = src.random_split_by_pct()
# Label with the input data
ils = ils.label_from_lists(ils.train.items,ils.valid.items,label_cls = MySegItemList)
data = ils.databunch(bs=BATCH_SIZE, path=DATA, num_workers=0).normalize(imagenet_stats)

… is probably more like what you want. (I haven’t tested this with your code, but it’s how I’ve been doing something vaguely similar today.)

Also, my understanding is that you want to generate a mask from a function, and that’s the purpose of creating MySegItemList. If so, you probably want to treat the input data as normal images, and in that case you’d want to make src an instance of ImageItemList and only have the labels be MySegItemLists.

Thanks for your response. I have been offline while traveling and just got around to trying it.

So I got much further using your suggestions, but am still stuck getting the whole system to work. The intent is to look up the input image filename in a data frame, and choose one of two masks to use as its y segmentation mask. Here is my code.

def get_y(x): 
    lb = yDF.loc[Path(x).stem,'label']
    if lb==0:
        return ncMask
    elif lb==1:
        return cMask
    else:
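        # Fail with a descriptive error instead of a deliberate ZeroDivisionError
        raise ValueError(f'Unexpected label {lb!r} for {x}')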
    
class MySegItemList(SegmentationItemList):
    def open(self, fn): 
        return get_y(fn)
    
src = ImageItemList.from_csv(num_workers=0, seed=rseed, csv_name='train_labels.csv', suffix='.tif', path=DATA, folder='train', test='test', ds_tfms=tfms, bs=BATCH_SIZE, size=96)

codes = array(['UKN','NC','C'])

ils = src.random_split_by_pct()
ils = ils.label_from_lists(ils.train.items,ils.valid.items,label_cls = MySegItemList, classes=codes)
data = ils.databunch(bs=BATCH_SIZE, path=DATA, num_workers=0).normalize(imagenet_stats)

foo = data.one_batch()

At this point I can manually inspect the batch foo and see the pairs of image and segmentation mask. It all looks correct. However, when I try to display some pairs with
data.show_batch(rows=2, figsize=(96,96))

I see just the masks, not the masks overlaid on images with some transparency.

Further, creating the learner fails in to_device(…) because data.c (number of classes) is not set.

I’m sure there’s a simple, correct way to create the databunch, but I am just not quite getting it. Thanks for any help.
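Separately from the databunch question, the label lookup in get_y can be exercised on its own, without fastai. A minimal sketch, assuming a plain dict in place of yDF and small nested lists in place of ncMask and cMask (all hypothetical stand-ins; the mask values 1 and 2 assume that class indices follow the order of the codes array):

```python
from pathlib import Path

# Hypothetical stand-ins: a label table keyed by filename stem and two
# constant masks (assumed class index 1 = 'NC', 2 = 'C').
labels = {"img_a": 0, "img_b": 1}
nc_mask = [[1, 1], [1, 1]]
c_mask = [[2, 2], [2, 2]]


def get_y(x):
    """Return the mask for image path x, chosen by its label."""
    stem = Path(x).stem  # 'train/img_a.tif' -> 'img_a'
    label = labels[stem]
    if label == 0:
        return nc_mask
    if label == 1:
        return c_mask
    # Any other label is a data error; report it explicitly.
    raise ValueError(f"unexpected label {label!r} for {stem}")


print(get_y("train/img_a.tif") is nc_mask)  # True
```

Checking the lookup in isolation like this separates mistakes in the label table from problems in how the databunch is assembled.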