Efficient use of label_from_func for image segmentation

Using the data block API, is there a way to have label_from_func create labels (segmented images) lazily, as and when they are required, rather than for all training images at once?

I have a DataFrame of image paths and their run-length encoded (RLE) segmentation labels. Each image has about 5 RLE segmentation strings, each segmenting a different object.

So I subclassed SegmentationItemList and overrode its label_from_func method so that it also accepts a DataFrame. The function combines the 5 RLE segmentation strings into one segmented image per input image. However, creating the item list takes too long (2.5 hours) because it builds these labels for ALL the training images up front. Is there a way to make it run label_from_func and create a label only when the corresponding training example is actually requested?

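from fastai.vision import *  # fastai v1: SegmentationItemList, Callable, ...
import pandas as pd

class SegMasksItemList(SegmentationItemList):
    # Same as the stock label_from_func, except that df is forwarded to func.
    def label_from_func(self, func:Callable, label_cls:Callable=None, df:pd.DataFrame=None, **kwargs)->'LabelList':
        # This list comprehension runs func on every item immediately, which is the slow part.
        return self._label_from_list([func(o, df) for o in self.items], label_cls=label_cls, **kwargs)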

src=SegMasksItemList.from_folder('../input/train/').split_by_rand_pct(valid_pct=0.10).label_from_func(func=func_label, df=df)
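
To make the "only when asked" idea concrete, here is a minimal, library-independent sketch of lazy labelling. It is not fastai's API: LazyLabels and slow_label are hypothetical names, and slow_label only stands in for the expensive RLE decoding. In fastai v1 the analogous hook is, as far as I can tell, the label class's open method (SegmentationLabelList.open), which is called per item, so labelling with just the image path and decoding inside open should defer the cost in the same way. That part is untested here.

```python
from functools import lru_cache
from typing import Callable, Sequence


class LazyLabels:
    """Sequence-like wrapper that builds a label only when it is indexed."""

    def __init__(self, items: Sequence[str], make_label: Callable[[str], object],
                 cache_size: int = 256):
        self.items = list(items)
        # Cache recently built labels so repeated access is cheap.
        self._make = lru_cache(maxsize=cache_size)(make_label)

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, i: int):
        return self._make(self.items[i])


calls = []

def slow_label(path: str) -> str:
    # Hypothetical stand-in for the expensive RLE decoding; records each call.
    calls.append(path)
    return "mask for " + path

labels = LazyLabels(["a.png", "b.png", "c.png"], slow_label)
first = labels[0]   # slow_label runs now, and only for a.png
again = labels[0]   # served from the cache, no second call
```

Constructing the wrapper costs almost nothing regardless of dataset size; the decoding cost is paid per batch during training instead of once up front.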

The function I pass as func (func_label above) takes an item, which is the image path, decodes each of that image's 5 RLE strings with vision.image.open_mask_rle, combines the 5 segmentations into a single mask, and returns that array.
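
For reference, a rough NumPy-only sketch of that combining step. It does not use open_mask_rle, and several things are assumptions rather than details from my actual code: the RLE strings are "start length" pairs with 1-indexed starts, pixels are numbered in row-major order, object k is written as class id k with 0 as background, and a later object overwrites an earlier one where they overlap. rle_to_mask and combine_masks are illustrative names.

```python
import numpy as np


def rle_to_mask(rle: str, shape: tuple) -> np.ndarray:
    """Decode 'start length start length ...' (1-indexed starts) into a 0/1 mask."""
    nums = np.array(rle.split(), dtype=int)
    starts, lengths = nums[0::2] - 1, nums[1::2]
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for s, n in zip(starts, lengths):
        flat[s:s + n] = 1
    return flat.reshape(shape)  # assumes row-major pixel order


def combine_masks(rles, shape):
    """Merge several RLE strings into one mask: 0 = background, k = k-th object."""
    combined = np.zeros(shape, dtype=np.uint8)
    for class_id, rle in enumerate(rles, start=1):
        if not rle:  # an empty string means the object is absent
            continue
        combined[rle_to_mask(rle, shape) == 1] = class_id
    return combined


# Tiny 2x3 example: object 1 covers pixels 1-2, object 2 is absent, object 3 covers pixel 5.
mask = combine_masks(["1 2", "", "5 1"], shape=(2, 3))
```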

This might be a dumb solution :) but in some cases (if the trade-off between speed and disk space usage is right) it might be best to actually write the segmentation mask images to disk once and then pass the path to that folder. I realize this is not what you were asking, though, so someone else might chime in to help with that.
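
Something along these lines, as a sketch only. It assumes the combined masks are available as uint8 arrays keyed by image file name; save_masks and the file naming scheme are made up for illustration, and Pillow is used for writing the PNGs.

```python
from pathlib import Path
import tempfile

import numpy as np
from PIL import Image


def save_masks(masks: dict, out_dir: Path) -> list:
    """Write each mask as <image stem>.png and skip files that already exist."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_name, mask in masks.items():
        target = out_dir / (Path(image_name).stem + ".png")
        if target.exists():  # lets an interrupted run resume where it stopped
            continue
        # PNG is lossless, so the class ids survive the round trip.
        Image.fromarray(mask.astype(np.uint8)).save(target)
        written.append(target)
    return written


# Toy data; in practice the masks would come from the combined RLE strings.
demo = {"img_001.jpg": np.array([[0, 1], [2, 0]], dtype=np.uint8)}
out_dir = Path(tempfile.mkdtemp()) / "masks"
paths = save_masks(demo, out_dir)
reloaded = np.array(Image.open(paths[0]))
```

After that, the labelling function only has to map an image path to the matching mask path, which is the usual label_from_func pattern for segmentation, so building the item list no longer decodes anything.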