DataBlock: can a getter tell whether the image is for training vs validation?

FlorinAndrei · September 17, 2022, 7:10pm

I'm doing image segmentation with a U-Net. I keep all image and mask attributes in a Pandas dataframe and process each item according to those attributes. Code sample:

import numpy as np
from fastai.vision.all import *


def make_mask(row):
    """
    Is called by DataBlock(getters=).
    Takes a list of paths to mask files from a Pandas column.
    Makes sure all masks are 8 bits per pixel.
    If there are multiple masks, merges them.
    Returns a PILMask.create() mask image.
    """
    f = ColReader("mask")
    # PILMask.create() probably forces 8 bits per pixel.
    all_images = [np.asarray(PILMask.create(x)) for x in f(row)]
    image_stack = np.stack(all_images)
    image_union = np.amax(image_stack, axis=0)
    return PILMask.create(image_union)


def make_image(row):
    """
    Receives a Pandas row. Gets an image path from the "image" column.
    Makes sure all images are 8 bits per color channel.
    (There may be multiple color channels.)
    Returns a PILImage.create() image.
    """
    f = ColReader("image")
    # PILImage.create() probably forces 8 bits per color channel.
    image_array = np.asarray(PILImage.create(f(row)))
    return PILImage.create(image_array)


# Most images are 960 x 720. A few images are much larger. So we resize to 960 x 720 first.
# The final resize is to the desired image size for the model.
crop_datablock = DataBlock(
    blocks=(ImageBlock, MaskBlock),
    getters=[make_image, make_mask],
    splitter=TrainTestSplitter(stratify=crop_df["dataset"].to_list()),
    item_tfms=item_tfms,
)

Image augmentation is done with Albumentations in item_tfms, which is not shown here. I have a separate question about that here:

I need to process images and masks differently for training and for validation. I have a hard time doing that with item_tfms, as you can see in the previous thread. I think I could move all that processing into the getter functions, since they already do some low-level processing (changing the bits per pixel, merging masks, etc.).

There is one issue: from within a getter function, how do I tell whether the image or mask is for training or for validation?
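
One possible direction, as a minimal sketch and not a confirmed fastai recipe: compute the split up front and store it on each row, so that a getter can simply read the flag. The `is_valid` key, the `add_is_valid` helper, and the placeholder return values below are all hypothetical. Plain dicts stand in for the dataframe rows so the example runs without fastai.

```python
import random
from collections import defaultdict


def add_is_valid(rows, stratify_key="dataset", valid_pct=0.2, seed=42):
    """Mark each row with a boolean "is_valid" flag (hypothetical helper).

    The split is stratified: valid_pct of the rows in each
    stratify_key group are flagged as validation.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[stratify_key]].append(i)
    for indices in groups.values():
        rng.shuffle(indices)
        n_valid_in_group = round(len(indices) * valid_pct)
        for pos, i in enumerate(indices):
            rows[i]["is_valid"] = pos < n_valid_in_group
    return rows


def make_image(row):
    """Getter sketch: branch on the precomputed flag.

    The returned tuples are placeholders for the real
    training-only or validation-only image processing.
    """
    if row["is_valid"]:
        return ("valid-processing", row["image"])
    return ("train-processing", row["image"])


# Stand-in for the dataframe: 20 rows, two "dataset" groups of 10.
rows = [
    {"image": f"img_{i}.png", "dataset": "a" if i < 10 else "b"}
    for i in range(20)
]
rows = add_is_valid(rows)
n_valid = sum(r["is_valid"] for r in rows)
```

With a real dataframe, the flag would be a boolean column that the getter reads as `row["is_valid"]`. fastai's `ColSplitter`, which reads an `is_valid` column by default, could then take the place of `TrainTestSplitter`, so that the DataBlock split agrees with what the getters see.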