Image Segmentation Resize method for Pillow Image

Hello everybody,

I have a question about an image segmentation task. I am solving a segmentation task using a DataBlock whose blocks are (ImageBlock(PILImage), MaskBlock(codes)), with item_tfms=[Resize(size)]. This transformation resizes both the input image and the corresponding ground truth segmentation mask.

As far as I understand from checking the source (class Resize(RandTransform), around line 242), the crop and padding methods use bilinear interpolation for the image, while nearest neighbor is used for the segmentation ground truth mask.

Is there a specific pattern being followed here, and are these the default options? I ask because we lose information when we scale down the images. Do the ground truth mask pixels still correspond correctly to the input image after resizing? And if I want to scale the images back up to their original size, should I use the same methods that were used for scaling down?

Would it be a valid training scheme to start training at a very small image resolution and, as training proceeds, scale the images up, finishing training once they reach their initial size? This would avoid many of the points where information can be lost, such as resizing.

Thank you in advance.

Bilinear interpolation yields better visual results, which is what you want for the images, but for the segmentation masks you need to pick one of the existing classes.

For example, if you have 2 pixels next to each other with values [100, 200] and you need to upsample them to 3 pixels wide, it makes sense for the new middle pixel to be the average of its 2 neighbors: [100, 150, 200]. For a segmentation mask, though, each pixel value is the index of a class. Say our classes are [background, sky, tree, car, person], so the pixel values representing them are [0, 1, 2, 3, 4]. Now suppose that for the original image pixels [100, 200] the corresponding mask values were [1, 3], i.e. [sky, car]. With bilinear interpolation the new mask values would be [1, 2, 3], or [sky, tree, car], which is definitely not correct. Instead you want the middle pixel to be either 1 or 3 (sky or car), not 2 (tree), which is why nearest-neighbor interpolation is used for the segmentation masks.
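The contrast above can be sketched in plain Python (no fastai needed). The helper names `upsample_linear` and `upsample_nearest` are hypothetical, and real libraries work on 2-D images, but the arithmetic on a single row is the same:

```python
def upsample_linear(px, new_len):
    """Linearly interpolate a 1-D row of pixels to new_len samples."""
    if new_len == 1:
        return [float(px[0])]
    out = []
    scale = (len(px) - 1) / (new_len - 1)
    for i in range(new_len):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(px) - 1)
        frac = pos - lo
        out.append(px[lo] * (1 - frac) + px[hi] * frac)
    return out

def upsample_nearest(px, new_len):
    """Copy the closest source pixel -- no new values are invented."""
    scale = (len(px) - 1) / (new_len - 1) if new_len > 1 else 0
    return [px[int(i * scale + 0.5)] for i in range(new_len)]

image_row = [100, 200]  # image intensities: averaging is fine here
mask_row = [1, 3]       # class indices: sky=1, car=3

print(upsample_linear(image_row, 3))   # [100.0, 150.0, 200.0]
print(upsample_linear(mask_row, 3))    # [1.0, 2.0, 3.0] -- invents class 2 (tree)!
print(upsample_nearest(mask_row, 3))   # [1, 3, 3] -- only existing class ids
```

Linear interpolation fabricates class 2 where no tree exists, while nearest neighbor only ever emits class ids that were already in the mask, which is exactly the behavior you want for ground truth labels.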

Yes, this is called progressive resizing, a method that is covered quite a bit in the course and book. Jeremy used this technique to win the DAWNBench competition several years ago, beating out Google, Intel, and other major players. You can read more about it here:
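As a rough sketch of what a progressive-resizing schedule might look like (the `resize_schedule` helper is hypothetical; with fastai you would rebuild your DataLoaders with a new Resize(size) in item_tfms at each stage, keeping the same model weights):

```python
def resize_schedule(start, final, n_stages):
    """Image sizes that double each stage, capped at the final (native) size."""
    sizes, size = [], start
    for _ in range(n_stages):
        sizes.append(min(size, final))
        size *= 2
    return sizes

# e.g. train a few epochs at each of these resolutions in turn
print(resize_schedule(64, 256, 4))  # [64, 128, 256, 256]
```

At each stage you would recreate the DataLoaders with the new size and continue fitting the same learner, so the early epochs are cheap and the final epochs see full-resolution images.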


To summarize: the default resizing methods in fastai differ for images and masks — bilinear for images, nearest neighbor for masks — so that the mask keeps valid class indices. When scaling back up, use the same methods consistently. Training with gradual up-scaling (progressive resizing) can mitigate information loss and aid convergence.
