I am creating an ImageDataBunch with 2 Segmentation datasets using fastai v.1.0.11, and I have 13 classes.
But when I look at y (some ground-truth pixel labels) in the following, I get a tensor with integers 0, 29, …, 76, 78, …, 178, 225.
bunch = ImageDataBunch.create(dataset_train, dataset_valid, ds_tfms = tfms, bs=1, num_workers=0, tfm_y = True, mode = 'bilinear',padding_mode = 'zeros')
x, y = next(bunch.train_dl.__iter__())
Hence, in order to fit the model, I have to set the number of classes to 226…
When I was using DataBunch.create with the same segmentation datasets in fastai v1.0.5, y contained integers between 0 and 12.
Do you have any idea why my labels would be transformed like this?
Thank you very much !
I actually understood why I get such label values. These are the values you get when converting the mask images to greyscale (‘L’ mode with PIL, for example).
But how can I prevent my masks from being read this way and just get values ranging from 0 to 12?
I understand some preprocessing could be done up front, because when creating the SegmentationDatasets I only pass a list of paths to the mask files and have found no way to change how the masks are read afterwards. But I’m not sure that’s a very sustainable approach. Also, if I just change the greyscale values to 0…12, I won’t be able to distinguish the objects when showing the images, so I lose any useful visualization…
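For reference, the preprocessing route mentioned above could look roughly like the sketch below. It assumes the grey level of each class is known in advance; GREY_LEVELS and grey_to_class_indices are hypothetical names, and the listed levels are only illustrative. A fixed lookup table keeps the mapping identical across images, and any unexpected grey level is sent to 255 so it is easy to spot.

```python
import numpy as np

# Hypothetical grey levels, one per class, in class order (illustrative values only).
GREY_LEVELS = [0, 29, 76, 178, 225]

def grey_to_class_indices(grey_mask, levels=GREY_LEVELS, unknown=255):
    """Map grey levels to class indices 0..len(levels)-1 using a fixed lookup table."""
    lut = np.full(256, unknown, dtype=np.uint8)            # default: mark as unknown
    lut[np.array(levels)] = np.arange(len(levels), dtype=np.uint8)
    return lut[grey_mask]                                  # per-pixel table lookup

# A tiny 2x3 greyscale mask standing in for a real mask image.
grey = np.array([[0, 29, 29], [76, 225, 0]], dtype=np.uint8)
indices = grey_to_class_indices(grey)
print(indices)
```

The original mask files can be left untouched for visualization; only the array fed to the model needs the remapped values.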
Can you share your code for creating the datasets?
X_train and X_valid are numpy arrays of paths to jpg files (images).
y_train and y_valid are numpy arrays of paths to png files (masks).
dataset_train = SegmentationDataset(X_train, y_train)
dataset_valid = SegmentationDataset(X_valid, y_valid )
tfms = [rand_pad(padding = 256, size = 512, mode= 'zeros'), rand_pad(padding = 256, size = 512, mode= 'zeros')]
bunch = ImageDataBunch.create(dataset_train, dataset_valid, ds_tfms = tfms, bs=1, num_workers=0, tfm_y = True, mode = 'bilinear',padding_mode = 'zeros', size = 512)
I actually think the issue comes from the open_mask code in fastai.
Indeed, the mask images are read with PIL and converted to greyscale. pil2tensor on an RGB image will return a tensor with values ranging from 0 to 12,
but when the image is greyscale, it returns the pixel values (0 to 255):
def open_mask(fn:PathOrStr, div=False)->ImageSegment:
    "Return ImageSegment object create from mask in file fn. If div, divides pixel values by 255."
    x = PIL.Image.open(fn).convert('L')
    mask = pil2tensor(x).float()
    if div: mask.div_(255)
    return ImageSegment(mask)
I should also specify that the right labels are probably returned when using pil2tensor on an RGB mask image because there is some palette color encoding. When using getcolors on one of my mask png images opened with PIL, I get:
[(152769, 1), (244169, 2), (71235, 3), (50048, 5), (20520, 7), (103130, 8), (119375, 10), (83554, 12)]
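The palette behaviour described above can be reproduced on a tiny in-memory image. This is only a sketch with made-up data: the three-color palette and the 2x3 mask are assumptions, not the poster's files. In a palette ('P' mode) image the pixels store indices, so reading it directly gives class indices, while convert('L') replaces each index with the luminance of its palette color.

```python
import numpy as np
from PIL import Image

# Class indices for a tiny 2x3 mask (classes 0, 1 and 2); made-up data.
indices = np.array([[0, 1, 1], [2, 2, 0]], dtype=np.uint8)

# Build a palette-mode image: size is (width, height), pixels hold indices.
mask = Image.frombytes("P", (3, 2), indices.tobytes())

# Made-up palette: index 0 = black, 1 = red, 2 = blue; remaining entries zero.
mask.putpalette([0, 0, 0, 255, 0, 0, 0, 0, 255] + [0] * (768 - 9))

as_indices = np.array(mask)             # raw palette indices
as_grey = np.array(mask.convert("L"))   # luminance of each palette color
counts = mask.getcolors()               # list of (pixel count, palette index)

print(as_indices)
print(as_grey)
print(counts)
```

The second element of each getcolors tuple is the palette index, which is consistent with the small integers in the output quoted above.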
Yeah, maybe we shouldn’t add this .convert('L'), though some people were having problems loading masks without it. I think I’ll add an argument to pass this conversion mode.
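One way such an argument might behave is sketched below. This is not the fastai implementation: load_mask and its convert_mode parameter are hypothetical, and the function returns a plain numpy array rather than an ImageSegment so that the example runs with only PIL and numpy. The idea is that the conversion is skipped unless a mode is passed, so palette masks keep their indices.

```python
import os
import tempfile

import numpy as np
from PIL import Image

def load_mask(fn, convert_mode=None):
    """Hypothetical loader: convert the image only if `convert_mode` is given."""
    img = Image.open(fn)
    if convert_mode is not None:
        img = img.convert(convert_mode)
    return np.array(img)

# Write a tiny palette PNG (made-up data) to a temporary directory.
indices = np.array([[0, 1], [2, 1]], dtype=np.uint8)
png = Image.frombytes("P", (2, 2), indices.tobytes())
png.putpalette([0, 0, 0, 255, 0, 0, 0, 0, 255] + [0] * (768 - 9))
path = os.path.join(tempfile.mkdtemp(), "mask.png")
png.save(path)

raw = load_mask(path)         # no conversion: palette indices survive
grey = load_mask(path, "L")   # explicit 'L': luminance values instead
print(raw)
print(grey)
```

Defaulting to no conversion is a design choice for this sketch only; keeping 'L' as the default would preserve the behaviour that other users were relying on.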