DataBlock summary varies in how it picks images for the final sample

RMNT · February 5, 2021, 9:02am

It’s an image segmentation problem. Sometimes the summary prints the correct input and its mask as the final sample. Other times, however, it uses the same mask twice, producing a tuple of two identical masks: the first loaded as a PILImage, the second as a PILMask.

I suspect there is a problem with how the DataBlock finds the inputs and targets.

My directory structure is:

- photos: the main directory, holding all the folders with pictures and masks
- v1 to v8: directories inside photos, each with one image and its mask
- image and mask: directories in every vX holding the corresponding data
My DataBlock code:

field = DataBlock(blocks=(ImageBlock, MaskBlock(codes)),
                get_items=get_image_files,
                splitter=RandomSplitter(),
                get_y=get_msk,
                batch_tfms=[*aug_transforms(size=quarter)])

where codes is np.loadtxt(str(train_path)+'codes.txt', dtype='str') (the classes None and Field), train_path is 'photos/', and
get_msk is lambda o: train_path+'{}/mask/mask.tif'.format(o.parts[1])
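The get_msk lambda only looks at the vX folder name (o.parts[1]), so it returns the same mask path whether it is given an image or the mask itself. The sketch below applies the lambda exactly as written above to one image path and one mask path. PurePosixPath stands in for the Path objects that get_image_files returns, and the mask path is a hypothetical item that uses the folder name mask from the description (the logs show masks instead).

```python
from pathlib import PurePosixPath

# train_path and get_msk copied from the post; paths are relative, as in the logs.
train_path = 'photos/'
get_msk = lambda o: train_path + '{}/mask/mask.tif'.format(o.parts[1])

image_item = PurePosixPath('photos/v4/image/S1_VV_D10_AD_median_2019_08_11_visas_v4.tif')
# Hypothetical: a mask file that was returned as an *input* item.
mask_item = PurePosixPath('photos/v4/mask/mask.tif')

# parts == ('photos', 'v4', 'image', ...), so parts[1] is the vX folder name.
print(get_msk(image_item))  # photos/v4/mask/mask.tif
# Given the mask itself, the lambda returns that same mask path.
print(get_msk(mask_item))   # photos/v4/mask/mask.tif
```

So if get_items ever yields a mask file, the sample becomes (that mask opened as a PILImage, the same mask opened as a PILMask), which matches the incorrect summary.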

Correct formation of the final sample:

Building one sample
  Pipeline: PILBase.create
    starting from
      photos/v4/image/S1_VV_D10_AD_median_2019_08_11_visas_v4.tif
    applying PILBase.create gives
      PILImage mode=RGB size=4000x4000
  Pipeline: <lambda> -> PILBase.create
    starting from
      photos/v4/image/S1_VV_D10_AD_median_2019_08_11_visas_v4.tif
    applying <lambda> gives
      photos/v4/masks/mask.tif
    applying PILBase.create gives
      PILMask mode=L size=4000x4000

Final sample: (PILImage mode=RGB size=4000x4000, PILMask mode=L size=4000x4000)

Incorrect formation of the final sample:

Building one sample
  Pipeline: PILBase.create
    starting from
      photos/v4/masks/mask.tif
    applying PILBase.create gives
      PILImage mode=RGB size=4000x4000
  Pipeline: <lambda> -> PILBase.create
    starting from
      photos/v4/masks/mask.tif
    applying <lambda> gives
      photos/v4/masks/mask.tif
    applying PILBase.create gives
      PILMask mode=L size=4000x4000

Final sample: (PILImage mode=RGB size=4000x4000, PILMask mode=L size=4000x4000)
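In the incorrect summary the first pipeline starts from photos/v4/masks/mask.tif, so get_items itself returned a mask as an input. get_image_files searches subfolders by default (recurse=True) and counts .tif as an image extension, so under photos/ it likely collects the files in image/ and in the mask folder alike. The standard-library sketch below reproduces that on a throwaway copy of the described layout (the image file names are made up) and then keeps only files whose parent folder is named image.

```python
import tempfile
from pathlib import Path

# Hypothetical copy of the layout: photos/vX/image/<image>.tif and photos/vX/mask/mask.tif.
root = Path(tempfile.mkdtemp()) / 'photos'
for i in range(1, 9):
    for sub, name in [('image', f'img_v{i}.tif'), ('mask', 'mask.tif')]:
        d = root / f'v{i}' / sub
        d.mkdir(parents=True)
        (d / name).touch()

# A recursive .tif search, similar in spirit to get_image_files(path),
# returns images AND masks.
all_items = sorted(root.rglob('*.tif'))

# Keep only files that live in an 'image' folder, so masks are never used as inputs.
image_items = [o for o in all_items if o.parent.name == 'image']

print(len(all_items), len(image_items))  # 16 8
```

In the DataBlock, an untested way to apply the same filter might be get_items=lambda p: [o for o in get_image_files(p) if o.parent.name == 'image'].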

When creating the DataLoaders, I set the batch size to the total number of training samples because I want to see all of them when calling the show_batch() method. The method shows at most one correct input/mask pair; the other masks are blank, or no correct pairs are shown at all.
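If the mask files are among the items, the training set (and therefore a batch as large as the training set) is a random mix of images and masks. The sketch below roughly imitates a default 20% random train/validation split on a hypothetical list of 8 images plus 8 masks. It uses Python's random module with a fixed seed only to stay dependency-free and deterministic; fastai's RandomSplitter shuffles with torch, so the exact assignment will differ.

```python
import random

# Hypothetical items: 8 images plus the 8 mask files a recursive search also returns.
items = [f'photos/v{i}/image/img_v{i}.tif' for i in range(1, 9)] + \
        [f'photos/v{i}/mask/mask.tif' for i in range(1, 9)]

def random_split(items, valid_pct=0.2, seed=42):
    # Shuffle indices, then cut off the first valid_pct of them for validation.
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)
    cut = int(valid_pct * len(items))
    return [items[i] for i in idx[cut:]], [items[i] for i in idx[:cut]]

train, valid = random_split(items)
n_masks = sum('/mask/' in o for o in train)  # training "inputs" that are really masks

print(len(train), len(valid))  # 13 3
print(n_masks, 'of the training inputs are mask files')
```

With 16 items, at least 5 of the 13 training inputs are necessarily mask files, so a batch covering the whole training set can contain only a few genuine image/mask pairs.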

To check whether the data itself was at fault, I created PILMask objects directly from Path() objects. There was no problem: all the maps were displayed correctly.

I would appreciate some help with this issue. Please feel free to ask for more information if needed.