Model that accepts an image and a segmentation-map as inputs

I would like to build a model that uses the images from the prostate cancer competition on Kaggle:

Here we get a TIFF with the biopsy sample and an image mask. The model should infer the cancer grade on the ISUP scale.

I have built a custom DataBlock (see this blog post:

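def custom_img(fn):
    # Build the slide path from the row's image_id
    fn = f'{train}/{fn.image_id}.tiff'
    try:
        file = openslide.OpenSlide(str(fn))
    except Exception as e:
        # Fail loudly with the offending path instead of continuing without a slide
        raise RuntimeError(f'Could not open slide {fn}') from e
    # Read a small thumbnail rather than the full-resolution slide
    t = tensor(file.get_thumbnail(size=(255, 255)))
    img_pil = PILImage.create(t)
    return img_pil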

def custom_selective_mask(fn):
    fn = f'{mask}/{fn.image_id}_mask.tiff'
    file = openslide.OpenSlide(str(fn))
    t = tensor(file.get_thumbnail(size=(255, 255)))[:,:,0]
    ts = show_selective(t, min_px=None, max_px=None)
    return ts

blocks = (ImageBlock,

getters = [

dblock = DataBlock(blocks=blocks,
                   item_tfms=[Resize(224), ToTensor],
                   batch_tfms=[IntToFloatTensor, Normalize.from_stats(*imagenet_stats)])

dl = dblock.dataloaders(train_labels, bs=4)
dl.n_inp = 2

Now I think I have to create a custom model with a forward function that accepts two items and stacks them into one tensor so that the model can use the image and the mask for learning.

class ProstateCancerModel(Module):
  def __init__(self, encoder, head):
    self.encoder, self.head = encoder, head

  def forward(self, x1, x2):
    ftrs = torch.cat([self.encoder(x1), self.encoder(x2)], dim=1)
    return self.head(ftrs)

def loss_func(out, targ):
    return CrossEntropyLossFlat()(out, targ.long())

def siamese_splitter(model):
    return [params(model.encoder), params(model.head)]

encoder = create_body(resnet34, cut=-2)
head = create_head(512*2, 2, ps=0.5)
model = ProstateCancerModel(encoder, head)

Each item in the dataloader is now a tuple with the image, the image mask and the label:


(PILImage mode=RGB size=186x255,
 PILMask mode=L size=186x255,

When I try to build the learner and fit for a few epochs, I get an error about a dimension mismatch:

learner = Learner(dl,

RuntimeError                              Traceback (most recent call last)
<ipython-input-233-c0f1748717fa> in <module>()
      6                   )
      7 learner.freeze()
----> 8 learner.fit_one_cycle(5)

20 frames
/usr/local/lib/python3.7/dist-packages/torch/ in __torch_function__(cls, func, types, args, kwargs)
    994         with _C.DisableTorchFunction():
--> 995             ret = func(*args, **kwargs)
    996             return _convert(ret, cls)

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [4, 224, 224] instead

So the input provided is a batch of size 4 with images of 224x224 (ImageNet resolution). I think this is correct, but I cannot figure out where the expected 4-dimensional input comes from.

A tensor for a batch of 2D color images should have four dimensions when passed to the model:
B x C x H x W where
B = batch size
C = color channels
H = height
W = width

This is why the model expected 4D input.
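To see where the weight shape [64, 3, 7, 7] in the error comes from, here is a minimal sketch in plain PyTorch. It assumes the first layer of the encoder matches the usual ResNet stem (64 filters, 3 input channels, 7x7 kernel); the variable names are only illustrative.

```python
import torch
import torch.nn as nn

# Stand-in for the first convolution of a ResNet encoder (assumed configuration)
conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

# Conv2d weights are laid out as [out_channels, in_channels, kernel_h, kernel_w]
print(conv1.weight.shape)  # torch.Size([64, 3, 7, 7])

# A batch in B x C x H x W layout passes through without complaint
batch = torch.randn(4, 3, 224, 224)
out = conv1(batch)
print(out.shape)  # torch.Size([4, 64, 112, 112])
```

A tensor of size [4, 224, 224] has no channel axis, so it cannot be matched against the 3 input channels of this weight.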
The problem is probably the mask. I assume it is a binary mask and has dimensions B x H x W. This makes sense when passing it to the loss function, which expects the target to have one dimension fewer than the predictions. But passing it to a model leads to problems.

A possible solution could be to add a specific mask transform:

def extend_mask(x:TensorMask):
    return x.unsqueeze(1)

This will give you another error about a size mismatch, as you now pass a tensor of size B x 1 x H x W, but the model expects B x 3 x H x W. Try adapting the first model layer, or stack the mask three times in the transform given above.
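The "stack the mask three times" option could look roughly like this in plain PyTorch. This is only a sketch: `mask_to_three_channels` is a hypothetical helper name, and it assumes the mask arrives as an integer tensor of shape B x H x W.

```python
import torch

def mask_to_three_channels(mask: torch.Tensor) -> torch.Tensor:
    """Turn a B x H x W mask into a B x 3 x H x W float tensor."""
    # unsqueeze(1) adds the channel axis; repeat copies it three times
    return mask.unsqueeze(1).repeat(1, 3, 1, 1).float()

mask = torch.randint(0, 6, (4, 224, 224))  # dummy integer mask batch
stacked = mask_to_three_channels(mask)
print(stacked.shape)  # torch.Size([4, 3, 224, 224])
```

Repeating the channel keeps the pretrained first layer untouched; adapting the first layer to a single input channel avoids the redundant copies but changes the pretrained weights.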

Thanks! I have fixed the problem with the dimensions, but now I have run into another problem.

Fixed model class:

class ProstateCancerModel(Module):
  def __init__(self, encoder, head):
    self.encoder, self.head = encoder, head

  def forward(self, x1, x2):
    enc1 = self.encoder(x1)
    # Stack the mask three times: B x H x W -> B x 3 x 1 x H x W -> B x 3 x H x W
    enc2 = self.encoder(
        torch.stack((x2.unsqueeze(1), x2.unsqueeze(1), x2.unsqueeze(1)), dim=1).squeeze(2).float()
    )
    #enc2 = TensorImage(enc2)
    #enc2 = self.encoder(x2)
    print("finished enc1, enc2")
    print(enc1.shape, enc2.shape)
    ftrs = torch.cat([enc1, enc2], dim=1)
    return self.head(ftrs)

def loss_func(out, targ):
    #return CrossEntropyLossFlat()(out, targ.long())
    print("loss function")
    return CrossEntropyLossFlat()(out, targ.long())

The error occurs on the concatenation of the two inputs:

no implementation found for 'torch.cat' on types that implement __torch_function__: [TensorImage, TensorMask]

Both fastai classes (TensorImage and TensorMask) inherit from the Tensor class, so I do not really understand why I cannot invoke torch.cat.

I kind of solved this by casting both inputs to TensorBase objects.

class ProstateCancerModel(Module):
  def __init__(self, encoder, head):
    self.encoder, self.head = encoder, head

  def forward(self, x1, x2):
    enc1 = self.encoder(x1)
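    # Stack the mask three times: B x H x W -> B x 3 x 1 x H x W -> B x 3 x H x W
    enc2 = self.encoder(
        torch.stack((x2.unsqueeze(1), x2.unsqueeze(1), x2.unsqueeze(1)), dim=1).squeeze(2).float()
    )
    #enc2 = TensorImage(enc2)
    #enc2 = self.encoder(x2)
    print("finished enc1, enc2")
    print(enc1.shape, enc2.shape)
    # Cast both encodings to a common type so torch.cat can dispatch
    enc1 = TensorBase(enc1)
    enc2 = TensorBase(enc2)
    ftrs = torch.cat([enc1, enc2], dim=1)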
    return self.head(ftrs)
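For anyone wondering why a common type helps: the default `__torch_function__` on `torch.Tensor` only handles a call when the class doing the dispatch is a subclass of every tensor type involved, and two sibling subclasses never satisfy that. Below is a small sketch in plain PyTorch. `ImgLike` and `MaskLike` are made-up stand-ins, not the fastai classes, and fastai's own behavior may differ between versions.

```python
import torch

# Hypothetical sibling subclasses standing in for TensorImage / TensorMask
class ImgLike(torch.Tensor): pass
class MaskLike(torch.Tensor): pass

a = torch.randn(4, 512, 7, 7).as_subclass(ImgLike)
b = torch.randn(4, 512, 7, 7).as_subclass(MaskLike)

try:
    torch.cat([a, b], dim=1)
    mixed_ok = True
except TypeError as err:
    # Neither subclass is a subclass of the other, so dispatch gives up
    mixed_ok = False
    print("mixed subclasses:", err)

# Casting both to one common type lets torch.cat dispatch normally
plain = torch.cat([a.as_subclass(torch.Tensor), b.as_subclass(torch.Tensor)], dim=1)
print(type(plain).__name__, plain.shape)
```

Casting to TensorBase in the model above plays the same role as `as_subclass(torch.Tensor)` here: both inputs end up with one shared type.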

But then I ran into a different error:

RuntimeError: CUDA error: device-side assert triggered

I suspect the loss-function is wrong?!

The CUDA error: device-side assert triggered is cryptic. In my experience, it is often caused somewhere in the loss function.
Try running the model on CPU only. This will give you a more meaningful error message.
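One common cause that shows up clearly on CPU is a label outside the range of the model's outputs. The sketch below uses dummy data and assumes the targets are ISUP grades 0 to 5; the head in the code above is created with `create_head(512*2, 2, ps=0.5)`, i.e. two outputs, so this may be worth checking, though it may not be the cause here.

```python
import torch
import torch.nn.functional as F

targets = torch.tensor([0, 1, 5, 3])  # dummy labels, assumed to be grades 0..5

# Two logits per sample cannot represent class 5
logits_2 = torch.randn(4, 2)
try:
    F.cross_entropy(logits_2, targets)
    failed = False
except (IndexError, RuntimeError) as err:
    failed = True
    print("CPU error:", err)  # names the out-of-range target

# With one logit per grade the same targets are valid
logits_6 = torch.randn(4, 6)
loss = F.cross_entropy(logits_6, targets)
print(loss.item())
```

On the GPU the same mistake is reported only as the generic device-side assert, which is why reproducing the loss call on CPU is usually the quickest way to find it.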

You could also try to create two ImageBlocks instead of a MaskBlock and an ImageBlock. The model won’t care what Tensor subclass you give it, but handling the DataLoader and loss functions might get easier.

I managed to get the model working by changing the input blocks to two ImageBlocks and passing the image mask into the model as an image.

I then ran the notebook on Kaggle on the full data set, but it performed terribly…

The CohenKappa score stays at zero and the validation accuracy keeps getting worse… any ideas on how to improve?

