Unet outputting (batch, 2, xsize, ysize) as predictions

I am using a unet for binary segmentation and it is outputting predictions with depth 2, breaking the accuracy metric and resulting in bad output.

When I use regular accuracy as a metric, I get this error:

/opt/conda/lib/python3.6/site-packages/fastai/metrics.py in accuracy(input, targs)
     28     input = input.argmax(dim=-1).view(n,-1)
     29     targs = targs.view(n,-1)
---> 30     return (input==targs).float().mean()
     32 def accuracy_thresh(y_pred:Tensor, y_true:Tensor, thresh:float=0.5, sigmoid:bool=True)->Rank0Tensor:

RuntimeError: The size of tensor a (448) must match the size of tensor b (50176) at non-singleton dimension 1

If I use accuracy_thresh, the model runs and trains but still returns bad results. It converges on an accuracy_thresh of 50%.

The 2 channels of my output add up to 1: if preds[0][0] is 0.93, preds[0][1] is 0.07. I assume these are my predictions for each class? Do I need to rewrite my loss and accuracy functions?

Edit: I believe I have fixed my accuracy problem using the function from the lesson 3 camvid notebook. However, my loss is not working very well: the model converges on predicting 0 everywhere, which still achieves a high accuracy because the labels are mostly 0.
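The numbers in the traceback are consistent with the metric taking the argmax over the last axis instead of the class axis. The sketch below reproduces the mismatch and shows a pixel accuracy that reduces over dim=1. The 224x224 image size and the batch size of 4 are assumptions (224 is inferred from 50176 = 224 * 224), and seg_accuracy is a hypothetical name, not the function from the notebook.

```python
import torch

def seg_accuracy(logits, target):
    """Pixel accuracy for segmentation.

    logits: (N, C, H, W) raw scores from the network.
    target: (N, 1, H, W) or (N, H, W) integer class indices.
    """
    n = target.shape[0]
    pred = logits.argmax(dim=1).view(n, -1)  # argmax over the class channel
    target = target.view(n, -1)
    return (pred == target).float().mean()

# Assumed shapes: batch of 4, 2 classes, 224x224 images.
logits = torch.randn(4, 2, 224, 224)
target = torch.randint(0, 2, (4, 1, 224, 224))

# argmax over the last axis collapses the width, not the classes.
wrong = logits.argmax(dim=-1).view(4, -1)
print(wrong.shape)               # torch.Size([4, 448])   = 2 * 224
print(target.view(4, -1).shape)  # torch.Size([4, 50176]) = 224 * 224

acc = seg_accuracy(logits, target)
```

It can be passed to the learner like any other metric, for example metrics=[seg_accuracy].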

So I need to determine a better loss function.

With a softmax activation you will get exactly this: the two channels are the per-class probabilities for each pixel. You then either take the channel with the highest score, or keep preds[0][1], which is the probability that the pixel is in the mask. Your main problem is that your ground truth is inconsistent with your predictions. If you want to use softmax, you need to convert your ground truth so that it has shape (2, H, W), where each pixel contains [0, 1] (which means it is a 1) or [1, 0] (which means it is a 0). If you use sigmoid and a single class, your ground truth has shape (1, H, W), where each pixel contains either 0 or 1 (which is probably what you have). The output then contains, for each pixel, the probability that it is a 1.
Hope I am clear!
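As a rough illustration of the two layouts described above, the following sketch builds both from the same index mask. The 4x4 size and the random logits are placeholders; only standard PyTorch calls are used.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
H, W = 4, 4
mask = torch.randint(0, 2, (H, W))  # index mask: 0 = background, 1 = foreground

# Two-channel (softmax) layout
logits2 = torch.randn(2, H, W)
probs2 = logits2.softmax(dim=0)      # the two channels sum to 1 at every pixel
one_hot = F.one_hot(mask, num_classes=2).permute(2, 0, 1)  # (2, H, W) ground truth

# Single-channel (sigmoid) layout
logits1 = torch.randn(1, H, W)
probs1 = torch.sigmoid(logits1)      # probability that each pixel is a 1
target1 = mask.unsqueeze(0).float()  # (1, H, W) ground truth

# With two classes, the foreground softmax channel equals a sigmoid of the
# logit difference, which is why a single channel is enough.
foreground = torch.sigmoid(logits2[1] - logits2[0])
```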
If you converge fast to 0, you can consider some options:

  • Lower learning rate
  • Clip gradients
  • Find a loss that penalizes false negatives more (weighted cross entropy or dice, for instance)
  • Similarly, don’t use accuracy for binary segmentation; dice or IoU are better indicators.
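To see why accuracy is misleading on imbalanced masks, here is a minimal sketch comparing it with dice and IoU on a mask that is 1% foreground. dice_and_iou is a hypothetical helper written for this example, not a fastai function, and the eps smoothing term is an assumption.

```python
import torch

def dice_and_iou(pred_mask, true_mask, eps=1e-8):
    """Dice and IoU for binary masks given as 0/1 tensors of the same shape."""
    pred, true = pred_mask.bool(), true_mask.bool()
    inter = (pred & true).sum().float()
    n_pred, n_true = pred.sum().float(), true.sum().float()
    union = n_pred + n_true - inter
    dice = (2 * inter + eps) / (n_pred + n_true + eps)
    iou = (inter + eps) / (union + eps)
    return dice.item(), iou.item()

# Ground truth with 1% foreground pixels.
true_mask = torch.zeros(100, 100, dtype=torch.long)
true_mask[:10, :10] = 1

# A model that predicts background everywhere.
all_zero = torch.zeros_like(true_mask)

accuracy = (all_zero == true_mask).float().mean().item()  # 0.99
dice, iou = dice_and_iou(all_zero, true_mask)             # both close to 0
```

The all-background prediction scores 99% accuracy while dice and IoU are essentially zero, which matches the behaviour described in the question.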

You’re absolutely right, that is basically the stage I have gotten to. I am currently digging around trying to discover how to do either of those solutions, so I will ask here.

  1. fastai’s unet_learner is giving me a softmax activation. How can I change this to be a sigmoid? It looks to me in the code that DynamicUnet has sigmoids, does the learner slap softmax on the end? How would I change this?

  2. I’m loading my masks like:

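    from fastai.vision import *  # fastai v1

    class SegLabelListCustom(SegmentationLabelList):
        # Open masks as grayscale and divide by 255 so pixel values are 0/1
        def open(self, fn): return open_mask(fn, div=True, convert_mode="L")

    class SegItemListCustom(ImageList):
        _label_cls, _square_show_res = SegLabelListCustom, False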

How/when do I process them into separate channels? Maybe I’ll try a custom function for open_mask?

The fastai model doesn’t include a final activation, but it does produce the right number of output channels for your number of classes. However, an activation is applied when calculating metrics, and which one depends mainly on your loss function.
If you really want to process them in 2 channels:

def open(self, fn):
    mask = open_mask(fn, div=True, convert_mode='L')
    px = mask.px
    new_px = torch.zeros((2, *px.shape[-2:])).int()
    new_px[0][px==0] = 1
    new_px[1][px==1] = 1
    mask.px = new_px
    return mask

That should work.
What does data.train_ds.classes yield (where data is your databunch)? And what is your loss function?

I am currently using the default loss function, which is:
FlattenedLoss of CrossEntropyLoss()

I would prefer to use a different loss such as NLLLoss or BCELoss with weights, since my classes are highly imbalanced and cause my model to predict mostly 0s. However, I have not been able to get them working because my ground truth has a different shape from my predictions.

Ideally I would like to calculate the weights based on the probability in each batch.

data.train_ds.classes yields ['clean', 'HE'], which I set myself.

Ok, so you have 2 options:

  • keep everything as it is except you change the open function to make masks have 2 channels
  • use something like BCE (I recommend BCEWithLogitsLoss, which applies the sigmoid itself; otherwise nothing will ever apply an activation) that expects 1-channel input, and change classes to just HE (with BCE you don’t need a class for background; it expects one channel with values between 0 and 1).
    I’d lean towards the second solution, as doing multiclass just to predict the background is a bit wasteful.
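The point about the activation can be checked directly: BCEWithLogitsLoss takes raw model outputs and applies the sigmoid internally, so it should match BCELoss applied to sigmoid outputs. A small sketch with placeholder shapes and random data:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 1, 16, 16)                    # raw 1-channel model output
target = torch.randint(0, 2, (4, 1, 16, 16)).float()  # 0/1 mask as float

# Raw logits in, sigmoid applied inside the loss.
with_logits = nn.BCEWithLogitsLoss()(logits, target)

# Equivalent two-step version: explicit sigmoid, then plain BCE.
manual = nn.BCELoss()(torch.sigmoid(logits), target)
```

The fused version is generally preferred because it is more numerically stable for large logits.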

Thanks to you I got it to work.

Switching to BCEWithLogitsLoss and going down to one class was the ticket.

I also had to convert my targets to float tensors, so my loss function looked like this:

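import torch.nn.functional as F

# `weights` (a tensor of positive-class weights) must already be defined when this function is created.
def BCELogitsLoss(input, target, weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=weights):
    target = target.float()  # masks load as long tensors; BCE needs float targets
    return F.binary_cross_entropy_with_logits(input, target, weight, pos_weight=pos_weight, reduction=reduction)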
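The thread does not show how weights was computed. One common heuristic, sketched below as an assumption rather than the poster's actual method, is the ratio of negative to positive pixels, here computed per batch. batch_pos_weight, weighted_bce_logits, and the max_weight cap are hypothetical names and choices for this example.

```python
import torch
import torch.nn.functional as F

def batch_pos_weight(target, max_weight=100.0):
    """Negative-to-positive pixel ratio for this batch, capped at max_weight."""
    target = target.float()
    n_pos = target.sum()
    n_neg = target.numel() - n_pos
    # clamp(min=1) avoids dividing by zero when a batch has no positives
    return (n_neg / n_pos.clamp(min=1.0)).clamp(max=max_weight).reshape(1)

def weighted_bce_logits(input, target):
    target = target.float()
    return F.binary_cross_entropy_with_logits(
        input, target, pos_weight=batch_pos_weight(target))

# Toy batch: 16 positive pixels and 112 negative pixels.
logits = torch.zeros(2, 1, 8, 8)
target = torch.zeros(2, 1, 8, 8, dtype=torch.long)
target[:, :, :2, :4] = 1

w = batch_pos_weight(target)                # 112 / 16 = 7
loss = weighted_bce_logits(logits, target)
```

A fixed weight computed once over the whole training set is a simpler alternative and avoids the loss scale changing from batch to batch.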

Great! You can also use PyTorch’s BCE directly by using a custom ItemBase:

class ImageSegmentFloat(ImageSegment):
    @property
    def data(self):
        # Return the mask as a float tensor so PyTorch's BCE losses accept it
        return self.px.float()

class MaskList(SegmentationLabelList):
    def open(self, fn):
        mask = open_mask(fn)
        return ImageSegmentFloat(mask.px)
    def analyze_pred(self, pred, thresh: float = 0.5):
        return (pred > thresh).float()
    def reconstruct(self, t):
        return ImageSegmentFloat(t)

Both work, depends on your preference.


This second option gives me the following error:

Exception: It's not possible to apply those transforms to your dataset:
 grid_sampler(): expected input and grid to have same dtype, but input has long and grid has float

Yes indeed, I had the same problem and found the solution; use the edited code above instead.

Hi there again,

Thank you for your help so far.

I believe I am getting a similar problem when changing from default loss to BCEWithLogitsLoss using Unet-ResNet for Segmentation:

I have masks with classes [0,1,2,3,4,5] and am using dice coefficient as my metric.

learn = unet_learner(data, arch, metrics=[dice])
learn.loss_func = nn.BCEWithLogitsLoss()

The error I get:

ValueError: Target size (torch.Size([8, 1, 64, 400])) must be the same as input size (torch.Size([8, 5, 64, 400]))

Did you come across this at all?

BCEWithLogitsLoss is used for binary masks, not for masks with 5 classes. You should use CrossEntropyLoss instead.

I must have misinterpreted it then. Just to clarify, I am doing multi-label segmentation with just the one mask. The mask will have any of the values [0, 1, 2, 3, 4].

Would BCE instead be used for multi-channel where each mask is one hot encoded?

This is for the Severstal comp.

BCE expects the input and target to have the same shape: here, one channel each, with the target containing only 0s and 1s. Cross-entropy expects the target mask to have one channel with integer values between 0 and 4 (which is what you have) and the input (= the output of the network) to have one channel per class (so 5 channels of raw scores, not values already encoded as 0 or 1). It seems to me you are exactly in the second case.
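The shape requirements can be checked with the sizes from the error message. This is a sketch in plain PyTorch with random data; it shows the class-index target that cross-entropy accepts, the one-hot target BCE would need instead, and the ValueError raised when the shapes differ.

```python
import torch
import torch.nn.functional as F

N, C, H, W = 8, 5, 64, 400                  # sizes taken from the error message
logits = torch.randn(N, C, H, W)            # network output: one channel per class
target = torch.randint(0, C, (N, 1, H, W))  # index mask with values 0..4

# Cross-entropy: class-index target without the channel axis, shape (N, H, W).
ce = F.cross_entropy(logits, target.squeeze(1))

# BCE needs a target shaped like the logits, so the mask must be one-hot encoded.
one_hot = F.one_hot(target.squeeze(1), num_classes=C).permute(0, 3, 1, 2).float()
bce = F.binary_cross_entropy_with_logits(logits, one_hot)

# Passing the index mask straight to BCE reproduces the reported error.
try:
    F.binary_cross_entropy_with_logits(logits, target.float())
    raised = False
except ValueError:
    raised = True
```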


How did you figure out what to set for “weights” (the pos_weight value) in your custom loss?