# Unet outputting (batch, 2, xsize, ysize) as predictions

I am using a unet for binary segmentation and it is outputting predictions with depth 2, breaking the accuracy metric and resulting in bad output.

When I use regular accuracy as a metric, I get this error:

``````/opt/conda/lib/python3.6/site-packages/fastai/metrics.py in accuracy(input, targs)
28     input = input.argmax(dim=-1).view(n,-1)
29     targs = targs.view(n,-1)
---> 30     return (input==targs).float().mean()
31
32 def accuracy_thresh(y_pred:Tensor, y_true:Tensor, thresh:float=0.5, sigmoid:bool=True)->Rank0Tensor:

RuntimeError: The size of tensor a (448) must match the size of tensor b (50176) at non-singleton dimension 1
``````

If I use accuracy_thresh, the model runs and trains but still returns bad results. It converges on an accuracy_thresh of 50%.

The 2 layers of my output add up to 1. If preds[0][0] is 0.93, preds[0][1] is 0.07. I assume this is my prediction for each class? Do I need to rewrite my loss and accuracy function?

Edit: I believe I have figured out my accuracy problem using the function in the lesson 2 camvid notebook, but it seems that my loss is not working very well as it converges on predicting 0 everywhere and achieving a high accuracy (the labels are mostly 0).
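
For reference, the sizes in the traceback above are consistent with a (batch, 2, 224, 224) output: `argmax(dim=-1)` collapses the width, leaving 2 × 224 = 448 values per item, while the flattened target has 224 × 224 = 50176. Below is a minimal sketch of a segmentation accuracy that takes the argmax over the channel dimension instead, in the spirit of the camvid notebook metric. The name `seg_accuracy` and the tensor shapes are assumptions for illustration, not fastai API.

```python
import torch

def seg_accuracy(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Pixel accuracy for logits (N, C, H, W) and targets (N, 1, H, W) or (N, H, W)."""
    # Class scores live on dim 1 (channels), not on the last dimension.
    pred = logits.argmax(dim=1)          # (N, H, W)
    if target.dim() == 4:
        target = target.squeeze(1)       # drop the channel dim -> (N, H, W)
    return (pred == target).float().mean()

# Hypothetical shapes matching the error message (2 classes, 224 x 224 images).
logits = torch.randn(4, 2, 224, 224)
target = torch.randint(0, 2, (4, 1, 224, 224))
print(seg_accuracy(logits, target))
```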

So I need to determine a better loss function.

If you use a softmax activation, you'll get something like this. You then need to either keep the highest score per pixel or just keep `preds[0][1]` (the probability that the pixel is in the mask). Your main problem is actually that your ground truth is inconsistent with your predictions. If you want to use softmax, you need to convert your ground truth so that it has shape (2, H, W), where each pixel contains [0, 1] (which means it is a 1) or [1, 0] (which means it is a 0). If you use sigmoid and a single class, you'll have masks of shape (1, H, W), where each pixel contains either 0 or 1 (which is what you probably have). The output will then contain, for each pixel, the probability that it is a 1.
Hope I am clear!
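
A small sketch of the two conventions in plain PyTorch, under assumed toy shapes: a 2-channel softmax sums to 1 per pixel, its foreground channel equals a sigmoid of the logit difference, and a 0/1 mask can be expanded to the two-channel [1, 0] / [0, 1] encoding with `F.one_hot`.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 2, 4, 4)                  # (N, C=2, H, W), toy size
probs = torch.softmax(logits, dim=1)

# The two channels sum to 1 at every pixel.
channel_sum = probs.sum(dim=1)

# Foreground probability is the sigmoid of (foreground - background) logit.
fg_from_sigmoid = torch.sigmoid(logits[:, 1] - logits[:, 0])

# Expanding a 0/1 mask (N, H, W) into two channels (N, 2, H, W).
mask = torch.randint(0, 2, (1, 4, 4))
one_hot = F.one_hot(mask, num_classes=2).permute(0, 3, 1, 2)
```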
If you converge fast to 0, you can consider some options:

• Lower learning rate
• Find a loss that penalizes false negatives more (weighted cross entropy or dice, for instance)
• Similarly, don't use accuracy for binary segmentation; dice or IoU are better indicators.
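
To illustrate the last point, here is a rough sketch in plain PyTorch. `dice_and_iou` is a hypothetical helper written for this example, not a fastai function, and the 10 × 10 mask is made up. An all-zero prediction on a mask with 10% positives still scores high accuracy, while dice and IoU drop to roughly zero.

```python
import torch

def dice_and_iou(pred_mask: torch.Tensor, target_mask: torch.Tensor, eps: float = 1e-8):
    """Dice and IoU for two binary masks of the same shape."""
    pred = pred_mask.float().flatten()
    targ = target_mask.float().flatten()
    intersection = (pred * targ).sum()
    total = pred.sum() + targ.sum()
    dice = (2 * intersection + eps) / (total + eps)
    iou = (intersection + eps) / (total - intersection + eps)
    return dice, iou

# Toy mask: 10 positive pixels out of 100.
target = torch.zeros(10, 10)
target[:2, :5] = 1
all_zero = torch.zeros(10, 10)

accuracy = (all_zero == target).float().mean()   # 90 of 100 pixels match
dice, iou = dice_and_iou(all_zero, target)       # both close to zero
print(accuracy, dice, iou)
```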

You're absolutely right, that is basically the stage I have gotten to. I am currently digging around trying to discover how to do either of those solutions, so I will ask here.

1. fastai's unet_learner is giving me a softmax activation. How can I change this to be a sigmoid? It looks to me in the code that DynamicUnet has sigmoids; does the learner slap softmax on the end? How would I change this?

class SegLabelListCustom(SegmentationLabelList):
    def open(self, fn): return open_mask(fn, div=True, convert_mode="L")

class SegItemListCustom(ImageList):
    _label_cls, _square_show_res = SegLabelListCustom, False

How/when do I process them into separate channels? Maybe I'll try a custom function for open_mask?

The fastai model doesn't include an activation, but it computes the right number of outputs depending on your number of classes. An activation is applied when calculating metrics, however, and which one depends mainly on your loss function.
If you really want to process them in 2 channels:

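``````
def open(self, fn):
    # Assumption: the mask is loaded the same way as in your custom label list
    px = open_mask(fn, div=True, convert_mode="L").px   # (1, H, W), values 0 or 1
    new_px = torch.zeros((2, *px.shape[-2:])).int()
    new_px[0][px[0] == 0] = 1   # channel 0: background pixels
    new_px[1][px[0] == 1] = 1   # channel 1: foreground pixels
    return ImageSegment(new_px)
``````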

That should work.
What does `data.train_ds.classes` yield (where `data` is your databunch)? And what is your loss function?

I am currently using the default loss function, which is:
`FlattenedLoss of CrossEntropyLoss()`

I would prefer to use a different loss such as NLLLoss or BCELoss with weights, since my classes are highly imbalanced and cause my model to predict mostly 0s. However I have not been able to get them working because of my truth being a different shape.

Ideally I would like to calculate the weights based on the probability in each batch.
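
One possible way to do this, sketched in plain PyTorch under stated assumptions: the positive-class weight is recomputed for each batch as the ratio of negative to positive pixels. The function name `batch_weighted_bce`, the `max_weight` cap, and the choice of ratio are all illustrative, not something from fastai or from this thread.

```python
import torch
import torch.nn.functional as F

def batch_weighted_bce(logits: torch.Tensor, target: torch.Tensor,
                       max_weight: float = 100.0) -> torch.Tensor:
    """BCE-with-logits where pos_weight is the negative/positive ratio of the batch."""
    target = target.float()
    n_pos = target.sum()
    n_neg = target.numel() - n_pos
    # Guard against batches with no positive pixels and cap very large weights.
    pos_weight = (n_neg / n_pos.clamp(min=1.0)).clamp(max=max_weight).reshape(1)
    return F.binary_cross_entropy_with_logits(logits, target, pos_weight=pos_weight)

# Toy batch: 4 positive pixels out of 32, so pos_weight is 28 / 4 = 7.
logits = torch.zeros(2, 1, 4, 4)
target = torch.zeros(2, 1, 4, 4)
target[0, 0, 0, :] = 1
loss = batch_weighted_bce(logits, target)
```

A weight computed once over the whole training set is a common alternative and gives a loss whose scale does not change from batch to batch.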

`data.train_ds.classes` yields `['clean', 'HE']`, which I set myself.

Ok, so you have 2 options:

• keep everything as it is except you change the `open` function to make masks have 2 channels
• use something like BCE (I recommend using `BCEWithLogitsLoss`, otherwise nothing will ever apply an activation) that expects a 1-channel input, but change `classes` to something like `HE` (with BCE you don't need a class for background; it expects one channel with values between 0 and 1).
I'd lean towards the second solution, as doing multiclass just to compute background is a bit useless.
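
As a quick check of the point about the activation, this sketch (plain PyTorch, with made-up shapes) shows that `BCEWithLogitsLoss` on raw one-channel scores matches `BCELoss` applied after an explicit sigmoid, and that the target must be a float tensor with the same shape as the output.

```python
import torch
import torch.nn as nn

logits = torch.randn(8, 1, 64, 64)                     # raw scores, one output channel
target = torch.randint(0, 2, (8, 1, 64, 64)).float()   # binary mask as float, same shape

# BCEWithLogitsLoss applies the sigmoid internally...
loss_with_logits = nn.BCEWithLogitsLoss()(logits, target)
# ...so it matches BCELoss on explicitly activated outputs.
loss_manual = nn.BCELoss()(torch.sigmoid(logits), target)
print(loss_with_logits, loss_manual)
```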

Thanks to you I got it to work.

Switching to `BCEWithLogitsLoss` and going down to one class was the ticket.

I also had to convert my targets to float tensors, so my loss function looked like this:

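``````
import torch.nn.functional as F

# `weights` must already be defined as a tensor; it is bound as the default here
def BCELogitsLoss(input, target, weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=weights):
    target = target.float()  # masks arrive as long tensors, BCE needs floats
    return F.binary_cross_entropy_with_logits(input, target, weight, pos_weight=pos_weight, reduction=reduction)
``````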

Great! You can also use PyTorch's BCE directly by using a custom `ItemBase`:

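``````
class ImageSegmentFloat(ImageSegment):
    @property
    def data(self):
        # Return the mask as float instead of the default long
        return self.px.float()

# The methods below override label list methods (as in `SegmentationLabelList`).
# The class header and the body of `open` are missing from the post as quoted here.
def open(self, fn):
    ...

def analyze_pred(self, pred, thresh: float = 0.5):
    return (pred > thresh).float()

def reconstruct(self, t):
    return ImageSegmentFloat(t)
``````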

Both work, depends on your preference.


This second option gives me the following error:

``````Exception: It's not possible to apply those transforms to your dataset:
grid_sampler(): expected input and grid to have same dtype, but input has long and grid has float``````

Yes indeed, I had the same problem and found the solution. Use the edited code above instead.

Hi there again,

Thank you for your help so far.

I believe I am getting a similar problem when changing from default loss to BCEWithLogitsLoss using Unet-ResNet for Segmentation:

I have masks with classes [0,1,2,3,4,5] and am using dice coefficient as my metric.

``````learn = unet_learner(data, arch, metrics=[dice])
learn.loss_func = nn.BCEWithLogitsLoss()
lr_find(learn)
learn.recorder.plot()
``````

The error I get:

`ValueError: Target size (torch.Size([8, 1, 64, 400])) must be the same as input size (torch.Size([8, 5, 64, 400]))`

Did you come across this at all?

`BCEWithLogitsLoss` is used for binary masks, not for 5 masks. You should use `CrossEntropyLoss` instead.

I must have misinterpreted it then. Just to clarify, I am doing multi-label segmentation with just the one mask. The mask will have any of the values [0, 1, 2, 3, 4].

Would BCE instead be used for multi-channel where each mask is one hot encoded?

This is for the Severstal comp.

BCE expects input and target masks that each have one channel, with the target containing only 0s and 1s. Cross-entropy expects the target mask to have one channel with values between 0 and 4 (which is what you have) and the input (= output of the network) to have one channel per class (so 5 channels of raw scores). It seems to me you are exactly in the second case.
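
A shape-only sketch of both cases in plain PyTorch, using the sizes from the error message above. The random tensors stand in for real data. The one-hot BCE variant at the end is shown only to answer the question about one-hot encoded masks; it is not what was recommended here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(8, 5, 64, 400)              # network output: one channel per class
mask = torch.randint(0, 5, (8, 1, 64, 400))      # label mask: one class index per pixel

# Cross-entropy: target holds class indices, shape (N, H, W), dtype long.
ce = nn.CrossEntropyLoss()(logits, mask.squeeze(1))

# BCE needs a target with the same shape as the output, so the mask
# would first have to be one-hot encoded into 5 float channels.
one_hot = F.one_hot(mask.squeeze(1), num_classes=5).permute(0, 3, 1, 2).float()
bce = nn.BCEWithLogitsLoss()(logits, one_hot)
print(ce, bce)
```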


How did you figure out what to set for "weights" (the `pos_weight`) in your custom loss?