I am using a unet for binary segmentation and it is outputting predictions with depth 2, breaking the accuracy metric and resulting in bad output.

When I use regular accuracy as a metric, I get this error:

/opt/conda/lib/python3.6/site-packages/fastai/metrics.py in accuracy(input, targs)
28 input = input.argmax(dim=-1).view(n,-1)
29 targs = targs.view(n,-1)
---> 30 return (input==targs).float().mean()
31
32 def accuracy_thresh(y_pred:Tensor, y_true:Tensor, thresh:float=0.5, sigmoid:bool=True)->Rank0Tensor:
RuntimeError: The size of tensor a (448) must match the size of tensor b (50176) at non-singleton dimension 1

If I use accuracy_thresh, the model runs and trains but still returns bad results. It converges on an accuracy_thresh of 50%.

The 2 layers of my output add up to 1. If preds[0][0] is 0.93, preds[0][1] is 0.07. I assume this is my prediction for each class? Do I need to rewrite my loss and accuracy function?

Edit: I believe I have figured out my accuracy problem using the function in the lesson 2 camvid notebook, but it seems that my loss is not working very well as it converges on predicting 0 everywhere and achieving a high accuracy (the labels are mostly 0).
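For anyone hitting the same RuntimeError, the two sizes in the message can be reproduced by hand. This is only a sketch, assuming 224x224 images, a 2-channel output of shape (n, 2, 224, 224) and a mask of shape (n, 1, 224, 224); the name `pixel_accuracy` is made up for illustration and is modelled on the idea of taking the argmax over the channel dimension instead of the last one.

```python
import torch

n, h, w = 4, 224, 224                        # assumed batch size and image size
preds = torch.randn(n, 2, h, w)              # stand-in for the U-Net output
targs = torch.randint(0, 2, (n, 1, h, w))    # stand-in for the masks (class indices)

# Plain `accuracy` takes the argmax over the LAST dimension (the width),
# so the 2 class channels survive and get flattened together with the height.
wrong = preds.argmax(dim=-1).view(n, -1)
print(wrong.shape)               # torch.Size([4, 448])   = 2 * 224
print(targs.view(n, -1).shape)   # torch.Size([4, 50176]) = 224 * 224

def pixel_accuracy(input, target):
    """Hypothetical metric: fraction of pixels whose argmax class matches the mask."""
    target = target.squeeze(1)                             # (n, h, w)
    return (input.argmax(dim=1) == target).float().mean()  # argmax over channels

acc = pixel_accuracy(preds, targs)
```

Something like `pixel_accuracy` could then be passed in `metrics=` instead of `accuracy`, assuming the masks really do arrive with a singleton channel dimension.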

Well, if you use a softmax activation you'll get something like this. You then need to either keep the highest score or keep preds[0][1] (which is the probability that the pixel is in the mask). Your main problem is actually that your ground truth is inconsistent with your predictions. If you want to use softmax, you need to convert your ground truth so that it has shape (2, H, W), where each pixel contains [0, 1] (which means it is a 1) or [1, 0] (which means it is a 0). If you use sigmoid and a single class, you'll have ground truth of shape (1, H, W), where each pixel contains either 0 or 1 (which is what you probably have). The output will then contain for each pixel the probability that it is a 1.
Hope I am clear!
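To make the softmax point concrete, here is a small sketch with random numbers standing in for the raw 2-channel output (the tensor shape is an assumption, not taken from the actual model). It shows why the two channels always add up to 1, and that channel 1 is the same as a sigmoid applied to the difference of the two raw scores.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(1, 2, 4, 4)         # hypothetical raw output: (n, classes, H, W)
probs = torch.softmax(logits, dim=1)     # softmax over the class dimension

# The two channels sum to 1 at every pixel.
channel_sum = probs.sum(dim=1)

# Channel 1 of a 2-class softmax equals sigmoid(score_1 - score_0).
p_mask = torch.sigmoid(logits[:, 1] - logits[:, 0])

# Two equivalent ways to get a hard mask: argmax, or threshold channel 1 at 0.5.
hard_argmax = probs.argmax(dim=1)
hard_thresh = (probs[:, 1] > 0.5).long()
```

So for two classes, keeping the highest score and thresholding preds[0][1] at 0.5 give the same mask.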
If you converge fast to 0, you can consider some options:

Lower learning rate

Clip gradients

Find a loss that penalizes false negatives more (weighted cross entropy or dice for instance)

Similarly, don't use accuracy for binary segmentation; dice or IoU are better indicators.
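A quick illustration of why accuracy is misleading here, as a sketch: the helper `dice_and_iou` and the toy 10x10 mask are made up for this example. A model that predicts 0 everywhere on a mask with 10% foreground still scores 90% accuracy, while dice and IoU are essentially 0.

```python
import torch

def dice_and_iou(pred_mask, true_mask, eps=1e-8):
    """Dice and IoU for two binary masks (0s and 1s) of the same shape."""
    pred = pred_mask.float().flatten()
    true = true_mask.float().flatten()
    inter = (pred * true).sum()            # pixels that are 1 in both masks
    total = pred.sum() + true.sum()
    dice = (2 * inter + eps) / (total + eps)
    iou = (inter + eps) / (total - inter + eps)
    return dice.item(), iou.item()

true = torch.zeros(10, 10, dtype=torch.long)
true[:2, :5] = 1                           # 10 foreground pixels out of 100
all_zero = torch.zeros_like(true)          # a model that predicts 0 everywhere

accuracy = (all_zero == true).float().mean().item()   # 0.9
dice, iou = dice_and_iou(all_zero, true)               # both close to 0
perfect_dice, perfect_iou = dice_and_iou(true, true)   # both 1.0
```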

You're absolutely right, that is basically the stage I have gotten to. I am currently digging around trying to discover how to do either of those solutions, so I will ask here.

fastai's unet_learner is giving me a softmax activation. How can I change this to be a sigmoid? It looks to me in the code that DynamicUnet has sigmoids; does the learner slap softmax on the end? How would I change this?

I'm loading my masks like:

from fastai.vision import *

class SegLabelListCustom(SegmentationLabelList):
    # div=True divides the pixel values by 255, so a 0/255 mask becomes 0/1
    def open(self, fn): return open_mask(fn, div=True, convert_mode='L')

class SegItemListCustom(ImageList):
    _label_cls, _square_show_res = SegLabelListCustom, False

How/when do I process them into separate channels? Maybe I'll try a custom function for open_mask?

The fastai model doesn't include a final activation, but it computes the right number of outputs depending on your number of classes. However, one is applied when calculating metrics, and which one depends mainly on your loss function.
If you really want to process them in 2 channels:
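A minimal sketch of one way to do that split in plain PyTorch, assuming the mask is a (1, H, W) tensor of class indices. `mask_to_channels` is a made-up helper name, and where exactly to call it (for example inside a custom `open`) is left open here.

```python
import torch
import torch.nn.functional as F

def mask_to_channels(mask, n_classes=2):
    """Turn a (1, H, W) mask of class indices into a (n_classes, H, W) one-hot mask."""
    idx = mask.squeeze(0).long()                       # (H, W)
    one_hot = F.one_hot(idx, num_classes=n_classes)    # (H, W, n_classes)
    return one_hot.permute(2, 0, 1).float()            # (n_classes, H, W)

mask = torch.tensor([[[0, 1],
                      [1, 0]]])                        # toy 2x2 mask, shape (1, 2, 2)
two_ch = mask_to_channels(mask)
# two_ch[0] is 1 where the pixel is background, two_ch[1] is 1 where it is foreground
```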

I am currently using the default loss function, which is: FlattenedLoss of CrossEntropyLoss()

I would prefer to use a different loss such as NLLLoss or BCELoss with weights, since my classes are highly imbalanced and cause my model to predict mostly 0s. However, I have not been able to get them working because my ground truth has a different shape.

Ideally I would like to calculate the weights based on the probability in each batch.
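Per-batch weights can be sketched roughly like this. It assumes raw outputs of shape (N, C, H, W) and masks of shape (N, 1, H, W) holding class indices; `batch_weighted_ce` is a made-up name and inverse-frequency weighting is just one possible choice, not something fastai does by default.

```python
import torch
import torch.nn.functional as F

def batch_weighted_ce(logits, target, n_classes=2):
    """Cross-entropy with inverse-frequency class weights computed from this batch."""
    target = target.squeeze(1).long()                          # (N, H, W)
    counts = torch.bincount(target.flatten(), minlength=n_classes).float()
    # Rare classes get a larger weight; clamp avoids dividing by zero
    # when a class is absent from the batch.
    weights = counts.sum() / (n_classes * counts.clamp(min=1))
    return F.cross_entropy(logits, target, weight=weights)

torch.manual_seed(0)
logits = torch.randn(2, 2, 8, 8)
target = (torch.rand(2, 1, 8, 8) > 0.9).long()     # mostly 0s, a few 1s
loss = batch_weighted_ce(logits, target)
```

A function with this signature could be assigned as the learner's loss function, assuming the learner hands it the raw outputs and the masks in those shapes.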

data.train_ds.classes yields ['clean', 'HE'], which I set myself.

Keep everything as it is, except change the open function so that the masks have 2 channels.

Use something like BCE that expects 1-channel input (I recommend BCEWithLogitsLoss, since otherwise nothing will ever apply an activation), but change classes to something like ['HE'] (with BCE you don't need a class for background; it expects one channel with values between 0 and 1).
I'd tend towards the second solution, as doing multiclass just to compute background is a bit useless.
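For the second option, a rough sketch in plain PyTorch, assuming a single-channel raw output and a float mask of 0s and 1s with the same shape. The `pos_weight` value of 9.0 is an arbitrary example for a mask with roughly 10% foreground, not a recommendation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(2, 1, 8, 8)                   # assumed raw 1-channel output
target = (torch.rand(2, 1, 8, 8) > 0.9).float()    # sparse 0/1 mask, as floats

# BCEWithLogitsLoss applies the sigmoid internally, so it takes raw outputs.
# pos_weight > 1 makes errors on foreground pixels cost more.
loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([9.0]))
loss = loss_fn(logits, target)

# When looking at predictions, apply the sigmoid yourself and threshold.
probs = torch.sigmoid(logits)
pred_mask = (probs > 0.5).float()
```

Note that the target has to be float here, unlike cross-entropy, which wants integer class indices.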

Exception: It's not possible to apply those transforms to your dataset:
grid_sampler(): expected input and grid to have same dtype, but input has long and grid has float

I must have misinterpreted it then. Just to clarify, I am doing multi-label segmentation with just the one mask. The mask will have any of the values [0, 1, 2, 3, 4].

Would BCE instead be used for multi-channel where each mask is one hot encoded?

BCE expects input and target masks that have one channel, with the target containing only 0s and 1s. Cross-entropy expects the target mask to have one channel with values between 0 and 4 (which is what you have) and the input mask (= output of the network) to have one channel per class (so 5 channels, each holding the score for that class). It seems to me you are exactly in the second case.
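The shapes for the cross-entropy case look roughly like this in plain PyTorch. Random tensors stand in for the real output and mask, and the 8x8 size is arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, c, h, w = 2, 5, 8, 8
logits = torch.randn(n, c, h, w)             # network output: one score per class
target = torch.randint(0, c, (n, h, w))      # mask: class index 0..4 per pixel (long)

loss = nn.CrossEntropyLoss()(logits, target)
pred = logits.argmax(dim=1)                  # (n, h, w): predicted class per pixel
```

If the masks come with an extra channel dimension, i.e. shape (n, 1, h, w), that dimension would need to be squeezed out (or the loss flattened) before calling plain `CrossEntropyLoss`.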