Hello,
In the introductory fastai lessons, the “error_rate” metric is used to track the progress of our single-label classification model. For semantic segmentation problems, the most commonly used metric for tracking training progress is the “Intersection over Union” (IoU), also known as the “Jaccard index”.
Within the fastai library, in the fastai.metrics module, I have found two functions that may provide metrics for image segmentation: foreground_acc and dice. The iou=True kwarg can be passed to the dice function to reproduce the IoU metric, but only for binary targets. This does not work for a segmentation problem with multiple classes, such as the CamVid dataset used in Lesson 3 of the fastai course (Fast AI Lesson 3 with Semantic Segmentation). The foreground_acc function is identical to the acc_camvid function that Jeremy uses in the video to compute accuracy on the CamVid dataset specifically.
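For reference, the kind of accuracy described above can be sketched as plain pixel accuracy computed only over pixels whose label is not the "void" class. This is a minimal sketch based on the lesson's approach, not the library's exact code: the name masked_pixel_accuracy is hypothetical, and void_code is passed explicitly here as an assumption.

```python
import torch

def masked_pixel_accuracy(logits, target, void_code):
    """Pixel accuracy that ignores pixels labelled `void_code`.

    logits: [B, C, H, W] raw model output.
    target: [B, 1, H, W] integer class labels.
    void_code: integer label to exclude (assumed to be supplied by the caller).
    """
    target = target.squeeze(1)            # -> [B, H, W]
    mask = target != void_code            # keep only non-void pixels
    pred = logits.argmax(dim=1)           # hard class prediction per pixel
    return (pred[mask] == target[mask]).float().mean()
```

Because nothing in it depends on CamVid apart from the choice of void_code, the same function applies to any multi-class mask that has a label to be ignored.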
My question: is the acc_camvid accuracy function generalizable to all multi-class image segmentation problems? And if I want the IoU metric instead (to benchmark against results reported in research papers), how can I compute it for multi-class image segmentation?
Before coming here, I stumbled upon the following jaccard_loss function from pytorch-goodies:
import torch
import torch.nn.functional as F

def jaccard_loss(true, logits, eps=1e-7):
    """Computes the Jaccard loss, a.k.a. the IoU loss.
    Note that PyTorch optimizers minimize a loss. In this
    case, we would like to maximize the Jaccard index, so we
    return 1 minus the Jaccard index.
    Args:
        true: a tensor of shape [B, H, W] or [B, 1, H, W].
        logits: a tensor of shape [B, C, H, W]. Corresponds to
            the raw output or logits of the model.
        eps: added to the denominator for numerical stability.
    Returns:
        jacc_loss: the Jaccard loss.
    """
    num_classes = logits.shape[1]
    if num_classes == 1:
        # Binary case: build a two-channel one-hot target and two-channel probabilities.
        true_1_hot = torch.eye(num_classes + 1)[true.squeeze(1)]
        true_1_hot = true_1_hot.permute(0, 3, 1, 2).float()
        true_1_hot_f = true_1_hot[:, 0:1, :, :]
        true_1_hot_s = true_1_hot[:, 1:2, :, :]
        true_1_hot = torch.cat([true_1_hot_s, true_1_hot_f], dim=1)
        pos_prob = torch.sigmoid(logits)
        neg_prob = 1 - pos_prob
        probas = torch.cat([pos_prob, neg_prob], dim=1)
    else:
        # Multi-class case: one-hot target and softmax probabilities.
        true_1_hot = torch.eye(num_classes)[true.squeeze(1)]
        true_1_hot = true_1_hot.permute(0, 3, 1, 2).float()
        probas = F.softmax(logits, dim=1)
    true_1_hot = true_1_hot.type(logits.type())
    # Sum over the batch and spatial dimensions, leaving one value per class.
    dims = (0,) + tuple(range(2, true.ndimension()))
    intersection = torch.sum(probas * true_1_hot, dims)
    cardinality = torch.sum(probas + true_1_hot, dims)
    union = cardinality - intersection
    # Soft (probability-based) IoU per class, averaged over all C classes.
    jacc_loss = (intersection / (union + eps)).mean()
    return (1 - jacc_loss)
Here, the true and logits parameters correspond to the ground-truth mask and the raw output of the Learner model, respectively.
If the function returned jacc_loss instead of 1 - jacc_loss, it should in theory give the IoU value of the current prediction. Can anyone confirm this?
When I use the jaccard_loss function above, I get jacc_loss = 0.18 at the same time as acc_camvid = 0.92. This does not make sense to me, as 0.92 for CamVid accuracy is state of the art, while state-of-the-art IoU is closer to 0.64 (see Refine-net).
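One thing that may contribute to a gap like this is that pixel accuracy and class-averaged IoU weight classes very differently. The toy example below, constructed here for illustration and not taken from the lesson, has a dominant class that is predicted everywhere: accuracy stays high, while the rare class contributes an IoU of 0 to the mean. The jaccard_loss function above additionally uses soft probabilities and averages over every class, including classes absent from the batch, so its value is not directly comparable to a hard-prediction IoU from a paper.

```python
import torch

# Toy, hypothetical data: 100 pixels, 95 of class 0 and 5 of class 1.
target = torch.zeros(100, dtype=torch.long)
target[:5] = 1
# A model that predicts the dominant class everywhere.
pred = torch.zeros(100, dtype=torch.long)

pixel_acc = (pred == target).float().mean()

ious = []
for c in range(2):
    inter = ((pred == c) & (target == c)).sum().float()
    union = ((pred == c) | (target == c)).sum().float()
    ious.append(inter / union)
toy_miou = torch.stack(ious).mean()

print(pixel_acc.item(), toy_miou.item())  # roughly 0.95 and 0.475
```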
TL;DR: I am looking for a proper, generic way to report the accuracy of a multi-class image segmentation model.
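One possible approach, sketched here under assumptions rather than taken from fastai, is to compute a hard-prediction mean IoU from a confusion matrix. The names confusion_matrix, mean_iou, and ignore_index are hypothetical, and classes that appear in neither the prediction nor the target are skipped instead of counted as 0.

```python
import torch

def confusion_matrix(pred, target, num_classes, ignore_index=None):
    """Rows are true classes, columns are predicted classes."""
    pred = pred.reshape(-1)
    target = target.reshape(-1)
    if ignore_index is not None:
        keep = target != ignore_index      # drop ignored (e.g. void) pixels
        pred, target = pred[keep], target[keep]
    # Encode each (target, pred) pair as one index and count occurrences.
    idx = target * num_classes + pred
    cm = torch.bincount(idx, minlength=num_classes ** 2)
    return cm.reshape(num_classes, num_classes)

def mean_iou(logits, target, ignore_index=None):
    """logits: [B, C, H, W]; target: [B, H, W] or [B, 1, H, W]."""
    num_classes = logits.shape[1]
    pred = logits.argmax(dim=1)            # hard class prediction per pixel
    if target.dim() == 4:
        target = target.squeeze(1)
    cm = confusion_matrix(pred, target.long(), num_classes, ignore_index).float()
    tp = cm.diag()                         # per-class intersection
    union = cm.sum(0) + cm.sum(1) - tp     # predicted + true - intersection
    present = union > 0                    # skip classes absent from both
    return (tp[present] / union[present]).mean()
```

Since the signature is (input, target), this could be tried as an entry in a Learner's metrics list. Note that fastai would then average the per-batch values, whereas papers typically accumulate one confusion matrix over the whole validation set before computing IoU, so the two numbers can still differ.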