What is the relationship between pixel accuracy and mIoU? (semantic segmentation)

In semantic segmentation tasks, pixel accuracy and mIoU are two commonly used metrics. The most common loss function for semantic segmentation models is cross-entropy, which acts directly as a proxy for pixel accuracy. However, during training the pixel accuracy is often very high and the model appears to converge, yet the mIoU stays low (just like in the picture). So what accounts for this difference between pixel accuracy and mIoU? What is the mathematical relationship between the two? And how can this problem be solved?
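For reference, here is a minimal sketch of how the two metrics are typically computed from a predicted label map and a ground-truth label map (the function names and NumPy-based implementation are my own, not from any particular library):

```python
import numpy as np

def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted class matches the label."""
    return (pred == target).mean()

def mean_iou(pred, target, num_classes):
    """Mean of the per-class intersection-over-union scores."""
    ious = []
    for c in range(num_classes):
        intersection = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(intersection / union)
    return float(np.mean(ious))
```

Note that pixel accuracy averages over *pixels*, while mIoU averages over *classes*; that difference is exactly why the two can diverge on imbalanced data.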

Hi, without any knowledge about your data and model, I suspect that the difference comes from imbalanced classes. Imagine a dataset with two classes, where one class makes up only 5% of the total pixels and the other (background) class makes up 95%. A model that does nothing except predict background for every pixel would still reach 95% pixel accuracy, so pixel accuracy is obviously not a useful metric in that case :slight_smile:

The mean IoU, however, is only (0 + 0.95) / 2 = 47.5%. For the calculation and a very good explanation, please check Metrics to Evaluate your Semantic Segmentation Model. (It's on Medium; if you don't have access, DM me.)
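You can verify those numbers directly. The snippet below works through the hypothetical 95/5 example with a degenerate all-background predictor (the 100-pixel image is just an illustration, not real data):

```python
# Hypothetical 100-pixel image: 95 background pixels (class 0), 5 foreground (class 1).
target = [0] * 95 + [1] * 5
# Degenerate model: predicts background everywhere.
pred = [0] * 100

# Pixel accuracy: fraction of correctly classified pixels.
acc = sum(p == t for p, t in zip(pred, target)) / len(target)

# Per-class IoU = |intersection| / |union|.
inter_bg = sum(p == 0 and t == 0 for p, t in zip(pred, target))  # 95
union_bg = sum(p == 0 or t == 0 for p, t in zip(pred, target))   # 100
inter_fg = sum(p == 1 and t == 1 for p, t in zip(pred, target))  # 0
union_fg = sum(p == 1 or t == 1 for p, t in zip(pred, target))   # 5

miou = (inter_bg / union_bg + inter_fg / union_fg) / 2

print(acc)   # 0.95
print(miou)  # 0.475
```

The background IoU (0.95) is dragged down by the foreground IoU (0.0) because mIoU weights every class equally, no matter how few pixels it covers.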


Thanks for the reply!

Hi, does that mean mean IoU is always the best metric for semantic segmentation? I was wondering whether there are any scenarios where mean accuracy is a better metric than mIoU.
In addition, the link you shared requires a membership to access.