In semantic segmentation tasks, pixel accuracy and mIoU are two commonly used metrics. The most commonly used loss function in semantic segmentation models is cross entropy, which acts directly as a proxy for pixel accuracy. However, during training the pixel accuracy is usually very high and the model appears to converge, yet the mIoU stays low (as in the attached picture). So what accounts for this difference between pixel accuracy and mIoU? What is the mathematical relationship between the two? And how can this problem be solved?
Hi, without any knowledge about your data and model, I suspect that the difference might come from imbalanced classes. Imagine a dataset with two classes, where one class makes up only 5% of the total pixels and the other (background) class makes up 95%. A model that does nothing at all except predict background for all pixels would still reach 95% pixel accuracy, so pixel accuracy is obviously not a useful metric in that case.
The mean IoU, however, is only (0 + 0.95)/2 = 47.5%. For the calculation and a very good explanation, please check Metrics to Evaluate your Semantic Segmentation Model. (It's on Medium; if you don't have access, DM me.)
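To make the toy example concrete, here is a minimal NumPy sketch (assuming integer class-index masks and that classes absent from both prediction and ground truth are skipped) that reproduces the 95% pixel accuracy vs. 47.5% mIoU gap. The function names are just illustrative, not from any particular library:

```python
import numpy as np

def pixel_accuracy(pred, target):
    # Fraction of pixels whose predicted class matches the ground truth.
    return (pred == target).mean()

def mean_iou(pred, target, num_classes):
    # Per-class IoU = |intersection| / |union|, averaged over classes.
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent in both prediction and ground truth
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))

# Toy example: 5% foreground (class 1), 95% background (class 0),
# and a model that predicts background everywhere.
target = np.zeros(100, dtype=int)
target[:5] = 1
pred = np.zeros(100, dtype=int)

print(pixel_accuracy(pred, target))           # 0.95
print(mean_iou(pred, target, num_classes=2))  # (0.95 + 0.0) / 2 = 0.475
```

Note that the background IoU is 0.95 rather than 1.0 because the all-background prediction also covers the 5% of pixels that are actually foreground, which inflates the union for the background class.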
Thanks for the reply!
Hi, does that mean mean IoU is always the best metric for semantic segmentation? I was wondering whether there are any scenarios where mean accuracy is a better metric than mIoU.
In addition, the link you shared requires a membership to access.
Hello,
I am interested in learning more about metrics to evaluate my segmentation model and planned to read your article on Medium. How do I get access to it? Do I need to subscribe to Medium first?
Thanks in advance!