Let us create a semantic segmentation model(LinkNet) by PyTorch

I implement LinkNet and write a blog post about it, the results looks great but I guess there are something wrong with my codes, because the IoU is out-perform the paper.

From left to right is original image, ground truth, predict result.

Questions related to LinkNet


Excellent article and code !

1 Like

Results of the paper

building    tree    sky     car    sign    road    pedestrian   fence   pole   sidewalk   bicycle   IoU      iIoU
88.8         85.3   92.8    77.6   41.7    96.8    57.0         57.8    37.8   88.4       27.2      68.3     55.8

The paper say they ignore undefined category(0,0,0) when training, I do not know how they do that, what I do is

1 : treat all of the color do not exist in the 11 categories as void(0,0,0)
2 : train with 12 categories(with void)

Output of the paper do not contain any void either, I wonder what kind of pre-processing or post-processing they do

I invite you to watch lesson 14 video where Jeremy explains a similar problem in the way the accuracy was calculated in the tiramisu arxiv paper: they remove the void when calculating iou/accuracy in order to inflate the results.

Also, as per my experiments, it sounds like shuffling the images in trn/val set inflates the results too as the pictures are from videos which creates a leakage.

Finally, following your post, i tried to compare the comparison table for Linknet and Tiramisu among other algos: same algos are reported with different iou numbers in both papers… did I misread?

Thanks, I watch the video, but they are doing this when testing, the paper say they ignore void when training, maybe they are doing the same thing after all.

Which papers and algo? If you mean my results are different with the LinkNet paper, you are not misread, that is why I add the explanation at the end on my blog(I add two more reasons, include overfit and data leakage as you mention)

I think this maybe the main reason, similarity of camvid are high.

Global accuracy of mine is around 87%~88%(train with input size of 512 x 512, random crop from original image, without removing void), this lead to mean iou of 78.9.

According to the tiramisu paper, even they got 91.5 global accuracy, their mean iou only got 66.9, I think there must be something wrong about my implementation.