Image Segmentation on COCO dataset - summary, questions and suggestions

Have you tried using the metric from CamVid? I'm pretty sure that's per-pixel classification accuracy, which would be good for what you're doing (and I think it's different from the metric you're using).
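Roughly, per-pixel accuracy means taking the argmax over the class dimension and comparing it with the label mask, pixel by pixel. Here's a minimal NumPy sketch, assuming channel-first (N, C, H, W) scores. `pixel_accuracy` and `ignore_index` are just names I made up for illustration, not from any library. The optional `ignore_index` is there in case you have a "void"/unlabelled value you don't want counted.

```python
import numpy as np

def pixel_accuracy(logits, target, ignore_index=None):
    """Fraction of pixels whose predicted class matches the label.

    logits: float array of shape (N, C, H, W) with per-class scores.
    target: integer array of shape (N, H, W) with the true class per pixel.
    ignore_index: optional label value (hypothetical 'void' class) to skip.
    """
    pred = logits.argmax(axis=1)            # (N, H, W) predicted class per pixel
    keep = np.ones_like(target, dtype=bool)
    if ignore_index is not None:
        keep = target != ignore_index       # only score labelled pixels
    return float((pred[keep] == target[keep]).mean())
```

With 2 classes, a 2x2 image where 3 of 4 predicted pixels match the mask gives 0.75.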

Also (not sure if this is what you have already), the tensors for your label/mask images should hold the class index of each pixel. So they should probably be all 0s and 1s, since you're just predicting masks for humans.
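For example, if your masks are saved as 8-bit images where people are some non-zero value like 255 and background is 0 (an assumption, so check what yours actually look like), something like this NumPy sketch would turn them into 0/1 class indices. `mask_to_class_indices` is a made-up name.

```python
import numpy as np

def mask_to_class_indices(mask_img):
    """Turn a binary mask image into per-pixel class indices.

    Assumes person pixels are any non-zero value (e.g. 255 in an 8-bit PNG)
    and background is 0. Returns an int64 array of 0s and 1s.
    """
    mask = np.asarray(mask_img)
    if mask.ndim == 3:                   # mask saved as RGB: collapse channels
        mask = mask.max(axis=-1)
    return (mask > 0).astype(np.int64)   # 0 = background, 1 = person
```

I used int64 because, for example, PyTorch's cross-entropy loss expects class-index targets as long integers.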

I did this for the CamVid dataset to turn it into the dataset from the Tiramisu paper, so I might be able to help if you run into any problems.