By the way, for segmentation I think it's probably easier (and more correct) to use the Dice metric. It is designed to measure the overlap between the predicted and ground-truth masks, aggregated across the whole image, rather than counting per-pixel accuracy.
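For reference, Dice is 2·|A ∩ B| / (|A| + |B|) for a predicted mask A and a target mask B. Here's a minimal NumPy sketch of that formula for binary masks. The function name `dice_score` and the `eps` smoothing term are my own choices for illustration, not any particular library's API:

```python
import numpy as np

def dice_score(pred, target, eps=1e-8):
    """Dice coefficient between two binary masks of the same shape.

    Both masks are flattened first, so the overlap is aggregated over
    every pixel in the image.
    """
    pred = np.asarray(pred, dtype=bool).ravel()
    target = np.asarray(target, dtype=bool).ravel()
    # Pixels that are foreground in both masks
    intersection = np.logical_and(pred, target).sum()
    # Total foreground pixels across both masks
    total = pred.sum() + target.sum()
    # eps avoids 0/0 when both masks are empty (that case scores 1.0)
    return (2.0 * intersection + eps) / (total + eps)

pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(round(dice_score(pred, target), 4))  # 2*2 / (3+3)
```

Each mask has 3 foreground pixels and they share 2, so the score is 4/6 ≈ 0.6667.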

Dice already takes the input tensor shape into account, so you can use the vanilla metric as-is.
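To illustrate what "taking the shape into account" can look like, here's a rough sketch for binary segmentation. It assumes the model outputs per-class scores of shape `(N, C, H, W)` with class 1 as the foreground, and the targets are integer masks of shape `(N, H, W)`. The metric takes the argmax over the class axis, flattens each image, and sums the overlap over the whole batch. The name `dice_from_logits` is hypothetical, and this is just one way a library could do it, not a specific library's implementation:

```python
import numpy as np

def dice_from_logits(logits, target, eps=1e-8):
    """Batch Dice for binary segmentation from raw per-class scores.

    logits: shape (N, C, H, W), class 1 assumed to be foreground
    target: integer mask of shape (N, H, W) with values in {0, 1}
    """
    logits = np.asarray(logits)
    target = np.asarray(target)
    n = target.shape[0]
    # Highest-scoring class per pixel, then flatten each image to 1-D
    pred = logits.argmax(axis=1).reshape(n, -1)
    target = target.reshape(n, -1)
    pred_fg = pred == 1
    target_fg = target == 1
    # Overlap and totals are summed over every pixel in the batch
    intersection = np.logical_and(pred_fg, target_fg).sum()
    total = pred_fg.sum() + target_fg.sum()
    return (2.0 * intersection + eps) / (total + eps)

# One image, two classes (background, foreground), 2x2 pixels
logits = np.array([[[[0.9, 0.8], [0.2, 0.6]],    # class 0 scores
                    [[0.1, 0.2], [0.8, 0.4]]]])  # class 1 scores
target = np.array([[[1, 0], [1, 1]]])
print(round(dice_from_logits(logits, target), 4))
```

Here the argmax predicts foreground only at the bottom-left pixel. That pixel is also foreground in the target (1 shared pixel, 1 predicted, 3 in the target), so the score is 2/4 = 0.5.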

