I am trying to segment out two types of cells from images. Should my output maps be 2 channels (2 cell types) or 3 channels (2 cell types + background)? Any help, pointers, ideas would be greatly appreciated.
A UNet for multi-class segmentation typically applies a softmax across the output channels at each pixel, so if you want to tell apart 3 things (background, cell type 1, cell type 2), then your output should indeed have 3 channels.
(And so should your ground-truth masks, as one-hot maps. Alternatively, the ground-truth masks can be single-channel maps holding just the class indices, with 0 being the background class.)
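To make the two ground-truth formats concrete, here is a minimal NumPy sketch (not tied to any particular UNet implementation). The 2x2 image size, the random logits, and the channel ordering (0 = background, 1 = cell type 1, 2 = cell type 2) are assumptions made up for illustration. It shows that a per-pixel softmax over 3 channels gives probabilities that sum to 1, and that the cross-entropy is the same whether the mask is stored as class indices or as 3 one-hot channels.

```python
import numpy as np

def softmax(logits, axis=0):
    """Numerically stable softmax along the given (channel) axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# Hypothetical raw network output for a 2x2 image: (channels, height, width).
# Assumed channel order: 0 = background, 1 = cell type 1, 2 = cell type 2.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 2, 2))

probs = softmax(logits, axis=0)    # per-pixel probabilities over the 3 classes
pred_mask = probs.argmax(axis=0)   # (2, 2) map of predicted class indices

# Ground truth stored as class indices, with 0 being background.
index_mask = np.array([[0, 1],
                       [2, 0]])

# The same ground truth stored as 3 one-hot channels: (3, 2, 2).
one_hot_mask = np.eye(3)[index_mask].transpose(2, 0, 1)

# Cross-entropy computed from either format gives the same value.
rows, cols = np.indices(index_mask.shape)
loss_from_indices = -np.log(probs[index_mask, rows, cols]).mean()
loss_from_one_hot = -(one_hot_mask * np.log(probs)).sum(axis=0).mean()
```

Which format you store is mostly a matter of what your loss function expects; many frameworks accept the class-index form directly, which saves memory.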
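Alternatively, you can apply a sigmoid to each of the 3 channels independently (as in multi-label classification) rather than a softmax across them. In that case the per-pixel outputs no longer sum to 1, so a pixel can be assigned to more than one class, or to none.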
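For comparison, here is a small NumPy sketch of the sigmoid variant. The logit values, the 0.5 threshold, and the channel ordering (0 = background, 1 = cell type 1, 2 = cell type 2) are assumptions chosen for illustration. Each channel is treated as its own yes/no problem, so the channels are not forced to be mutually exclusive.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical raw network output for a 2x2 image: (channels, height, width).
logits = np.array([
    [[ 2.0, -1.0], [-2.0,  3.0]],   # channel 0: background (assumed order)
    [[-1.0,  2.5], [-3.0, -2.0]],   # channel 1: cell type 1
    [[-2.0,  1.5], [ 1.0, -1.0]],   # channel 2: cell type 2
])

probs = sigmoid(logits)       # each channel is an independent probability
binary_masks = probs > 0.5    # one binary mask per channel (threshold is an assumption)

# One-hot targets built from a class-index mask, 0 being background.
index_mask = np.array([[0, 1],
                       [2, 0]])
targets = np.eye(3)[index_mask].transpose(2, 0, 1)

# Binary cross-entropy averaged over all channels and pixels.
bce = -(targets * np.log(probs) + (1 - targets) * np.log(1 - probs)).mean()
```

In this made-up example the pixel at row 0, column 1 is above threshold in both cell channels, which a softmax followed by argmax would never produce. If your two cell types cannot overlap, softmax is usually the more natural fit.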