How does PyTorch calculate BCE loss when the output is a (bs, nc, h, w) tensor? The mask is 0 or 1.
I get different results when I compute it manually as
X = -mask * log(sigmoid(output)) - (1 - mask) * log(1 - sigmoid(1 - output))
With this formula the loss starts at 3k+.
When I use F.binary_cross_entropy_with_logits, the loss ranges from 0.5 to 1. The built-in loss yields no learning, while the manual one above does.
Why is there a difference between the two?
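
For reference, here is a minimal sketch of the comparison, not my actual training code. The tensor shapes, the random data, and the variable names are assumptions made up for illustration. It computes the textbook per-element BCE, the formula as I wrote it above (with sigmoid(1 - output) in the second term), and F.binary_cross_entropy_with_logits with both "mean" and "sum" reduction.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical shapes: bs=2, nc=3, h=w=8
output = torch.randn(2, 3, 8, 8)                     # raw logits
mask = torch.randint(0, 2, (2, 3, 8, 8)).float()     # binary targets, 0 or 1

# Textbook per-element BCE: second term uses 1 - sigmoid(output)
p = torch.sigmoid(output)
manual = -mask * torch.log(p) - (1 - mask) * torch.log(1 - p)

# Formula as written in the question: second term uses sigmoid(1 - output)
as_posted = -mask * torch.log(p) - (1 - mask) * torch.log(1 - torch.sigmoid(1 - output))

# Built-in loss; the default reduction is "mean" over all elements
builtin_mean = F.binary_cross_entropy_with_logits(output, mask)
builtin_sum = F.binary_cross_entropy_with_logits(output, mask, reduction="sum")

print("manual mean vs built-in mean:", manual.mean().item(), builtin_mean.item())
print("manual sum  vs built-in sum: ", manual.sum().item(), builtin_sum.item())
print("as posted, mean:             ", as_posted.mean().item())
```

In this sketch the textbook formula matches the built-in under the same reduction, so two things may be worth checking: whether the manual loss is summed rather than averaged (which scales it by the number of elements), and whether the sigmoid(1 - output) term is intended.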