Why does BCE loss give different output?

How does PyTorch calculate BCE loss when the output has shape (bs, nc, h, w) and the mask is 0 or 1?
I obtain different results when I compute

X = -mask * log(sigmoid(output)) - (1 - mask) * log(1 - sigmoid(output))
X.mean(0).sum()

Here the loss starts from 3k+.

When I use F.binary_cross_entropy_with_logits instead, the loss is in the range 0.5 to 1. F.binary_cross_entropy_with_logits yields no learning, while the version above does.

Why is there a difference in output?
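
For reference, here is a minimal, self-contained version of what I'm doing (the shapes are just an example):

```python
import torch
import torch.nn.functional as F

bs, nc, h, w = 4, 3, 64, 64                          # example shapes
output = torch.randn(bs, nc, h, w)                   # raw logits from the model
mask = torch.randint(0, 2, (bs, nc, h, w)).float()   # binary target

# manual BCE, averaged over the batch dimension only, then summed
p = torch.sigmoid(output)
X = -mask * torch.log(p) - (1 - mask) * torch.log(1 - p)
manual_loss = X.mean(0).sum()                        # starts around 3k+

# library version with the default reduction
lib_loss = F.binary_cross_entropy_with_logits(output, mask)  # around 0.5 to 1
print(manual_loss.item(), lib_loss.item())
```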

Hi jaideep,
If you look at the PyTorch documentation of BCEWithLogitsLoss, you'll see that the default reduction is 'mean', which divides the summed loss by the total number of elements in the output. Your code, on the other hand, averages only over the batch dimension and then sums over the remaining ones, so the two results differ by a factor of nc * h * w.
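
You can see the factor directly (a quick sketch; the shapes are just illustrative):

```python
import torch
import torch.nn.functional as F

bs, nc, h, w = 4, 3, 64, 64
logits = torch.randn(bs, nc, h, w)
target = torch.randint(0, 2, (bs, nc, h, w)).float()

# default reduction='mean': average over ALL bs * nc * h * w elements
mean_all = F.binary_cross_entropy_with_logits(logits, target)

# unreduced loss, averaged over the batch dimension only, then summed --
# this is what your code computes, and it is larger by a factor of nc * h * w
per_elem = F.binary_cross_entropy_with_logits(logits, target, reduction='none')
batch_mean_sum = per_elem.mean(0).sum()

print(torch.allclose(batch_mean_sum, mean_all * nc * h * w))  # True
```

Since the two only differ by a constant scale, dividing your result by nc * h * w should reproduce the F.binary_cross_entropy_with_logits value.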

Hope this helps.