Can someone explain what the flat version of BCE loss actually does, and how its output works with the standard BCE loss functions?
BCE requires the last layer's output to be the same size as the label.

This may help:

We flatten before sending in our input and target
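
As a rough illustration of that flattening step (a sketch in plain PyTorch, not fastai's actual source; the helper name `flat_bce_with_logits` is made up here), both tensors are reshaped to 1-D before the standard loss is applied, so an `(N, 1)` output lines up with an `(N,)` label:

```python
import torch
import torch.nn as nn

def flat_bce_with_logits(preds, targets):
    # Hypothetical helper, named for illustration only.
    # Flatten both tensors to 1-D, cast the labels to float,
    # then apply the standard BCE-with-logits loss.
    loss_fn = nn.BCEWithLogitsLoss()
    return loss_fn(preds.reshape(-1), targets.reshape(-1).float())

preds = torch.tensor([[0.5], [-1.0], [2.0]])   # raw logits, shape (3, 1)
targets = torch.tensor([1, 0, 1])              # integer labels, shape (3,)

loss = flat_bce_with_logits(preds, targets)    # scalar tensor
```

Without the flattening, the standard loss would reject the `(3, 1)` vs `(3,)` shapes; after it, both are simply vectors of length 3.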

I checked it out, but I was trying to understand how standard BCE is able to calculate the loss after this transformation.
In binary classification, I suppose fastai creates an output layer with output size 2,
but BCE expects the output to be the same size as the label, which is one value per sample:
N * 2 (output) vs N (label)
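
To make the mismatch concrete, here is a small element-count check (a sketch with an assumed batch size of 4 and random tensors; it only demonstrates the shape arithmetic, not what fastai actually builds): flattening an `(N, 2)` output gives `2N` values against `N` labels, while a single output unit gives `N` against `N`.

```python
import torch

N = 4  # assumed batch size, for illustration only

two_unit_out = torch.randn(N, 2)        # output layer with 2 units
one_unit_out = torch.randn(N, 1)        # output layer with 1 unit
labels = torch.randint(0, 2, (N,))      # one binary label per sample

# Element counts after flattening everything to 1-D
n_two = two_unit_out.reshape(-1).numel()    # 2 * N
n_one = one_unit_out.reshape(-1).numel()    # N
n_labels = labels.reshape(-1).numel()       # N

two_units_match = n_two == n_labels     # sizes differ, so BCE cannot pair them up
one_unit_match = n_one == n_labels      # sizes agree, so BCE can be applied
```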