I was learning about the benefits of using multi-output neural networks. As you all know, each output has its own loss function, and these losses get summed into a single loss value that needs to be optimized.
Let's say our neural network has 2 outputs. One of them is a regression output with an MSE loss whose typical values range from around 3-5. The second is a categorical classification output with a cross-entropy loss that can get as low as 0.1 (the example is taken from F. Chollet's book). In Keras you can weight the losses to put them on the same scale. So here are my questions:
- What is the mathematical formula for finding the weights that put the losses on the same scale?
- Why couldn't we just normalize all the losses (to a mean of 0 and a standard deviation of 1) and sum them up?
- How is the neural network supposed to know which loss contributed to the error if they all get summed together? Since the result is a single value, how, during backprop, does the network know how this or that output contributed to the "global" error?
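To make the setup concrete, here is a minimal sketch of the weighted sum I am describing. The loss values and the weights are purely illustrative (picked by hand so both terms land on a similar scale), not a rule for choosing them:

```python
# Hypothetical per-output loss values on very different scales
mse_loss = 4.0    # regression head: MSE typically around 3-5
xent_loss = 0.1   # classification head: cross-entropy can be ~0.1

# Illustrative weights chosen so both terms end up on a similar scale
w_mse, w_xent = 0.025, 1.0

# The single scalar that actually gets backpropagated
total_loss = w_mse * mse_loss + w_xent * xent_loss
print(total_loss)
```

In Keras terms, this corresponds to passing something like `loss_weights=[0.025, 1.0]` to `model.compile()` alongside the two per-output losses.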