Hi, I wonder whether, in a multi-label setting, the idea of weighting the losses of the different outputs differently makes sense.
To elaborate, say we are predicting 10 classes, and because each observation can have multiple labels associated with it, we opt for a multi-label structure (instead of a multi-class one). In this case, instead of a softmax layer at the end, we use 10 sigmoid activations and a binary cross-entropy loss for each. Even though we now technically have 10 outputs, the model aggregates them into one by averaging, so we are still optimizing a single loss function in the end. The problem is that some classes may have significantly larger losses than others, and the classes contributing only small losses may end up undertrained (as in a class-imbalance situation). Hence, I'm wondering whether it makes sense to weight the individual losses differently somehow before combining them (like in a multi-output model).
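To make the setup concrete, here is a minimal PyTorch sketch of what I mean (shapes and tensors are made up for illustration): 10 sigmoid outputs trained with per-class binary cross-entropy, where the scalar the optimizer actually sees is just the plain average over all classes:

```python
import torch
import torch.nn.functional as F

# hypothetical batch: 4 observations, 10 labels each
logits = torch.randn(4, 10)                        # raw model outputs (pre-sigmoid)
targets = torch.randint(0, 2, (4, 10)).float()     # multi-hot label matrix

# per-element BCE, kept unreduced so each class's loss is visible
per_class = F.binary_cross_entropy_with_logits(
    logits, targets, reduction="none"
).mean(dim=0)                                       # shape (10,): one loss per class

# the single scalar the optimizer sees is the unweighted average
total = per_class.mean()
```

Inspecting `per_class` is what shows the problem: its 10 entries can differ a lot in magnitude, yet `total` treats them all equally.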
To my knowledge there is no out-of-the-box way to do this, but I can see how to do it by defining a custom loss function. I just wonder if anyone has tried that and what your experiences with it were.
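For what it's worth, in PyTorch `BCEWithLogitsLoss` does accept a per-class `pos_weight` that rescales the positive term, but to weight each class's *entire* loss before averaging, a small custom module like the following should do it (the class name and the weight values are made up for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedMultiLabelBCE(nn.Module):
    """Hypothetical custom loss: per-class BCE, scaled by a fixed
    weight per class, then averaged into a single scalar."""

    def __init__(self, class_weights):
        super().__init__()
        # registering as a buffer keeps the weights on the right device
        self.register_buffer(
            "class_weights",
            torch.as_tensor(class_weights, dtype=torch.float32),
        )

    def forward(self, logits, targets):
        # unreduced BCE -> shape (batch, num_classes)
        per_element = F.binary_cross_entropy_with_logits(
            logits, targets, reduction="none"
        )
        # scale each class's column by its weight, then average
        return (per_element * self.class_weights).mean()

# usage sketch: upweight whichever classes are undertrained
loss_fn = WeightedMultiLabelBCE([1.0] * 10)
logits = torch.zeros(4, 10)
targets = torch.ones(4, 10)
loss = loss_fn(logits, targets)
```

With all weights equal to 1.0 this reduces to the plain averaged BCE, so it drops in as a replacement for the default loss; the open question for me is how to pick the weights (e.g. inverse class frequency) without destabilizing training.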