# Special adjustments when combining different losses (MSELoss, BCELoss, CELoss)

Since sigmoided values are bounded between 0 and 1, while MSE losses are usually larger than that, I assume the loss values tend to be unbalanced. The question is: do we need to make special adjustments (weight balancing) when adding these different types of losses together? If so, are there any general rules of thumb?

This is an example of what I'm doing:

```
loss_x = self.lambda_xy * self.mseloss(preds_xy[..., 0] * obj_mask, tx * obj_mask) / nB
loss_y = self.lambda_xy * self.mseloss(preds_xy[..., 1] * obj_mask, ty * obj_mask) / nB
loss_w = self.lambda_wh * self.mseloss(preds_wh[..., 0] * obj_mask, tw * obj_mask) / nB
loss_h = self.lambda_wh * self.mseloss(preds_wh[..., 1] * obj_mask, th * obj_mask) / nB

loss_conf = self.lambda_conf * ...  # rest of this line was cut off in the post

loss = loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls
```

loss_x, loss_y, loss_w, and loss_h use sum-squared error.
loss_conf and loss_cls use binary logistic loss.
My loss function just adds them together.
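To illustrate the scale mismatch I'm worried about, here is a minimal standalone sketch (toy tensors, not taken from my repo): an MSE term on unbounded regression outputs can easily dwarf a BCE term on sigmoided scores, so a naive sum is dominated by the MSE part.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy data: unbounded box offsets vs. sigmoided scores in [0, 1]
pred_box = torch.randn(8, 4) * 5      # regression outputs can be large
target_box = torch.randn(8, 4) * 5
pred_score = torch.sigmoid(torch.randn(8, 1))
target_score = torch.randint(0, 2, (8, 1)).float()

mse = nn.MSELoss()(pred_box, target_box)    # mean over all elements
bce = nn.BCELoss()(pred_score, target_score)

print(f"MSE term: {mse.item():.3f}, BCE term: {bce.item():.3f}")

# One possible fix: per-term weights (lambda values) chosen so the
# weighted terms end up on a comparable scale.
weighted = 0.1 * mse + 1.0 * bce
```

With values like these, the unweighted MSE term is one to two orders of magnitude larger than the BCE term, which is exactly why I suspect weight balancing matters.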

I'm asking because I have been trying to re-implement the training part of YOLOv3 at https://github.com/ydixon/yolo_v3, but my model is not converging well on the COCO dataset. I've been trying to determine whether the problem lies in the loss function, the augmentations, or the hyperparameters.

You could, but most likely you'll need to run a debugger to check whether the separate losses are on different scales. Check here in lesson 9. I don't see any scaling carried out by the author in the code from the link you provided, though.
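Instead of stepping through a debugger every time, one lightweight option is to log each component's share of the total loss during training. A sketch, using a hypothetical helper `loss_scales` (not part of the repo in question) fed with dummy scalars standing in for one batch's components:

```python
import torch

def loss_scales(loss_terms):
    """Return each term's fraction of the total loss.

    loss_terms: dict mapping component names to scalar tensors.
    Hypothetical helper for eyeballing relative magnitudes.
    """
    total = sum(t.item() for t in loss_terms.values())
    return {name: t.item() / total for name, t in loss_terms.items()}

# Dummy values standing in for one batch's loss components
terms = {"xy": torch.tensor(4.2), "wh": torch.tensor(3.1),
         "conf": torch.tensor(0.4), "cls": torch.tensor(0.6)}
fractions = loss_scales(terms)
for name, frac in fractions.items():
    print(f"{name}: {100 * frac:.1f}% of total")
```

If one component consistently eats most of the total, that is a hint its lambda should be reduced (or the others increased) before blaming augmentations or other hyperparameters.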

I'm the author of that repo, actually :). I've set up the lambda and (obj/noobj) scale values so I can tinker with different parameters, although they are all set to 1 right now. Thanks for the lesson 9 link. I guess one strategy would be to just look at the values and make good guesses. By the way, do you know whether Joseph Redmon scaled his losses? I'd be curious how he managed to train the model if he didn't.

I'm not sure, but doesn't this snippet from here show that he scaled them as well?

```
        loss_x = self.coord_scale * nn.MSELoss(size_average=False)(x*coord_mask, tx*coord_mask)/2.0