Special adjustments when combining different losses (MSELoss, BCELoss, CrossEntropyLoss)

Since sigmoided values are bounded between 0 and 1, while MSE losses are usually much larger than that, I'm assuming the loss values tend to be unbalanced. The question is: do we need to make special adjustments (weight balancing) when adding these different types of losses together? If so, are there any general rules of thumb?
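To make the mismatch concrete, here is a toy sketch (the tensor sizes and magnitudes are made up for illustration, not taken from any real model) comparing a BCE loss on sigmoided outputs with an MSE loss on unbounded regression outputs:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Sigmoid outputs live in (0, 1), so BCE stays on the order of ~1.
probs = torch.sigmoid(torch.randn(1000))
targets_bin = torch.randint(0, 2, (1000,)).float()

# Regression outputs are unbounded; a spread of ~5 already inflates MSE a lot.
box_preds = torch.randn(1000) * 5.0
box_targets = torch.randn(1000) * 5.0

bce = nn.BCELoss()(probs, targets_bin)
mse = nn.MSELoss()(box_preds, box_targets)
print(f"BCE: {bce.item():.3f}  MSE: {mse.item():.3f}")
```

With these magnitudes the MSE term dwarfs the BCE term, so an unweighted sum is dominated by the regression loss.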

This is an example of what I'm doing:

loss_x = self.lambda_xy * self.mseloss(preds_xy[..., 0] * obj_mask, tx * obj_mask) / nB
loss_y = self.lambda_xy * self.mseloss(preds_xy[..., 1] * obj_mask, ty * obj_mask) / nB
loss_w = self.lambda_wh * self.mseloss(preds_wh[..., 0] * obj_mask, tw * obj_mask) / nB
loss_h = self.lambda_wh * self.mseloss(preds_wh[..., 1] * obj_mask, th * obj_mask) / nB
loss_conf = self.lambda_conf * \
            ( self.obj_scale * self.bceloss(preds_conf * obj_mask, obj_mask) + \
              self.noobj_scale * self.bceloss(preds_conf * noobj_mask, noobj_mask * 0) ) / nB
loss_cls = self.lambda_cls * self.bceloss(preds_cls[cls_mask], tcls[cls_mask]) / nB
loss = loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls 

loss_x, loss_y, loss_w, and loss_h use sum-squared error.
loss_conf and loss_cls use binary logistic (cross-entropy) loss.
And my loss function just adds them together.
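One ad hoc but common strategy is to log each component for a few batches and pick weights that bring them onto a similar order of magnitude. A minimal sketch of that idea (the component values below are placeholders, not measurements from any run):

```python
import torch

# Hypothetical per-component losses observed on one batch (placeholder values).
components = {
    "xy":   torch.tensor(4.2),
    "wh":   torch.tensor(3.8),
    "conf": torch.tensor(0.6),
    "cls":  torch.tensor(0.9),
}

# Rescale each term relative to a reference component so every term
# contributes on roughly the same order of magnitude.
reference = components["conf"]
weights = {k: (reference / v).item() for k, v in components.items()}

total = sum(w * components[k] for k, w in weights.items())
print(weights)
```

In practice you would freeze the weights after estimating them (rather than recomputing them per batch, which would cancel the gradients' relative scaling), and still tune them by hand from there.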

I’m asking this because I have been trying to re-implement the training part of YoloV3 at https://github.com/ydixon/yolo_v3, but my model is not converging well on the COCO dataset. I’ve been trying to determine whether the problem lies with the loss function, the augmentations, or the hyperparameters.

You could, but most likely you’ll need to run a debugger to check whether the separate losses are on different scales. Check here in lesson 9. But I don’t see any scaling carried out by the author in the code from the link you provided.

I’m actually the author of that repo :). I’ve set up the lambda and obj/noobj scale values so I can tinker with different parameters, although they are all set to 1 right now. Thanks for the lesson 9 link. I guess one strategy would be to just look at the values and try to make good guesses. :thinking:

By the way, do you know whether Joseph Redmon scaled his losses? I’d be curious how he managed to train if he didn’t.

I’m not sure, but doesn’t this snippet from here show that he scaled as well?

        loss_x = self.coord_scale * nn.MSELoss(size_average=False)(x*coord_mask, tx*coord_mask)/2.0
        loss_y = self.coord_scale * nn.MSELoss(size_average=False)(y*coord_mask, ty*coord_mask)/2.0
        loss_w = self.coord_scale * nn.MSELoss(size_average=False)(w*coord_mask, tw*coord_mask)/2.0
        loss_h = self.coord_scale * nn.MSELoss(size_average=False)(h*coord_mask, th*coord_mask)/2.0
        loss_conf = nn.MSELoss(size_average=False)(conf*conf_mask, tconf*conf_mask)/2.0
        loss_cls = self.class_scale * nn.CrossEntropyLoss(size_average=False)(cls, tcls)
        loss = loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls
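As an aside, `size_average=False` has since been deprecated in PyTorch; the equivalent today is `reduction='sum'`. A minimal sketch of the correspondence (the tensors here are placeholders):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8)
tx = torch.randn(8)

# Old style: nn.MSELoss(size_average=False)(x, tx)
# Current equivalent: reduction='sum' (no averaging over elements)
old_equiv = nn.MSELoss(reduction='sum')(x, tx)
manual = ((x - tx) ** 2).sum()
assert torch.allclose(old_equiv, manual)
```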

I think the scales were removed (or all set to 1) when moving from YoloV2 to YoloV3. I’m going to start studying the C code (I’m out of practice) once I have things set up, but I figured I should try troubleshooting the problem before comparing it with the actual answer itself.

Could it be that Focal Loss got rid of the need to scale the different losses in YoloV3?
Edit: No, that’s for binary cross-entropy losses. Please ignore.