Special adjustments when combining different losses (MSELoss, BCELoss, CELoss)

Since sigmoided values are bounded between 0 and 1, while MSE losses are usually larger than that, I’m assuming the loss values tend to be unbalanced. The question is: do we need to make special adjustments (weight balancing) when adding these different types of losses together? If so, are there any general rules of thumb?

This is an example of what I’m doing:

# coordinate losses: MSE on cells that contain an object, normalized by batch size nB
loss_x = self.lambda_xy * self.mseloss(preds_xy[..., 0] * obj_mask, tx * obj_mask) / nB
loss_y = self.lambda_xy * self.mseloss(preds_xy[..., 1] * obj_mask, ty * obj_mask) / nB
loss_w = self.lambda_wh * self.mseloss(preds_wh[..., 0] * obj_mask, tw * obj_mask) / nB
loss_h = self.lambda_wh * self.mseloss(preds_wh[..., 1] * obj_mask, th * obj_mask) / nB

# objectness: BCE with separate scales for object and no-object cells
loss_conf = self.lambda_conf * \
            ( self.obj_scale * self.bceloss(preds_conf * obj_mask, obj_mask) + \
              self.noobj_scale * self.bceloss(preds_conf * noobj_mask, noobj_mask * 0) ) / nB
# classification: BCE on the class predictions of object cells
loss_cls = self.lambda_cls * self.bceloss(preds_cls[cls_mask], tcls[cls_mask]) / nB
loss = loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls

loss_x, loss_y, loss_w, and loss_h use sum-of-squared error.
loss_conf and loss_cls use binary logistic (cross-entropy) loss.
My loss function just adds them all together.
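
For context, here is a quick, self-contained sketch (random data, nothing from my actual model) showing how a sum-reduced MSE term and a sum-reduced BCE term can sit at quite different magnitudes before any lambda weighting:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    nB = 16  # hypothetical batch size

    # stand-ins: unbounded regression outputs vs. sigmoided confidences in (0, 1)
    preds = torch.randn(nB, 100)
    targets = torch.randn(nB, 100)
    probs = torch.sigmoid(torch.randn(nB, 100))
    labels = torch.randint(0, 2, (nB, 100)).float()

    mse = nn.MSELoss(reduction='sum')(preds, targets) / nB
    bce = nn.BCELoss(reduction='sum')(probs, labels) / nB
    print(f'MSE term: {mse.item():.1f}  BCE term: {bce.item():.1f}')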

I’m asking this because I have been trying to re-implement the training part of YoloV3 at https://github.com/ydixon/yolo_v3, but my model is not converging well on the COCO dataset. I’ve been trying to determine whether the problem lies in the loss function, the augmentations, or the hyperparameters.

You could, but most likely you’ll need to run a debugger to check whether the separate losses are at different scales. Check here in lesson 9. That said, I don’t see any scaling carried out by the author in the code from the link you provided.
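
For example, instead of stepping through a debugger, you could log each component every so often. A hypothetical snippet, assuming the loss_x … loss_cls tensors and an iteration counter from your code above:

    # hypothetical logging inside the training loop, after the components are computed
    if iteration % 100 == 0:
        parts = {'x': loss_x, 'y': loss_y, 'w': loss_w, 'h': loss_h,
                 'conf': loss_conf, 'cls': loss_cls}
        print(' | '.join(f'{k}: {v.item():.4f}' for k, v in parts.items()))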

I’m actually the author of that repo :). I’ve set up the lambda / (obj/noobj) scale values so I can tinker with different parameters, although they are all set to 1 right now. Thanks for the lesson 9 link. I guess one strategy would be to just look at the values and try to make good guesses. :thinking:
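
One crude way to make those guesses concrete (my own heuristic, nothing official) would be to run one batch with all lambdas at 1, then rescale each term toward the components’ mean magnitude. Assuming the loss_x … loss_cls tensors from my snippet above:

    # hypothetical calibration pass, with all lambdas initially set to 1
    raw = {'xy': (loss_x + loss_y).item(), 'wh': (loss_w + loss_h).item(),
           'conf': loss_conf.item(), 'cls': loss_cls.item()}
    mean_mag = sum(raw.values()) / len(raw)
    lambdas = {k: mean_mag / max(v, 1e-8) for k, v in raw.items()}  # e.g. lambdas['xy'] -> lambda_xy
    print(lambdas)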

Btw, do you know whether Joseph Redmon scaled his losses? I’d be curious how he managed to train if he didn’t.

I’m not sure, but does this snippet from here show that he scaled as well?

        loss_x = self.coord_scale * nn.MSELoss(size_average=False)(x*coord_mask, tx*coord_mask)/2.0
        loss_y = self.coord_scale * nn.MSELoss(size_average=False)(y*coord_mask, ty*coord_mask)/2.0
        loss_w = self.coord_scale * nn.MSELoss(size_average=False)(w*coord_mask, tw*coord_mask)/2.0
        loss_h = self.coord_scale * nn.MSELoss(size_average=False)(h*coord_mask, th*coord_mask)/2.0
        loss_conf = nn.MSELoss(size_average=False)(conf*conf_mask, tconf*conf_mask)/2.0
        loss_cls = self.class_scale * nn.CrossEntropyLoss(size_average=False)(cls, tcls)
        loss = loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls
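
Side note in case anyone copies that: size_average=False is the old pre-0.4 PyTorch API. A minimal sketch of the modern spelling, with the tensors from the snippet assumed to be defined:

    import torch.nn as nn

    # reduction='sum' is the modern equivalent of size_average=False
    mse_sum = nn.MSELoss(reduction='sum')
    ce_sum = nn.CrossEntropyLoss(reduction='sum')

    # e.g. the coordinate term above becomes:
    # loss_x = self.coord_scale * mse_sum(x * coord_mask, tx * coord_mask) / 2.0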

I think the scales were removed (or all set to 1) when moving from YoloV2 to YoloV3. I’m going to start studying the C code (I’m out of practice) once I have things set up, but I figured I should try troubleshooting the problem first before comparing it with the actual answer.

Could it be that Focal Loss got rid of the need to scale the different losses in YoloV3?
Edit: Nay, that’s for binary cross-entropy losses. Please ignore.
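
For completeness, since it came up: focal loss (Lin et al., “Focal Loss for Dense Object Detection”) is a modulation of binary cross-entropy, which is why it would only affect the conf/cls terms here, not the MSE coordinate terms. A minimal sketch of the binary form:

    import torch
    import torch.nn.functional as F

    def binary_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
        # FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t); reduces to BCE at gamma=0
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
        p = torch.sigmoid(logits)
        p_t = p * targets + (1 - p) * (1 - targets)              # probability of the true class
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        return (alpha_t * (1 - p_t) ** gamma * bce).sum()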