I am implementing the object detection network in this paper: PIXOR
They have a multitask head for object classification and object localization. Their loss function is:
Loss = focal_loss(p, y_class) + smooth_l1_loss(q - y_reg)
where p is the network's confidence, y_class is the object/not-object label, q is the network's bounding box prediction, and y_reg is the actual bounding box.
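For concreteness, here is a minimal pure-Python sketch of a loss of this form. It is an illustration, not the paper's code or my exact training code. The `alpha`, `gamma`, and `beta` defaults, the hypothetical `reg_weight` factor, and the choice to sum the regression term over positive locations only are all assumptions made for this sketch.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one confidence p in (0, 1) and label y in {0, 1}.
    alpha and gamma are assumed defaults, not values taken from my setup."""
    p = min(max(p, eps), 1.0 - eps)          # clamp to keep log() finite
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def smooth_l1_loss(q, y_reg, beta=1.0):
    """Smooth L1 summed over the box parameters of one location."""
    total = 0.0
    for q_i, y_i in zip(q, y_reg):
        d = abs(q_i - y_i)
        # quadratic near zero, linear for large errors
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total

def multitask_loss(p, y_class, q, y_reg, reg_weight=1.0):
    """Return (total, cls, reg) so each term can be logged separately.
    reg_weight is a hypothetical balancing factor; 1.0 gives the plain sum."""
    cls = sum(focal_loss(p_i, y_i) for p_i, y_i in zip(p, y_class))
    # Assumption: only positive locations contribute to the regression term.
    reg = sum(
        smooth_l1_loss(q_i, r_i)
        for q_i, r_i, y_i in zip(q, y_reg, y_class)
        if y_i == 1
    )
    return cls + reg_weight * reg, cls, reg
```

Returning the two terms separately makes it easy to plot them on their own axes, since the focal term is down-weighted by construction and its raw magnitude is not directly comparable to the smooth L1 term.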
I have been training with this loss function and here is my training loss plot:
What concerns me is that the classification loss falls to near 0 very quickly, whereas the localization loss does not. Should I be doing some kind of loss balancing here? Are there particular things to be careful of in multitask learning?