In the bounding box notes we create two outputs on the bounding box model: one linear output for the box coordinates and one softmax output for label classification. After that we scale the MSE loss by 0.001 to bring it to roughly the same magnitude as the cross-entropy loss.
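To make the question concrete, here is a minimal NumPy sketch of how I understand the combined loss. The function names, the `bb_weight` parameter, and the toy numbers are my own illustration and not the code from the notes; I am also assuming the box coordinates are in pixels.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error over all box coordinates.
    return float(np.mean((y_true - y_pred) ** 2))

def cross_entropy(y_true_onehot, y_prob, eps=1e-12):
    # Categorical cross-entropy on softmax probabilities, averaged over samples.
    return float(-np.mean(np.sum(y_true_onehot * np.log(y_prob + eps), axis=1)))

def combined_loss(bb_true, bb_pred, cls_true, cls_prob, bb_weight=0.001):
    # Weighted sum of the two heads' losses; bb_weight is the 0.001 scale factor.
    return bb_weight * mse(bb_true, bb_pred) + cross_entropy(cls_true, cls_prob)

# Made-up example: one image, box coordinates in pixels, three classes.
bb_true = np.array([[50.0, 60.0, 150.0, 200.0]])
bb_pred = np.array([[40.0, 70.0, 160.0, 190.0]])
cls_true = np.array([[0.0, 1.0, 0.0]])
cls_prob = np.array([[0.2, 0.5, 0.3]])

print("raw MSE:", mse(bb_true, bb_pred))
print("cross-entropy:", cross_entropy(cls_true, cls_prob))
print("combined:", combined_loss(bb_true, bb_pred, cls_true, cls_prob))
```

With these made-up numbers the raw MSE is 100 while the cross-entropy is about 0.69, so without a scale factor the box loss would dominate the sum.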
How do we know that we have to scale the MSE by 0.001? I would never have guessed.
And why did we use MSE in the first place?