I’d like to understand the reasoning behind the choice of loss and activation function for the object detection model in lesson 1 part 2:
We are doing regression to predict the coordinates of the bounding boxes and classification to predict the class of the largest object.
The custom head we defined ends with a Linear layer. In the custom loss function, we take the output of the model, split it into the coordinate predictions and the class predictions, and pass the class predictions to the cross_entropy loss function.
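To make the setup concrete, here is a minimal plain-Python sketch of how I understand that loss. It is not the lesson's code: the names `detection_loss`, `l1_loss`, `cross_entropy` and `log_softmax` are hypothetical, the choice of an L1 loss for the coordinates and the layout "first 4 values are coordinates, the rest are class scores" are my assumptions, and the `cross_entropy` here is a stand-in modeled on PyTorch's `F.cross_entropy`, which takes raw scores and applies a log-softmax internally.

```python
import math

def log_softmax(scores):
    # Numerically stable log-softmax over a list of raw (linear) scores.
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]

def cross_entropy(scores, target_idx):
    # Negative log-probability assigned to the true class.
    return -log_softmax(scores)[target_idx]

def l1_loss(pred, target):
    # Mean absolute error over the box coordinates.
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def detection_loss(output, bbox_target, class_target):
    # Assumed layout: first 4 values are box coordinates,
    # the remaining values are one raw score per class.
    bbox_pred, class_scores = output[:4], output[4:]
    return l1_loss(bbox_pred, bbox_target) + cross_entropy(class_scores, class_target)

# One example: 4 coordinates followed by raw scores for 3 classes.
output = [10.0, 20.0, 100.0, 120.0, 2.0, 0.5, -1.0]
loss = detection_loss(output, [12.0, 18.0, 98.0, 121.0], 0)
print(loss)
```

In this sketch the class scores go into the loss exactly as the Linear layer produced them, with no activation applied in the model itself.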
Isn’t the cross_entropy loss function designed to give fast training when it is paired with the sigmoid activation function? But we ended with a “linear activation function” for the class predictions, so doesn’t that mean we should have used MSE instead?
I’m interested in what you think about this!