Just after today’s lecture, I read the YOLOv3 paper.
In the section “Things We Tried That Didn’t Work”, they say they tried focal loss and found that it dropped mAP by a couple of points. They give the following reason:
“YOLOv3 may already be robust to the problem focal loss is trying to solve because it has separate objectness predictions and conditional class predictions.”
They admit they are not sure about this themselves.
I didn’t really understand this. Even if YOLOv3 was already robust enough, focal loss should still have made it better.
Side Notes: It’s a very good and fun read. The community needs more papers like this. I was literally laughing while reading this paper.
The paper is great, I agree!
And I agree with the focal loss results for now. I’ve followed somebody else’s implementation in Keras + TensorFlow, but in my initial tests (whole-image binary classification) it fares worse than binary cross-entropy. Not sure why, honestly, but I’ll try to dig deeper as soon as I can.
So here is what happened in YOLOv3 (or at least what I think happened) with focal loss:
From the focal loss paper: “Class imbalance is addressed in R-CNN-like detectors by a two-stage cascade and sampling heuristics”. This means that the data fed into the classifier of a two-stage detector is already far more balanced, because it has gone through both the proposal stage and the sampling heuristics. As a result, the classifier is not affected by the imbalance.
“In contrast”, if we view a single-stage detector as an implicit combination of a proposal network and a classifier, the classifier part has to process a ton of background examples, whereas its counterpart in a two-stage detector does not. So when we introduce focal loss into a regular single-stage detector, we are down-weighting the overwhelming number of background examples that the classifier sees.
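To make the down-weighting concrete, here is a minimal sketch of the binary focal loss from the focal loss paper, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), written in plain Python. The function names and the example probabilities are my own, made up for illustration; gamma = 2 and alpha = 0.25 are the defaults the focal loss paper reports as working well.

```python
import math

def binary_cross_entropy(p, y, eps=1e-7):
    """Plain BCE for one prediction p (foreground probability) and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)       # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p        # probability assigned to the true class
    return -math.log(p_t)

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss: BCE scaled by alpha_t * (1 - p_t) ** gamma."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# Hypothetical examples: a confidently rejected background box and a hard foreground box.
examples = [("easy background", 0.01, 0), ("hard foreground", 0.30, 1)]
for name, p, y in examples:
    bce = binary_cross_entropy(p, y)
    fl = focal_loss(p, y)
    print(f"{name}: BCE={bce:.5f}  focal={fl:.7f}  ratio={fl / bce:.6f}")
```

The easy background example keeps only a tiny fraction of its cross-entropy loss, while the hard foreground example keeps a much larger share, which is the rebalancing described above.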
Then let’s take a look at YOLOv3’s paper (or, as the authors insist, a tech report), which states: “If a bounding box prior is not assigned to a ground truth object it incurs no loss for coordinate or class predictions, only objectness.” This means that during training the classifier ignores the loss produced by the huge amount of background. That is essentially the same as modifying the coefficient in the focal loss term, and it is essentially what focal loss tries to do: balance the foreground-background contrast. So, yeah, YOLOv3 already has its own version of focal loss implemented. There is no reason to do it again.
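Here is a rough sketch of that masking, assuming a simplified per-prior loss. The dictionary layout and function names are hypothetical, not taken from the YOLOv3 code; coordinate loss and the rule that skips objectness loss for high-overlap unassigned priors are left out. Note that the mask here is a hard 0/1 weight on the class loss, whereas focal loss uses a soft weight, so the analogy is loose.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one probability p and one label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(p if y == 1 else 1.0 - p)

def yolo_style_loss(priors):
    """Return (objectness_loss, class_loss) summed over all priors.

    Each prior is a dict with hypothetical keys:
      'assigned'      : True if the prior is matched to a ground truth object
      'objectness'    : predicted objectness probability
      'class_probs'   : per-class probabilities (independent logistic outputs)
      'class_targets' : per-class 0/1 labels (ignored for unassigned priors)
    """
    obj_loss, cls_loss = 0.0, 0.0
    for prior in priors:
        # Every prior contributes to the objectness loss.
        obj_loss += bce(prior["objectness"], 1 if prior["assigned"] else 0)
        # Only assigned priors contribute to the class loss.
        if prior["assigned"]:
            cls_loss += sum(
                bce(p, t) for p, t in zip(prior["class_probs"], prior["class_targets"])
            )
    return obj_loss, cls_loss

# One matched prior plus 1000 background priors with arbitrary class outputs.
foreground = {"assigned": True, "objectness": 0.8,
              "class_probs": [0.7, 0.1], "class_targets": [1, 0]}
background = {"assigned": False, "objectness": 0.05,
              "class_probs": [0.5, 0.5], "class_targets": [0, 0]}
obj_loss, cls_loss = yolo_style_loss([foreground] + [background] * 1000)
print(f"objectness loss: {obj_loss:.3f}, class loss: {cls_loss:.3f}")
```

However many background priors are added, the class loss stays the same; only the objectness term sees the imbalance.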