UPDATE 4/17: The fix and discussion below are specific to our own notebooks posted above in this thread. The official pascal_multi notebook has the correct implementation and does not need to be changed. I will be updating my notebooks going forward to sync up with the official version. Sorry for any confusion.
Found the issue! (…well, at least one issue :))
@jeremy, your suspicion was correct - our flatten_conv function was not correctly lining up the order of anchor/prediction bboxes with that of the receptive fields.
Anchor/prediction boxes were incrementing top -> down through each column first and then moving to the next column to the right, while the receptive fields were going left -> right first and then down to the next row.
The fix is to switch the permute dim-ordering (x.permute(0,3,2,1) instead of x.permute(0,2,3,1)) so that we are transposing the order of our prediction boxes as we flatten our outbound convolutions:
bs,nf,gx,gy = x.size()
x = x.permute(0,3,2,1).contiguous()
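To make the ordering mismatch concrete, here is a rough plain-Python sketch (no PyTorch) of the two flattening orders. It assumes the usual (batch, channels, rows, cols) layout for x, so that flattening after x.permute(0,2,3,1) walks the grid row by row and flattening after x.permute(0,3,2,1) walks it column by column. The 3x3 grid size and the variable names are made up for illustration.

```python
# Hypothetical 3x3 grid; each cell is labelled by its (row, col) position.
H, W = 3, 3

# Row-major: left -> right along a row, then down to the next row.
# Assumed to match the flat order after x.permute(0,2,3,1).
row_major = [(r, c) for r in range(H) for c in range(W)]

# Column-major: top -> down along a column, then on to the next column.
# Assumed to match the flat order after x.permute(0,3,2,1).
col_major = [(r, c) for c in range(W) for r in range(H)]

# Flat indices where the two orderings point at different grid cells.
mismatched = [i for i in range(H * W) if row_major[i] != col_major[i]]

print(row_major[1], col_major[1])  # (0, 1) (1, 0)
print(mismatched)                  # [1, 2, 3, 5, 6, 7]
```

Only the cells on the main diagonal line up under both orderings; every other flat index pairs a prediction with the cell mirrored across the diagonal.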
Running my baseline pascal-multi notebook, this improved mAP from 30.4% to 32.4%.
In my best performing FPN variant so far, the fix improved mAP from 31% to 35.7%! Notebook link coming soon…
On visual inspection, the effect of the bug is obvious (but only in retrospect…). I was seeing a lot of weird localization errors like this:
After the fix:
No more sheep in the trees!
This bug has the greatest effect where ground-truth (gt) objects are clustered in the bottom left or top right and our prediction boxes are transposed to the other side of the diagonal. The prediction bboxes still tried to make their way towards maximum IoU with ground truth, but there was only so far they could go due to the center and height/width constraints we set.
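As a rough illustration of why the corners suffer most, the sketch below computes the IoU between a grid cell and its mirror image across the diagonal. It assumes a unit-square image split into a hypothetical 4x4 grid, with boxes written as (x0, y0, x1, y1); cell_box and iou are illustrative helpers, not functions from the notebook.

```python
def cell_box(row, col, n):
    """Return (x0, y0, x1, y1) for one cell of a unit image split n x n."""
    s = 1.0 / n
    return (col * s, row * s, (col + 1) * s, (row + 1) * s)

def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

n = 4
bottom_left = cell_box(3, 0, n)  # cell containing the object
top_right = cell_box(0, 3, n)    # same cell with row and col swapped
diagonal = cell_box(2, 2, n)     # a cell that maps onto itself

print(iou(bottom_left, top_right))         # 0.0
print(iou(diagonal, cell_box(2, 2, n)))    # 1.0
```

A transposed box in a far corner starts with zero overlap, so a prediction that can only shift and rescale within limits has little chance of reaching the object, while cells on the diagonal are unaffected.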
The bug wasn’t that obvious from just comparing the average localization and classification loss values:
loc: 1.8269546031951904, clas: 3.6849770545959473
loc: 1.8288934230804443, clas: 3.7000365257263184
Now we’re in business! I’m sure there are still issues/tweaks to be made on the FPN side of things so I’m looking forward to seeing how high we can push the mAP.