This thread is for any discussion about the latest Kaggle competition – Global Wheat Detection Challenge.
It seems a number of fellows are also playing this competition. I think it’s a good idea to have a thread dedicated to it, where we can share ideas, tips, and useful resources, recruit teammates, discuss how we could apply fastai2/fastai to the competition, or troubleshoot bugs (anything about the competition!).
On my side, I originally attempted to interface torchvision’s FasterRCNN with fastai2, but ended up with so-so performance, and the interface didn’t work very well (many ugly tweaks involved). So right now I am trying fastai’s RetinaNet instead. For RetinaNet, I mainly refer to this notebook from course v3. Though the notebook implements RetinaNet in fastai v1, I think it is a pretty good starting point.
I have spent quite some time understanding how RetinaNet works from this notebook (e.g. the bbox format in model input and output, and the post-processing steps applied to predicted bounding boxes in training/inference mode, etc.).
My feeling so far is that object detection models (such as FasterRCNN and RetinaNet) involve A LOT OF DETAILS. Missing even one of them can easily lead to poor performance. Below are some things you should pay extra attention to:
- Pay extra attention to the bbox format of your data input vs. the bbox format your model expects vs. the bbox format the loss function expects! Taking RetinaNet from the above notebook as an example, its loss function expects target bboxes in a rescaled TLBR format (i.e. [y0, x0, y1, x1], with each number rescaled to [-1, +1])! If your input bboxes are in COCO format (i.e. [x0, y0, w, h], unscaled) or PASCAL-VOC format (i.e. [x0, y0, x1, y1], unscaled), make sure you transform them into the right format before feeding them to the loss function. Failing to do so will leave your training loss curve barely dropping. (I made that mistake before.)
- Check whether the model counts background as one of the classes. Take RetinaNet as an example: background is excluded from the class count. But other models such as FasterRCNN / YOLO do count background as a class (correct me if I am wrong).
- The mAP metric the competition uses is DIFFERENT from the standard definition! See the resources below for a reference implementation of the metric.
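To make the first point concrete, here is a minimal sketch of the two conversions (the helper names are my own; coordinates are assumed to be pixel values):

```python
def coco_to_voc(box):
    """Convert a COCO box [x0, y0, w, h] to Pascal-VOC [x0, y0, x1, y1]."""
    x0, y0, w, h = box
    return [x0, y0, x0 + w, y0 + h]

def voc_to_rescaled_tlbr(box, img_h, img_w):
    """Convert a Pascal-VOC box [x0, y0, x1, y1] (pixels) into the
    rescaled TLBR format [y0, x0, y1, x1], with every coordinate
    mapped from [0, size] to [-1, +1]."""
    x0, y0, x1, y1 = box
    scale_y = lambda p: p / img_h * 2 - 1
    scale_x = lambda p: p / img_w * 2 - 1
    return [scale_y(y0), scale_x(x0), scale_y(y1), scale_x(x1)]

# A box covering the top-left quadrant of a 100x100 image:
voc = coco_to_voc([0, 0, 50, 50])               # [0, 0, 50, 50]
tlbr = voc_to_rescaled_tlbr(voc, 100, 100)      # [-1.0, -1.0, 0.0, 0.0]
```

Doing the conversion once, right before the loss function, keeps the rest of your pipeline in a single familiar format.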
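On the last point: if I read the evaluation page correctly, the competition averages a per-image precision TP / (TP + FP + FN) over IoU thresholds from 0.5 to 0.75 in steps of 0.05, which is not the usual PASCAL/COCO-style mAP. A rough sketch of a single threshold, assuming greedy matching of confidence-sorted predictions (check the linked kernel for the exact matching rules):

```python
def iou(box_a, box_b):
    """IoU of two Pascal-VOC boxes [x0, y0, x1, y1]."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_at(preds, gts, thresh):
    """Per-image precision TP / (TP + FP + FN) at one IoU threshold.
    `preds` should be pre-sorted by descending confidence; each
    prediction greedily claims the best-matching unclaimed GT box."""
    unmatched = list(gts)
    tp = 0
    for p in preds:
        best_iou, best_idx = 0.0, None
        for i, g in enumerate(unmatched):
            v = iou(p, g)
            if v > best_iou:
                best_iou, best_idx = v, i
        if best_idx is not None and best_iou >= thresh:
            tp += 1
            unmatched.pop(best_idx)
    fp = len(preds) - tp
    fn = len(unmatched)
    denom = tp + fp + fn
    # Edge case: no predictions and no ground truth; treating this as
    # a perfect score is my assumption, not something I verified.
    return tp / denom if denom else 1.0
```

Note the FN term in the denominator: unlike standard precision, missed ground-truth boxes directly lower the score, so an overly conservative detector gets punished.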
Resources I Found Useful
- [fastai] course v3 RetinaNet implementation
- [fastai] object detection using fastai2
- [kaggle kernel] an implementation of mAP metrics for this competition
- [kaggle kernel] training a keras RetinaNet
Do feel free to share your tips and resources; I am happy to consolidate them all here!