A Parallel Implementation of Computing Mean Average Precision

Hi all, I have implemented mean average precision (mAP) in a CUDA-compatible way. It can be plugged into a training loop to evaluate a detector's mAP on the validation set after each training epoch, just as accuracy is evaluated for image classifiers. Existing ways of computing mAP operate sequentially and are typically run only after training is finished: inference runs on one image at a time, and the detected bounding boxes must be transferred to the CPU or to disk before mAP can be calculated. This is too inefficient and inconvenient for tracking how a detector's mAP improves after each training epoch. My implementation runs inference on a mini-batch of images, determines true positives and false positives for the whole mini-batch on the GPU, and makes mAP available as soon as inference finishes.
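To make the TP/FP step concrete, here is a minimal sketch of how true positives can be determined on the GPU with tensor operations. This is my own illustrative simplification, not the exact code from the repo: `box_iou` computes a pairwise IoU matrix in one shot, and `match_detections` then does the standard greedy matching of confidence-sorted detections to ground truths at a single IoU threshold.

```python
import torch

def box_iou(boxes1, boxes2):
    """Pairwise IoU between two sets of boxes in (x1, y1, x2, y2) format.

    boxes1: (N, 4), boxes2: (M, 4) -> returns (N, M).
    """
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    lt = torch.maximum(boxes1[:, None, :2], boxes2[None, :, :2])  # (N, M, 2)
    rb = torch.minimum(boxes1[:, None, 2:], boxes2[None, :, 2:])  # (N, M, 2)
    wh = (rb - lt).clamp(min=0)                                   # intersection w, h
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area1[:, None] + area2[None, :] - inter)

def match_detections(det_boxes, gt_boxes, iou_threshold=0.5):
    """Greedy TP/FP assignment for one image and one class.

    det_boxes is assumed to be sorted by descending confidence.
    Returns a boolean tensor: True where a detection is a true positive.
    """
    iou = box_iou(det_boxes, gt_boxes)                 # (num_det, num_gt)
    tp = torch.zeros(det_boxes.shape[0], dtype=torch.bool)
    gt_used = torch.zeros(gt_boxes.shape[0], dtype=torch.bool)
    for d in range(det_boxes.shape[0]):
        best_iou, best_gt = iou[d].max(dim=0)
        # A detection is a TP if it overlaps an unmatched GT above threshold.
        if best_iou >= iou_threshold and not gt_used[best_gt]:
            tp[d] = True
            gt_used[best_gt] = True
    return tp
```

The IoU matrix is the expensive part and is fully vectorized; moving the tensors to `cuda` makes the same code run on GPU without changes.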

Currently, my implementation only supports Pascal VOC-style mAP (i.e. a single IoU threshold), but it can be extended to support COCO-style mAP (i.e. a list of IoU thresholds) via broadcasting. My laptop is too slow to work with COCO, so I haven't done that yet.
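For anyone curious, the broadcasting idea looks roughly like this. This is a hypothetical sketch of my own, not code from the repo: each detection's best IoU is compared against a whole vector of thresholds at once, so the single-threshold TP vector becomes a (num_detections, num_thresholds) TP table. (A full COCO-style evaluation also has to redo the ground-truth matching per threshold; this only illustrates the broadcasting pattern.)

```python
import torch

def tp_table(best_ious, thresholds):
    # (num_det, 1) >= (num_thresholds,) broadcasts to (num_det, num_thresholds):
    # column j holds the TP/FP decisions at thresholds[j].
    return best_ious[:, None] >= thresholds[None, :]

thresholds = torch.arange(0.5, 1.0, 0.05)   # COCO-style 0.50:0.05:0.95 (10 values)
best_ious = torch.tensor([0.92, 0.56, 0.30])  # made-up best-match IoUs per detection
tp = tp_table(best_ious, thresholds)          # shape (3, 10)
```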

My implementation may require a slight change to how your dataloader and postprocessing work. The general rule is to ensure that everything in a mini-batch has the same shape, which can be done by padding with dummy values and using binary masks to keep track of the real values. I have a complete example with a demo notebook at GitHub - bwangca/fast-map: A PyTorch implementation of computing mean average precision in parallel. Detailed explanations can also be found in my paper at [2206.09504] A Parallel Implementation of Computing Mean Average Precision.
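As an illustration of the padding rule, here is a small hypothetical helper (`pad_boxes` is my own name, not from the repo) that pads a list of variable-length per-image box tensors into one batch tensor plus a boolean mask marking which rows are real:

```python
import torch

def pad_boxes(boxes_list, max_boxes):
    """Pad per-image box tensors to a fixed size so they can be batched.

    boxes_list: list of (n_i, 4) tensors with n_i <= max_boxes.
    Returns a (B, max_boxes, 4) tensor of boxes (padded with zeros) and a
    (B, max_boxes) boolean mask that is True for real boxes, False for padding.
    """
    batch = torch.zeros(len(boxes_list), max_boxes, 4)
    mask = torch.zeros(len(boxes_list), max_boxes, dtype=torch.bool)
    for i, boxes in enumerate(boxes_list):
        n = boxes.shape[0]
        batch[i, :n] = boxes
        mask[i, :n] = True
    return batch, mask
```

Downstream computations then multiply by (or index with) the mask, so the zero-padded dummy rows never contribute to the TP/FP counts.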