You take the calculation of precision separately for each class. So in your example:

- given 5 predictions (3 for class one, 2 for class two) and a ground truth of 1 object in class one (and assuming none in class two)
- for class one, you have 1 TP and 2 FP
- for class two, you have 0 TP and 2 FP

The calculation of precision (TP/(TP+FP)) would give:

- for class one: 1/3
- for class two: 0/2

The gap in my understanding that I haven’t fully filled yet is that average precision is different than precision.

While precision is defined as TP/(TP+FP), average precision is related to the (approximate) area under the precision-recall curve for each class. Like this (fig from here):

Found some good overviews of the difference here:

**Best overview imo**: https://medium.com/@timothycarlen/understanding-the-map-evaluation-metric-for-object-detection-a07fe6962cf3

https://sanchom.wordpress.com/tag/average-precision/

The mAP is then calculated as the mean of APs (quite a literal definition: the mean of the average precisions per class). You can also examine the AP for each class, like this (from here):

And in object detection papers, you often see something like mAP@[.5:.95] which denotes an average of the mAPs across different IoU thresholds for considering if a prediction counts as a TP. More on this here: https://datascience.stackexchange.com/questions/16797/what-does-the-notation-map-5-95-mean

What a confusing phrase to say out loud: “average of the mean average precisions”

I’ve been working on fully understanding and implementing mAP as well. It’s more complex and ambiguous than I expected. Perhaps we should start a new thread on this?

Btw, here are some github references I’ve been using to check my implementation:

https://gist.github.com/tarlen5/008809c3decf19313de216b9208f3734

https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py

https://github.com/amdegroot/ssd.pytorch/blob/master/eval.py