Chapter 6: mAP instead of accuracy_multi for PASCAL_2007

weidenwalker · March 17, 2021, 3:37pm

Hi everyone!

I was working through Chapter 6 and its PASCAL 2007 multi-label classification problem this afternoon. I noticed some quirks with accuracy_multi and am struggling to implement a fix correctly.

So, the resnet that’s trained in the book gets to a great-sounding 95% accuracy within 3 epochs. I put on my sceptical hat and tried a naive baseline model that just outputs 0 for every label, which gets a similarly impressive 92% accuracy (see my Colab notebook). Each image carries only a few of the 20 labels, so most targets are 0 and predicting "nothing" is right most of the time. This suggests that accuracy might not be the most appropriate metric for this problem.
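To illustrate why the all-zero baseline scores so well, here is a minimal sketch in plain Python. It does not use the real dataset: the targets are synthetic, and the assumption that every image has 1 or 2 positive labels out of 20 is mine. `cellwise_accuracy` is a hypothetical helper that mimics what accuracy_multi does after thresholding, as I understand it: it compares every (image, label) cell and averages.

```python
import random

random.seed(0)
n_images, n_classes = 1000, 20  # PASCAL VOC has 20 classes; 1000 images is arbitrary

# Synthetic targets (NOT the real dataset): each image gets 1 or 2 positive labels.
targets = []
for _ in range(n_images):
    row = [0] * n_classes
    for c in random.sample(range(n_classes), random.choice([1, 2])):
        row[c] = 1
    targets.append(row)

# Naive baseline: predict 0 for every label of every image.
preds = [[0] * n_classes for _ in range(n_images)]

def cellwise_accuracy(preds, targets):
    """Fraction of (image, label) cells where the 0/1 prediction equals the target."""
    total = sum(len(row) for row in targets)
    correct = sum(p == t
                  for p_row, t_row in zip(preds, targets)
                  for p, t in zip(p_row, t_row))
    return correct / total

baseline_acc = cellwise_accuracy(preds, targets)
print(f"all-zero baseline accuracy: {baseline_acc:.3f}")
```

Under these assumptions the baseline has to land between 90% and 95%, because between 1 and 2 of every 20 cells are positive. That is in the same range as the 92% I saw on the real data.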

I then checked the original instruction set on the dataset’s website (see section 3.4), as well as the benchmarks on papers-with-code (link). Both of these recommend using mean average precision (mAP) to evaluate model performance instead. However, when I plug in APScoreMulti instead, I get an exception:
RuntimeError: Boolean value of Tensor with more than one value is ambiguous
I’ve had a quick skim over the docs (including the relevant section at scikit-learn), but couldn’t figure out why this is unhappy.

So I guess two issues here:

1. Does anybody have advice on how to get APScoreMulti to work on this problem?
2. Does the chapter want to be fixed to use mean average precision instead? I realise that this might be intentional to avoid confusion over precision/recall at that stage of the course, but if not, I’m happy to raise an issue on GitHub.

Thanks a lot for your time!

weidenwalker · March 21, 2021, 10:09am

For posterity, I fixed my problem by calling APScoreMulti() and passing the result to the learner, rather than passing APScoreMulti itself, i.e.:
metrics=[APScoreMulti()] instead of metrics=[APScoreMulti].
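My best guess at the mechanism is that APScoreMulti is a factory: you call it with options and it returns the object that does the scoring. If you pass the factory itself as the metric, the predictions and targets end up in its option arguments, and a truth test on a tensor raises the error above. The toy below reproduces that pattern in plain Python. `FakeTensor` and `make_metric` are hypothetical stand-ins I made up for illustration, not fastai code.

```python
class FakeTensor:
    """Toy stand-in for a multi-element tensor: refuses to be used as a bool."""
    def __init__(self, values):
        self.values = list(values)

    def __bool__(self):
        raise RuntimeError(
            "Boolean value of Tensor with more than one value is ambiguous")

def make_metric(sigmoid=True, average="macro"):
    """Hypothetical metric factory: takes options, returns the scoring function."""
    use_sigmoid = True if sigmoid else False  # truth test on the *option*

    def metric(preds, targs):
        # Placeholder "score": just report how many predictions were seen.
        return len(preds.values), use_sigmoid

    return metric

preds, targs = FakeTensor([0.1, 0.9]), FakeTensor([0, 1])

# Correct: call the factory first, then give the returned metric the tensors.
ok_result = make_metric()(preds, targs)

# Mistake: give the factory itself the tensors, so `preds` lands in `sigmoid`.
try:
    make_metric(preds, targs)
    error_message = None
except RuntimeError as err:
    error_message = str(err)

print(ok_result)
print(error_message)
```

The first call works, and the second fails with the same message I was seeing, which is consistent with the missing parentheses being the whole problem.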

The performance as measured by this metric is indeed quite a bit worse than was apparent with accuracy_multi (SOTA seems to be an AP of ~95%, source).
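For anyone wondering what the number means, here is a rough sketch of mean average precision in plain Python. For each class it ranks the images by predicted score, averages the precision at the rank of each true positive, and then averages over the classes. This is the non-interpolated variant. The original VOC2007 protocol uses an 11-point interpolated AP, so treat this as an illustration rather than the official evaluation. The function names and the tiny dataset are made up, and ties are simply broken by input order.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class: mean precision@k over ranks k of positives."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / k)  # precision among the top-k ranked images
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(score_rows, label_rows):
    """Macro average of the per-class APs; rows are images, columns are classes."""
    n_classes = len(label_rows[0])
    aps = [
        average_precision([row[c] for row in score_rows],
                          [row[c] for row in label_rows])
        for c in range(n_classes)
    ]
    return sum(aps) / n_classes

# Made-up example: 4 images, 2 classes.
labels = [[1, 0], [0, 1], [1, 0], [0, 0]]
scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.1], [0.1, 0.4]]

map_value = mean_average_precision(scores, labels)
print(f"mAP: {map_value:.4f}")
```

In this example class 0 has its positives at ranks 1 and 3, giving AP = (1 + 2/3) / 2, and class 1 has its single positive at rank 1, giving AP = 1. Unlike accuracy, the score depends only on how well positives are ranked above negatives, so the large number of easy negative cells no longer inflates it.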
