I was working through Chapter 6 and the PASCAL 2007 multi-label classification problem this afternoon. I noticed some quirks with accuracy_multi and am struggling to implement a fix correctly.
So, the ResNet that’s trained in the book gets to a great-sounding 95% accuracy within 3 epochs. I put on my sceptical hat and tried a naive baseline model that just outputs 0 for every label, which gets a similarly impressive 92% accuracy (see my Colab notebook). Since each image contains only a few of the 20 classes, almost every label is negative, so always predicting "absent" is right most of the time. This suggests that accuracy might not be the most appropriate metric for this problem.
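To make the baseline point concrete, here is a minimal pure-Python sketch. The labels are made up by me (5 images, 20 classes), not the real PASCAL 2007 data, and accuracy_multi_like is my own hypothetical stand-in for what I understand accuracy_multi to do on probabilities (threshold, then compare every image/class cell), not the library function itself.

```python
# Toy, made-up labels: 5 images x 20 classes, 1-2 objects per image.
# These are NOT the real PASCAL 2007 labels.
n_classes = 20
present = [[14], [6, 14], [11], [8, 14], [2]]  # class indices present in each image
targets = [[1 if c in p else 0 for c in range(n_classes)] for p in present]


def accuracy_multi_like(preds, targs, thresh=0.5):
    """Fraction of (image, class) cells where the thresholded prediction
    matches the label. Hypothetical stand-in, assumes preds are probabilities."""
    hits = total = 0
    for p_row, t_row in zip(preds, targs):
        for p, t in zip(p_row, t_row):
            hits += int((p > thresh) == bool(t))
            total += 1
    return hits / total


# Naive baseline: predict "absent" for every class of every image.
all_zero = [[0.0] * n_classes for _ in targets]
baseline_acc = accuracy_multi_like(all_zero, targets)
print(baseline_acc)  # 0.93, since only 7 of the 100 cells are positive
```

The baseline never finds a single object, yet it scores 93% here simply because 93 of the 100 cells are negatives.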
I then checked the original instruction set on the dataset’s website (see section 3.4), as well as the benchmarks on papers-with-code (link). Both of these recommend using mean average precision (mAP) to evaluate model performance instead. However, when I plug in APScoreMulti instead, I get an exception:
RuntimeError: Boolean value of Tensor with more than one value is ambiguous.
I’ve had a quick skim over the docs (including the section at scikit-learn), but couldn’t figure out why this is unhappy.
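For reference, this is a minimal pure-Python sketch of what I understand mean average precision to compute: per-class average precision, averaged over classes. It uses the non-interpolated, step-wise definition (the sum of recall increments weighted by precision), which as far as I can tell is the one scikit-learn documents; I believe the VOC 2007 devkit uses an 11-point interpolated variant, so the numbers would not match the benchmark exactly. The function names and the toy data are my own, and returning 0.0 for a class with no positives is an arbitrary choice on my part.

```python
def average_precision(scores, labels):
    """Step-wise AP for one class: sum over thresholds of
    (recall_n - recall_{n-1}) * precision_n. Tied scores share a threshold."""
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0  # assumption: score a class with no positives as 0
    ap, prev_recall = 0.0, 0.0
    for thresh in sorted(set(scores), reverse=True):
        predicted = [s >= thresh for s in scores]
        tp = sum(1 for p, l in zip(predicted, labels) if p and l)
        precision = tp / sum(predicted)
        recall = tp / n_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(score_rows, label_rows):
    """Macro mAP: compute AP per class (column), then average over classes."""
    score_cols = list(zip(*score_rows))
    label_cols = list(zip(*label_rows))
    aps = [average_precision(s, l) for s, l in zip(score_cols, label_cols)]
    return sum(aps) / len(aps)


# Made-up toy data: 4 images x 2 classes.
labels = [[1, 0], [0, 1], [1, 0], [0, 0]]
good = [[0.9, 0.2], [0.1, 0.8], [0.7, 0.3], [0.2, 0.1]]  # ranks positives first
zeros = [[0.0, 0.0] for _ in labels]                      # constant baseline

map_good = mean_average_precision(good, labels)
map_zero = mean_average_precision(zeros, labels)
print(map_good, map_zero)
```

On this toy data the constant baseline only gets the average class prevalence as its score, while a model that ranks the positives first gets 1.0, which is the kind of separation that accuracy fails to show.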
So I guess two issues here:
- Does anybody have advice on how to get APScoreMulti to work on this problem?
- Should the chapter be changed to use mean average precision instead? I realise that the current choice might be intentional, to avoid confusion over precision/recall at that stage of the course, but if not, I’m happy to raise an issue on GitHub.
Thanks a lot for your time!