[Lesson 7] Accuracy improved with bounding boxes?

In the lesson 7 notebook, @jeremy states:

Excitingly, it turned out that the classification model is much improved by giving it this additional task. Let’s see how well the bounding box model did by taking a look at its output.

But I don’t see any improvement in validation accuracy: it reached 0.9820 here, whereas it was previously 0.9900 with the multi-input model. So I’m a bit confused. Are you comparing the accuracy after only the 3 epochs run each time you create a new model? How exactly do you see that the model’s accuracy has improved? Which metrics are you referring to?

