Problem with fine-tuning an object detector

Recently, to solve a counting problem, I decided to take the count-by-detection approach. Because I was interested in counting people in a dense area, I fine-tuned an object detector to detect just the heads. In short, besides the background class, there is only 1 class, which is ‘person’.

Summary: Fine-tuned an object detector on only 1 class
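
For readers unfamiliar with count-by-detection: the count is simply the number of detections whose confidence clears a threshold. Below is a minimal sketch of that idea, not my actual pipeline. The function name `count_by_detection`, the 0.5 threshold, and the example scores are all illustrative assumptions.

```python
from typing import Sequence


def count_by_detection(scores: Sequence[float], threshold: float = 0.5) -> int:
    """Count detections whose confidence is at least `threshold`.

    `scores` holds one confidence value per detected box in a single image.
    The 0.5 default is an arbitrary illustrative choice, not a tuned value.
    """
    return sum(1 for s in scores if s >= threshold)


# Made-up confidence scores for one image (illustration only).
example_scores = [0.97, 0.88, 0.42, 0.10]
head_count = count_by_detection(example_scores)
```

With this approach, any confident false positive adds directly to the count, which is why the issue described next matters.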

I evaluated the model after 180k+ training steps and found the performance pretty satisfying (0.79 mAP on the test set). However, I then noticed that it fails on some images. The main example is the image of 2 dogs, the standard one that comes with the off-the-shelf notebook for the TensorFlow Object Detection API: my model detects the dogs’ heads as human heads and classifies them as ‘person’ with really high confidence.

Summary: The resulting model detected the wrong object with high confidence

Could this be a downside of having only 1 class?

How would I solve this? By switching to multiclass, by adding more negative samples, or does my model simply need more training steps?
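
One way to compare these options would be to measure how many confident detections the model produces on images that contain no people at all. Here is a rough sketch of that check, assuming the detection scores have already been collected per image. The helper name `false_positives_per_image`, the file names, the scores, and the 0.5 threshold are all hypothetical, not real results from my model.

```python
from typing import Dict, List


def false_positives_per_image(
    scores_by_image: Dict[str, List[float]], threshold: float = 0.5
) -> Dict[str, int]:
    """For images known to contain no people, count confident detections.

    Every detection at or above `threshold` on such an image is a false
    positive, since there is nothing of the target class to find.
    """
    return {
        name: sum(1 for s in scores if s >= threshold)
        for name, scores in scores_by_image.items()
    }


# Made-up scores on hypothetical person-free images (illustration only).
negative_scores = {
    "dogs.jpg": [0.95, 0.91, 0.20],
    "empty_street.jpg": [0.05],
}
fp_counts = false_positives_per_image(negative_scores)
```

Running the same check before and after a change (more negative samples, extra classes, or more steps) would show whether the change actually reduces these false positives.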

Thank you!