Could a convolutional neural network, in theory, learn to count between 1 and, say, 4 dogs in an image? My understanding is that CNNs are good at recognizing the features of a dog but not at counting how many there are.

I am interested in this same question, and I am planning to run some experiments soon to see if NNs can learn to count objects (I will use synthetic images, though, not real ones). I also found this interesting article on the topic: https://arxiv.org/abs/1807.09856

If you just want a quick "prototype", I think you can use the code from the object localization with bounding boxes lecture with an appropriate dataset, and add a step that counts the number of bounding boxes labeled "dog" in the output after inference. There is a predefined maximum number of bounding boxes, but that can be adjusted if needed.
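To make the counting step concrete, here is a minimal sketch of what it could look like. It is not the lecture's code: the detection format (a list of dicts with `label` and `score` keys), the `count_label` helper, and the 0.5 score threshold are all assumptions for illustration, so you would need to adapt them to whatever your detector actually returns.

```python
from typing import Dict, List, Union

# Hypothetical detection format: one dict per predicted bounding box.
Detection = Dict[str, Union[str, float]]


def count_label(
    detections: List[Detection],
    label: str = "dog",
    min_score: float = 0.5,  # assumed confidence cutoff, tune for your model
) -> int:
    """Count predicted boxes with the given label and a score >= min_score."""
    return sum(
        1
        for d in detections
        if d["label"] == label and d["score"] >= min_score
    )


# Made-up detector output for a single image, for illustration only.
detections = [
    {"label": "dog", "score": 0.92},
    {"label": "dog", "score": 0.81},
    {"label": "cat", "score": 0.77},
    {"label": "dog", "score": 0.30},  # low confidence, ignored by default
]

print(count_label(detections))
```

The score threshold matters here: set it too low and spurious boxes inflate the count, set it too high and real dogs get dropped. If the detector does not already suppress overlapping boxes, duplicates of the same dog would also be counted.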

There are far more professional (and more complicated) solutions and architectures for this, though. Maybe check out some of these:

Thanks, yes, I see there are methods for doing this. What I was really trying to understand is whether you can do this with a regular CNN architecture, and to get the intuition for why you can or why you can't.