This question is a very natural one to come up after Lesson 1. It and its variations have been brought up many, many times on these forums, with many proposed solutions. I think the issues involved run deep into the limitations, expectations, and even the philosophy of machine learning. If anyone is able to gather and unpack all the angles, it would make a good article, and serve as a definitive response to questions like this one.
Here’s something to ponder. How do you know “for sure” that an image is neither a cat nor a dog? Is it only because you have already seen trillions of scenes with things that you know for sure are not cats or dogs? Maybe your brain has even formed the concept of “animal”. A neural net has only ever seen blobs of pixels sorted into two category labels, without context, without objects, without meaning.
Here’s an experiment. Edit a dog image so that it has three heads in one area and four tails in another. Does your trained model tell you it’s certainly, definitely a dog, the doggiest dog it has ever seen?
As for investigating some of the suggestions above: you can inspect the raw activations (the logits) before they are normalized by softmax and passed to the cross-entropy loss. Or you can treat dogs vs. cats as a multi-label task, train with a sigmoid activation on each class, and apply a probability threshold. These approaches will be explained in later lessons. But I doubt any of them will give you the certainty you are hoping for.
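To make the difference concrete, here is a minimal sketch in plain Python (no framework needed). The logit values and the 0.8 threshold are made-up numbers for illustration, not anything from the lessons. It shows why softmax always "picks a side" while per-class sigmoids can both be low at once, which is the closest a model like this gets to saying "neither":

```python
import math

def softmax(logits):
    """Normalize logits across classes so the outputs sum to 1."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Squash one logit into (0, 1), independently of the other classes."""
    return 1 / (1 + math.exp(-z))

# Hypothetical raw activations for one image, in the order [cat, dog].
# Weak evidence for both classes (imagine a photo of a teapot).
logits = [0.3, 0.5]

softmax_probs = softmax(logits)               # forced to sum to 1, so one
                                              # class "wins" regardless
sigmoid_probs = [sigmoid(z) for z in logits]  # each class judged on its own

threshold = 0.8  # arbitrary cutoff for "confident enough"
labels = [name for name, p in zip(["cat", "dog"], sigmoid_probs)
          if p > threshold]

print(softmax_probs)  # roughly [0.45, 0.55]: looks like a near coin flip
print(sigmoid_probs)  # both around 0.6: neither is confident
print(labels)         # []  (an empty list is our stand-in for "neither")
```

Note that softmax reports a seemingly meaningful 45/55 split even though the evidence for both classes was weak; the sigmoid view at least lets both probabilities fall below the threshold together. That still isn't "knowing for sure", but it is a more honest readout.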
In any case, I don’t wish to discourage you from asking great questions. I think it’s the innocent, deep questions that ultimately move the whole field forward.