In the lessons where we’re trying to identify Cats and Dogs - why don’t we have a third category, “neither” or “don’t know”?
As it stands, we’ll choose Cats or Dogs even when our confidence in the prediction is low.
Looking at it from a human perspective - there will always be grey areas where we ourselves are sometimes unsure whether the item being inferred is a Cat or a Dog, and sometimes we’ll know for sure it’s neither.
Hello @zulu_m , the confidence is precisely what you use for “neither”. This logic comes after the model prediction has been made and is also where domain expertise comes into play. You are in full control of how you process the results :).
It would be interesting to see how good the results would be when using three categories:
Cats, dogs, and “don’t know”, with a loss function that assigns a lower loss to “don’t know” than to a low-confidence dog/cat prediction. I suspect more people would already do it this way if it worked better, though.
I haven’t really tried this one myself, but I found this notebook by @muellerzr that might point you in the right direction. The simplest thing you could do is tweak the prediction threshold (say, if the model isn’t confident in either of the given categories at more than 50% probability, we’d output None). @Hadus, the problem with a “don’t know” category is that you can’t really define it - it would have to comprise the whole rest of the world.
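The thresholding idea above can be sketched in a few lines. This is a hypothetical helper, not code from the notebook; the function name and the 0.8 threshold are made up for illustration. One subtlety worth noting: with only two softmax classes the top probability is always at least 0.5, so a useful reject threshold has to sit above 0.5.

```python
import numpy as np

# Hypothetical helper: turn softmax probabilities into a prediction that can
# abstain. With two softmax classes the top probability is always >= 0.5,
# so the reject threshold must be set above 0.5 to ever fire.
def predict_with_reject(probs, classes=("cat", "dog"), threshold=0.8):
    probs = np.asarray(probs)
    idx = int(np.argmax(probs))
    if probs[idx] < threshold:
        return "don't know"
    return classes[idx]

print(predict_with_reject([0.95, 0.05]))  # confident -> "cat"
print(predict_with_reject([0.60, 0.40]))  # below threshold -> "don't know"
```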
This question is a very natural one to come up with after Lesson 1. It and its variations have been brought up many, many times on these forums, with many proposed solutions. I think the issues involved run deep into the limitations, expectations, and even the philosophy of machine learning. If anyone is able to gather and unpack all the angles, it would make a good article, and serve as a definitive response to these questions.
Here’s something to ponder. How do you know “for sure” that an image is neither a cat nor a dog? Is it only because you have already seen trillions of scenes with things that you know for sure are not cats or dogs? Maybe your brain has even formed the concept of “animal”. A neural net has only ever seen blobs of pixels sorted into two category labels, without context, without objects, without meaning.
Here’s an experiment. Edit a dog image so it is separated into three heads in one area and four tails in another. Does your trained model tell you it’s certainly, definitely a dog, the doggiest dog it has ever seen?
As for investigating some of the suggestions above: you can analyze the activations before they are normalized by softmax and sent to the cross-entropy loss. Or you can treat dogs vs. cats as a multi-label task, train with a sigmoid activation, and apply a probability threshold. These approaches will be explained in later lessons. But I doubt any of them will give you what you are hoping for.
In any case, I don’t wish to discourage you from asking great questions. I think it’s the innocent, deep questions that ultimately move the whole field forward.
You might want to take a look at Bayesian networks (I haven’t yet myself, it’s on my to-do list), which aim to help estimate the confidence of the answers.
To get a baseline I would try the simple approach proposed by Jeremy in Lesson 9: use multi-category classification with a sigmoid loss function (MultiCategoryBlock with BCEWithLogitsLossFlat) instead of softmax (CategoryBlock with CrossEntropyLossFlat).
This is discussed in “Handle data that belongs to classes not seen in training or testing”.
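The gist of the sigmoid approach can be sketched in plain PyTorch (rather than the fastai classes named above, which wrap the same mechanics). The logits here are made-up numbers for a single imaginary image; the point is only that with independent per-class sigmoids, both probabilities can be low at once, so “neither” falls out naturally.

```python
import torch

# Sketch: a multi-label head gives each class an independent sigmoid
# probability, so an image can score low on every class and come out
# as "neither". These logits are invented for illustration.
logits = torch.tensor([[-2.0, -1.5]])   # made-up model output for one image
probs = torch.sigmoid(logits)           # independent per-class probabilities
threshold = 0.5
labels = [c for c, p in zip(["cat", "dog"], probs[0]) if p.item() > threshold]
prediction = labels or ["neither"]
print(prediction)                       # both probabilities are below 0.5 here
```

During training, such a head would be paired with `torch.nn.BCEWithLogitsLoss`, which is what fastai’s `BCEWithLogitsLossFlat` wraps.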
You should look at MC Dropout: MC Dropout and practical ideas for it. It is super easy to implement: you just leave dropout enabled at inference time and make several predictions.
This gives you a distribution of predictions, so you can see how confident the neural network is. You can then decide on a confidence threshold below which you output “Don’t know”.
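A minimal sketch of the MC Dropout mechanics in PyTorch, using a tiny untrained toy model (so the numbers are meaningless; only the procedure matters): keep the dropout layers in training mode at inference time and average several stochastic forward passes.

```python
import torch
import torch.nn as nn

# Untrained toy model, just to demonstrate the MC Dropout procedure.
model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(), nn.Dropout(0.5), nn.Linear(32, 2)
)
model.eval()
for m in model.modules():
    if isinstance(m, nn.Dropout):
        m.train()                     # re-enable dropout only, at inference

x = torch.randn(1, 10)
with torch.no_grad():
    # Several stochastic forward passes -> a distribution of predictions.
    runs = torch.stack([torch.softmax(model(x), dim=1) for _ in range(30)])

mean_probs = runs.mean(dim=0)         # the prediction
std_probs = runs.std(dim=0)           # spread ~ uncertainty; large -> "don't know"
```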
I would also recommend this video from Vincent Warmerdam, How to Constrain Artificial Stupidity, which covers this type of thing.