All the datasets and competitions I have seen so far have definite categories: cats vs. dogs, 120 dog breeds, etc. But for real-life applications there is a good chance that in production you have to classify images where none of your categories applies. For instance, in @jeremy 's dogs vs. cats example, a picture of a fence was classified as a dog.
This is an unsatisfying situation. So my question is: how do you handle these non-matching situations? I can think of two ways:
1.) Based on the model's probability, return "not sure" if the top probability is below some limit (e.g. 0.9).
2.) Train a specific category that is orthogonal to your real categories.
In the latter case, what would such an "anything else" category look like?
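For what it's worth, option 1 can be sketched in a few lines. This is a minimal example assuming you already have a probability vector from your model's softmax output; the 0.9 threshold is just the example value from above and would need tuning on a validation set that contains out-of-category images:

```python
import numpy as np

def predict_with_reject(probs, threshold=0.9):
    """Return the predicted class index, or None ("not sure")
    when the top probability falls below the threshold."""
    top = int(np.argmax(probs))
    return top if probs[top] >= threshold else None

# A confident prediction vs. an ambiguous one
print(predict_with_reject(np.array([0.05, 0.93, 0.02])))  # 1
print(predict_with_reject(np.array([0.40, 0.35, 0.25])))  # None
```

One caveat: networks are often confidently wrong on out-of-distribution inputs (the fence/dog example above may well have had a high probability), so thresholding alone is not a complete solution.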
Check out this post: https://medium.com/@timanglade/how-hbos-silicon-valley-built-not-hotdog-with-mobile-tensorflow-keras-react-native-ef03260747f3
“The final composition of our dataset was 150k images, of which only 3k were hotdogs: there are only so many hotdogs you can look at, but there are many not hotdogs to look at. The 49:1 imbalance was dealt with by setting a Keras class weight of 49:1 in favor of hotdogs. Of the remaining 147k images, most were of food, with just 3k photos of non-food items, to help the network generalize a bit more and not get tricked into seeing a hotdog if presented with an image of a human in a red outfit.”
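To make the 49:1 weighting concrete, here is a small sketch of how you could derive such a class-weight dict from the counts in the quote (the class indices 0 = "not hotdog" and 1 = "hotdog" are my assumption, not from the article):

```python
# Image counts per class, taken from the quoted post:
# 147k "not hotdog" vs. 3k "hotdog".
counts = {0: 147_000, 1: 3_000}

# Weight each class inversely to its frequency so the rare class
# contributes as much to the loss as the common one.
max_count = max(counts.values())
class_weight = {cls: max_count / n for cls, n in counts.items()}
print(class_weight)  # {0: 1.0, 1: 49.0}
```

In Keras this dict would then be passed to training via `model.fit(..., class_weight=class_weight)`.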
Thanks a lot. This is a really cool and helpful write-up from a practical perspective.
In an extreme case, you could create an "Other" category that comprises all 1000 categories of Imagenet (or 999 if what you are trying to classify is already a class in Imagenet).
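The relabeling itself is trivial. A hypothetical sketch, assuming you keep your real categories in a set and fold every other label (Imagenet or otherwise) into "other" when building the training labels:

```python
# The real categories you care about (assumption for illustration).
targets = {"cat", "dog"}

def relabel(label):
    """Collapse any label outside the target set into 'other'."""
    return label if label in targets else "other"

print(relabel("dog"))       # dog
print(relabel("goldfish"))  # other  (an Imagenet class folded into "other")
```

The practical concern is balance: 998+ Imagenet classes dumped into one bucket will dwarf your real categories, so you would combine this with class weighting as in the hotdog example above.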