Let’s say I have trained an image recogniser to classify the content of an image as either a dog, a cat, a rat or an elephant. I put my model into production and it is for some reason fed with images of humans. The probabilities from the model suggest that there is a 70% probability that an image of a human is an image of an elephant. I would much rather the model said that the input is not like any of the classes it was trained on.
What is the go-to way of handling this problem? I don’t have the resources to train my model on data from all the possible classes that could show up, and I also don’t care about detecting anything other than what I explicitly trained my model on.
I’m pretty sure Jeremy has talked about this in one of the lectures, but I can’t seem to find it.
You need to have an extra class with a random assortment of what you ‘won’t’ have. This is because argmax can only pick among the classes the model was trained on. The outputs are normalised to sum to one, so in reality the image could be a chicken, but the model will still say something like 51% cat, 49% dog (just an example).
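To make the "sums to one" point concrete, here is a minimal sketch in plain Python. The logits are made-up numbers standing in for what a two-class cat/dog model might output for a chicken image; they are not taken from any real model.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that always sum to one."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for (cat, dog) when the input is really a chicken.
classes = ["cat", "dog"]
probs = softmax([0.52, 0.48])

# argmax can only ever return one of the trained classes.
prediction = classes[probs.index(max(probs))]
print(dict(zip(classes, probs)), "->", prediction)
```

Whatever the input is, the probabilities are split between the known classes, so there is no way for the output to say "neither".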
Yes, but that would require me to find a very diverse set of samples that does not belong to any of the classes of interest. Which would require a lot of resources. Let’s say for example that I put images of all kinds of animals (except the ones that I want to detect) in my ‘Other’-class, and then the model is fed with an image of a car. It is not obvious to me that the car image would be classified as ‘Other’.
You’re right. Here’s a different idea. Following the planets example, set it up as a multi-label problem, where every image gets a single label, but the model can also come back with nothing at all (not a cat or dog). This should help with that (I have not done this before, but it seems OK).
I don’t have time today to flesh out an example but that should help. Just give them single labels instead of multi but pass them in as multi category.
And then you may need random blank images (this is where I’m unsure and still need to find out).
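A rough sketch of why the multi-label setup can return "nothing": each class gets its own independent sigmoid score instead of sharing a softmax, so all scores can be low at once. The logits and the 0.5 threshold below are illustrative assumptions, and `predict_labels` is a hypothetical helper, not a fastai function.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, classes, threshold=0.5):
    """Return every class whose independent sigmoid score exceeds the threshold.

    An empty list means none of the known classes was detected.
    """
    scores = [sigmoid(x) for x in logits]
    return [c for c, s in zip(classes, scores) if s > threshold]

classes = ["dog", "cat", "rat", "elephant"]

# Made-up logits: one confident class, then an input the model recognises as nothing.
confident = predict_labels([3.0, -2.5, -4.0, -1.0], classes)
nothing = predict_labels([-2.0, -1.5, -3.0, -0.5], classes)
print(confident, nothing)
```

Whether a trained model actually produces low scores everywhere for unseen kinds of images is exactly the open question in this thread; the sketch only shows that the output format allows it.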
The other way I think this is done is by applying Bayesian methods (at least in research; I think the practical side is still fairly new). This allows you to quantify the certainty of the network’s predictions, beyond just taking the outputs as measures of certainty (which, as this issue shows, doesn’t always work: the outputs are only trustworthy as certainties if the inputs you trained on are a true representation of the space of all inputs you want the network to operate over).
This can be done either with wholly Bayesian NNs, or by grafting some Bayesian stuff onto a network (I think that “evidence fusion” is a common term for this sort of thing, though technically this is about integrating multiple inaccurate sources of information I think it would work with just one).
For the fully Bayesian stuff, GPyTorch is a PyTorch-based library for Bayesian stuff (which Facebook is linked with); Gaussian Processes are one of the ways of making Bayesian techniques practically implementable. Pyro is, I believe, another popular option here (and also uses PyTorch as a backend).
Not at all experienced here (and minimal knowledge of the basic Bayesian principles) so can’t give any advice.
But that gets into rather more advanced techniques while muellerzr’s method is nice and simple and should hopefully give reasonable results.
That is an interesting question. You can only presume it will help but interesting to know how well it does without much work there.
Hi Colin thanks for posing a wonderful question.
I have been toying with this thought for a while. When I have demonstrated classifiers to various parties, everyone has been impressed; however, when they enter a random image, someone like my mum will say that’s not intelligence (she hasn’t done fast.ai yet).
Any progress you make will help improve everyone’s work, and it would probably make a good feature for fast.ai.
I woke up this morning and had an idea. In lesson 3 we do segmentation using the Camvid dataset/model, which I believe recognizes 1000 different classes of object.
I was just thinking, if you passed your image through this model (or something like it) first, then based on the result you could decide to pass it to your other model or make some other decision.
You may have to check what classes are in the Camvid dataset/model to fine tune it. This would possibly eliminate some classes of picture that your model wouldn’t have to deal with.
You could possibly use the results of both models to tell the user that it looks like an elephant and a human.
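As a sketch of that two-stage idea: run a broad model first and only consult the specialist model when the broad label is one it is meant to handle. Everything below is hypothetical, including the function names and the toy stand-in models, which just return fixed labels so the flow can be run.

```python
def classify_with_gate(image, general_model, specialist_model, allowed_labels):
    """Run a broad model first; call the specialist only if the broad label is allowed."""
    general_label = general_model(image)
    if general_label not in allowed_labels:
        # Out of scope for the specialist: report only what the broad model saw.
        return {"general": general_label, "specialist": None}
    return {"general": general_label, "specialist": specialist_model(image)}

# Toy stand-ins for real models (assumptions for illustration only).
def toy_general_model(image):
    return image["kind"]

def toy_specialist_model(image):
    return "elephant"

allowed = {"animal"}
human_result = classify_with_gate({"kind": "person"}, toy_general_model, toy_specialist_model, allowed)
animal_result = classify_with_gate({"kind": "animal"}, toy_general_model, toy_specialist_model, allowed)
print(human_result, animal_result)
```

Returning both results lets you report something like "this looks like a person" instead of forcing an elephant prediction.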
But my problem is that I’m dealing with data that I don’t think there are any “standard” models for (my animals example was only a metaphor) that I could use as a pre-trigger, and I think it’s the same for most industry applications.
Hi ColinN like I said in a previous post, I will need to find a way to resolve this problem in the future.
Some possible alternative solutions could be:
Think of it as a validation problem, how to stop incorrect images being entered?
Think of it as a continuous learning and dataset creation problem: every time someone uses the model and it makes a wrong classification, add the image to the “wrong class” class and retrain the model. This can be done in batches over time, depending on size etc.
Over time the model should get better.
If the industry has a particular set of incorrect images collect them and make your own data set for the “wrong class” class.
Interesting problem, @ColinN. No one mentioned using a threshold. Depending on your output weight normalization method, you could (naively) apply a suitable threshold like 70% or so to screen out ‘indeterminate’ results.
I’d like to know if you try this and it actually works.
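A minimal sketch of the naive threshold, assuming softmax outputs and the 70% cut-off mentioned above. The logits are invented, and `predict_or_reject` is a hypothetical helper rather than part of any library.

```python
import math

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def predict_or_reject(logits, classes, threshold=0.7):
    """Return the top class, or 'unknown' if its probability is below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unknown", probs[best]
    return classes[best], probs[best]

classes = ["dog", "cat", "rat", "elephant"]

# Made-up logits: nearly flat scores are rejected, one dominant score is accepted.
flat_label, flat_prob = predict_or_reject([0.2, 0.1, 0.0, 0.3], classes)
sure_label, sure_prob = predict_or_reject([5.0, 0.0, 0.0, 0.0], classes)
print(flat_label, sure_label)
```

Note that this only screens out inputs the model is unsure about. An out-of-class image that gets a confident score, like the 70% elephant in the original question, would still pass at or above the cut-off.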
As Jeremy mentioned in lesson 9 part 2 v3, the problem lies in the way you build your neural network and in the use of the softmax function. The interesting section of the video is at 32:00.
In essence, the network is built to pick one of the original classes. If you try to add a class for everything other than a cat, dog or elephant, the network will have extreme difficulty determining the features that represent everything but a cat, dog or elephant. Additionally, the softmax function isn’t designed to help with this kind of external class, as Jeremy explains in the video.
Edit: you should also look at lesson 10 v2 2019 at 45:00, which is the section where Jeremy explains the problem in more detail.
You can use an instance-based method for this. Each image is put through the network to get a vector representation of the image. When you want to classify a new image, use something like kNN to identify which cluster of images the new image resembles the most. Measure the distances for some images you don’t want classified and use that distance as your do-not-classify threshold.
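Here is a small sketch of that instance-based idea in plain Python. The 2-D "embeddings", the choice of Euclidean distance, the mean-of-k-neighbours rule and the `max_distance` value are all assumptions for illustration; in practice the vectors would come from a layer of your network and the threshold from images you know should be rejected.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(query, references, k=3, max_distance=1.0):
    """Classify by majority vote of the k nearest reference embeddings.

    references: list of (embedding, label) pairs.
    Returns 'unknown' if the neighbours are, on average, farther than max_distance.
    """
    neighbours = sorted((euclidean(query, emb), label) for emb, label in references)[:k]
    mean_distance = sum(d for d, _ in neighbours) / len(neighbours)
    if mean_distance > max_distance:
        return "unknown"
    votes = {}
    for _, label in neighbours:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Toy embeddings: cats cluster near (0, 0), dogs near (5, 5).
references = [
    ([0.0, 0.0], "cat"), ([0.2, 0.1], "cat"), ([0.1, 0.3], "cat"),
    ([5.0, 5.0], "dog"), ([5.2, 4.9], "dog"), ([4.8, 5.1], "dog"),
]
near_cat = knn_predict([0.1, 0.1], references)
far_away = knn_predict([20.0, 20.0], references)
print(near_cat, far_away)
```

The second query sits far from every stored example, so it is rejected rather than assigned to the nearest cluster.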
The idea is to send 2+ pictures simultaneously to the network and run a logistic regression to find out whether the pictures show the same item or different items. However, at inference time, it requires running the new picture against a set of known pictures and setting a threshold under which the new picture’s item isn’t recognized. The network is much more difficult to put in place and to train.
Several techniques are available following this whale Kaggle competition. These techniques for one-shot learning are changing really fast, so they may become obsolete really quickly.
I was a bit intimidated to ask how to actually implement it in Lesson 2 code.
So if you know where I actually need to make the changes so that the prediction has a third value, “I don’t know”, please tell me. I am hoping I will know by the end of Lesson 3.
If you have other ideas or some sample code on how to get it to work, that would be awesome.