Ah, thanks a lot! I was sure he talked about this in part 1, so I didn’t even bother looking for it in part 2.
You can use an instance-based method for this. Each image is put through the network to get a vector representation of the image. When you want to classify a new image, use something like k-NN to identify which cluster of images the new image resembles the most. Compute the distances for some images you don’t want classified and use those as your “do not classify” threshold.
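A minimal sketch of that idea, assuming a hypothetical `embed()` helper that returns the penultimate-layer activations of your trained network, plus precomputed `train_embs` and `train_labels` arrays for the training set:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# train_embs: (n, d) array of embeddings for the training images (assumed
# precomputed with embed()); train_labels: their class labels.
nn_index = NearestNeighbors(n_neighbors=5).fit(train_embs)

def classify_or_reject(img, threshold):
    dists, idxs = nn_index.kneighbors(embed(img).reshape(1, -1))
    if dists.mean() > threshold:       # farther than every known cluster of images
        return "I don't know"
    return train_labels[idxs[0][0]]    # label of the nearest training image
```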
Hi cactechuser, eagle4, maral, hope you’re all well!
Every answer helps us create a better solution.
cheers mrfabulous1
This sounds like a zero-shot problem, doesn’t it?
Yes, zero-shot learning can be used in this case.
The idea is to send two or more pictures through the network simultaneously and run a logistic regression to find out if the pictures are of the same or different items. However, at inference time, this requires running the new picture against a set of known pictures and setting a threshold below which the new picture’s item isn’t recognized. The network is much more difficult to put in place and to train.
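A rough PyTorch sketch of that pairwise setup (the class name and layer choices here are my own illustration, not a tested recipe):

```python
import torch
import torch.nn as nn

class PairVerifier(nn.Module):
    """Scores whether two images show the same item."""
    def __init__(self, encoder, n_features):
        super().__init__()
        self.encoder = encoder                # any pretrained body producing a feature vector
        self.head = nn.Linear(n_features, 1)  # logistic regression on the feature difference

    def forward(self, x1, x2):
        f1, f2 = self.encoder(x1), self.encoder(x2)
        return torch.sigmoid(self.head(torch.abs(f1 - f2)))

# At inference: score the new image against reference images of each known
# item and reject it as unrecognised if no score clears a chosen threshold.
```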
Several techniques along these lines came out of the whale Kaggle competition. These techniques for one-shot learning are changing really fast, so this may become obsolete quickly.
Would the solution work?
mrfabulous1
I was a bit intimidated to ask how to actually implement it in Lesson 2 code.
So if you know, where do I actually make the changes so that the prediction has a third value, “I don’t know”? I am hoping I will know by the end of Lesson 3.
If you have other ideas or sample code on how to get it to work, that would be awesome.
You’d want to use the Lesson 3 Planets code for the sigmoid, as with Planets we include the option that no labels are present (blank). It’s just single-labeled instead of multi-labeled.
I would be careful with the approach of trying to learn a label that represents “this image has none of the other classes”, as that is probably a very hard thing to learn, especially with a large number of classes.
I read through the suggestions in a few threads, and agree that multi-label classification is the best solution to this problem. Here is how you do it, in code:
Add a `label_delim` parameter, even if there is only one label:
```python
data = (ImageList.from_csv(path, "cleaned.csv")
        .split_by_rand_pct(valid_pct=0.2, seed=77)
        .label_from_df(cols=1, label_delim=' ')
        .transform(get_transforms(), size=224)
        .databunch(bs=64)
        .normalize(imagenet_stats))
data
```
Change the accuracy metric to include a threshold:
```python
learn = cnn_learner(data, models.resnet34, metrics=[partial(accuracy_thresh, thresh=0.95)])
```
For inference, supply a threshold parameter. If the probability of every class is below the threshold, the prediction will be empty:
```python
prediction, _, probabilities = learn.predict(test_img, thresh=0.95)
```
HTH
mrfabulous1
There’s been some theoretical work in this area under the name of open set recognition and open world recognition.
Open-set recognition (introduced formally here) is a topic in computer vision whose goal is to train a model that recognizes a set of known classes with high accuracy and is also able to recognize when an unknown class is shown to it.
Open-world recognition (introduced formally here) is an extension of open-set recognition; it aims to recognize unknown classes, as well as to help label them and then retrain the original model to recognize the new classes.
A recent survey on open-set recognition lists common approaches and discusses future directions for the topic.
One basic way to do it is to train a one-class SVM for each individual class; the ensemble of SVMs then predicts whether an input belongs to a known or an unknown class.
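A sketch of that ensemble, assuming `feats` is an (n_images, d) array of embeddings from a trained network and `labels` holds the corresponding class ids (both hypothetical here):

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Fit one one-class SVM per known class on that class's embeddings.
svms = {c: OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(feats[labels == c])
        for c in np.unique(labels)}

def predict_open_set(x):
    # Each SVM votes on whether x lies inside its class's region; a positive
    # decision_function score means "inside". If none accept x, it is unknown.
    scores = {c: svm.decision_function(x.reshape(1, -1))[0] for c, svm in svms.items()}
    best_class, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best_class if best_score > 0 else "unknown"
```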
Hi gessha hope you had a wonderful day!
Thank you for a wonderfully informative post! The first link is a really clear explanation, and the other papers clarify the theme of this thread.
Many thanks mrfabulous1
@mrfabulous1 @gessha have either of you tried any of the methods in Recent Advances in Open Set Recognition: A Survey or Bayesian deep learning with fastai to solve this problem?
I have been playing with the pets data set using multi-category classification with a sigmoid loss function (MultiCategoryBlock with BCEWithLogitsLossFlat) instead of softmax (CategoryBlock with CrossEntropyLossFlat), as suggested by Jeremy in Lesson 9 (referred to as DOC in the paper).
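For reference, a rough fastai v2 sketch of that setup (the labelling regex and paths follow the standard pets example and are placeholders). Wrapping each label in a one-element list makes the target multi-category, so the model gets one sigmoid per class via BCEWithLogitsLossFlat instead of a softmax:

```python
import re
from fastai.vision.all import *

def label_func(p): return [re.findall(r'^(.+)_\d+', p.name)[0]]  # one-element list -> multi-category

pets = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(seed=42),
                 get_y=label_func,
                 item_tfms=Resize(224))
dls = pets.dataloaders(path/'images')  # path: e.g. untar_data(URLs.PETS)
learn = cnn_learner(dls, resnet34, loss_func=BCEWithLogitsLossFlat(),
                    metrics=partial(accuracy_multi, thresh=0.5))
```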
I removed n classes (breeds) from the training data and then used them as test data to see how many would be classified as one of the training classes. If softmax is used (without a threshold), all the test images are incorrectly classified as one of the training classes. When a sigmoid loss function is used, the results are good unless the test image (from an unseen class) looks similar to (shares features with) one of the training classes.
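A toy illustration of that difference (the numbers are made up): softmax forces the class scores to sum to 1, so even weak evidence produces a confident winner, whereas independent sigmoids can all stay low:

```python
import torch

logits = torch.tensor([-2.0, -1.5, -1.8])     # weak evidence for every class
softmax_probs = torch.softmax(logits, dim=0)  # ~[0.26, 0.43, 0.32]: argmax still picks a class
sigmoid_probs = torch.sigmoid(logits)         # ~[0.12, 0.18, 0.14]: all below threshold
is_unknown = (sigmoid_probs < 0.5).all()      # True -> predict "I don't know"
```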
In this notebook I removed the beagle and pug classes from the training data. Then, when I classified them using the trained model, many of the beagle images were “wrongly” classified as basset hounds, which from a quick inspection seems understandable; I struggle to tell the difference between the two classes.
The reason I am asking about the alternative methods is that, intuitively, I wouldn’t expect a CNN (or even an expert in pets) to be able to distinguish an unseen class that so closely resembles a class the model is trained to detect (or the expert has observed throughout their lifetime), but I may be wrong?
Hi cudawarped hope all is well!
Nice work
I haven’t tried the methods you described yet, as I am currently working on a big AI project. However, your results are good.
> In this notebook I removed the beagle and pug classes from the training data. Then, when I classified them using the trained model, many of the beagle images were “wrongly” classified as basset hounds, which from a quick inspection seems understandable; I struggle to tell the difference between the two classes.
> The reason I am asking about the alternative methods is that, intuitively, I wouldn’t expect a CNN (or even an expert in pets) to be able to distinguish an unseen class that so closely resembles a class the model is trained to detect (or the expert has observed throughout their lifetime), but I may be wrong?
I agree with both your points above. In Lesson 1 Jeremy talks about the difficulty of telling two types of cat apart. I don’t think it’s currently possible to tell two breeds or certain items apart if human experts can’t; I would think there must be some discernible differences to achieve classification. My own experience is the same as yours. This post I did yesterday, Lesson 1 Assignment - confused about results, describes the difficulties when classes with similar images are present.
I will try the techniques you mention when I next build a classifier.
Cheers mrfabulous1
Hi mrfabulous1,
The problem that I see, at least with dog breeds, is that there is sufficient variation in the look of dogs of the same breed that some beagles look more like the “mean” basset hound than a beagle, and conversely some basset hounds look more like the “mean” beagle than a basset hound, though I may be incorrect in this observation. In that case I can’t see that anything can be done.
In the general case, if the above isn’t true (i.e., beagles don’t look more like the “mean” basset hound, but are closer to that “mean” than some basset hounds are), the probability from the Bayesian approach would tell me how far I am away from this “mean” image. However, I face exactly the same problem as I do with the output from the sigmoid. That is, given a basset hound probability of 0.4, I still have to decide how to classify the image (unknown or basset hound). If I classify it as unknown (threshold 0.4), this will most likely reduce the accuracy of my classifier on known classes. This leads to the obvious question of which is “better”: the output from the multi-class sigmoid or the probability from the Bayesian approach. I guess I will have to run some tests.
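In case it’s useful to others, here is a rough sketch of the MC-dropout flavour of the Bayesian approach (the helper name and sample count are my own): dropout is left active at inference, the model is run several times, and the spread of the sigmoid outputs gives an uncertainty to threshold on, as an alternative to thresholding a single sigmoid probability:

```python
import torch

def mc_dropout_predict(model, x, n_samples=25):
    model.train()                 # keep dropout active at inference...
    for m in model.modules():     # ...but keep BatchNorm statistics frozen
        if isinstance(m, (torch.nn.BatchNorm1d, torch.nn.BatchNorm2d)):
            m.eval()
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(model(x)) for _ in range(n_samples)])
    return probs.mean(dim=0), probs.std(dim=0)  # predictive mean and uncertainty
```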
More generally, I would like to find out which is the best solution to the open-set recognition problem when applied to images.
> I don’t think it’s currently possible to tell two breeds or certain items apart if human experts can’t.
In the beginning of lesson 2, Jeremy shows an example someone did of classifying ~110 cities or countries (can’t remember which) by satellite imagery. This is a situation where I can’t imagine a human doing as well as the classifier did (I think it got something like ~85% acc). So I would suggest it is possible even if unlikely.
Hi knesgood hope you’re having a fabulous day!
You’re right about that example; however, even though those maps look similar to us, there are thousands if not millions of discernible differences if you compare the images, so this is trivial for an AI but difficult for a human.
If you take a real Rolex watch and a fake one that fools the Rolex watchmakers themselves and the experts alike, then it is unlikely that an image classifier can tell the difference either.
In the previous posts, basset hounds and beagles can look so similar that experts can’t tell them apart, because there aren’t enough discernible differences in their features.
This is shown clearly when you look at confusion matrices.
Also, in one of Jeremy’s videos he talks about the difficulty of classifying a breed of cat that experts debate about on the internet because the breeds are so similar: one side says it is this breed and the other says it is the other, and the classifier he built was confusing the two breeds as well.
Cheers mrfabulous1
You, sir, make very valid points. You’ve also given me my homework for the weekend - Folex detector
I agree with both of you but there is a difference between a classifier which sees all the classes in training and the open-set recognition problem.
From my understanding, there will, or at least should, be “unseen” features which a classifier will pick up on that a human may not, and vice versa (my interpretation is that this is desirable, and perhaps even the point of ML).
That said, this may not always work: for example, a resnet34 multi-class classifier trained on the pets data set confuses the unseen class of tench with the Abyssinian cat. Looking at the two images below, it is not entirely obvious why; however, it becomes clearer from the images further below that the classifier is using texture and “sees” the tench’s scales as the pattern in the Abyssinian’s fur.
On the other hand when this does work, the “unseen” features should allow a classifier to distinguish between a beagle and a basset hound if both classes appear in the training data.
In the case of the beagle/basset hound conundrum, where the classifier doesn’t see any beagles in training, I assume classification can be helped if there is an “unseen” feature only present in basset hounds; but I guess this will generally depend on the feature and the weight the classifier places on it, not on the function used in the last layer (softmax/sigmoid/OpenMax etc.) or on a probabilistic (Bayesian) approach.