Hello, I am attempting to create an image classification model that can detect when a picture does not fall into any of the categories. It is somewhat similar to the question in "Cats or dogs or neither?". The solution proposed there is to just train the model with pictures of dogs, cats, and non-dogs/cats. However, it seems to me that there is a potential problem with this approach. Taking the dog vs. cat vs. neither problem:
Assume that the neither category consists mostly of elephants and pigs. The model would simply associate the neither category with an animal that is an elephant-pig hybrid. Since all of the categories consist of four-legged animals, if I then passed a picture of a fish or a tree to this trained model, it does not seem that the model would be able to accurately classify this new image as neither. Is my understanding here correct?
It seems like the above solution would also hinder the training process. For example, if a picture of a sheep were encountered during training, the model would be trying to fit the sheep as cat/dog/elephant-pig rather than cat/dog/neither.
If we set the model to classify any picture with a confidence below a certain threshold as neither, wouldn't it be easily confused by pictures of wolves or tigers? These animals seem to share more similarity with cats/dogs than with the "neither" category, which would lead to a high confidence of the picture being classified as a dog or cat rather than neither.
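To make the threshold idea concrete, here is a minimal sketch of what I have in mind. It assumes a two-class cat/dog model that outputs raw logits; the function names, the 0.9 threshold, and the logit values are all made up for illustration and do not come from any particular library.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_with_threshold(logits, labels=("cat", "dog"), threshold=0.9):
    """Return the top label, or "neither" if the top probability is below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "neither", probs[best]
    return labels[best], probs[best]

# Made-up logits: an ambiguous image (e.g. a tree) falls below the threshold ...
print(classify_with_threshold([0.2, 0.1]))
# ... but a tiger that strongly excites the "cat" output would still pass as a cat.
print(classify_with_threshold([6.0, 0.5]))
```

The second call is exactly the failure I am worried about: the threshold only helps when the model is unsure, not when it is confidently wrong.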
Are such concerns valid? Is there a better solution to this problem?
In the YOLO algorithm, if no object is detected, the region is treated as background. So anything that is not a cat or a dog is background. However, YOLO works with object bounding boxes: if something doesn't have a bounding box, it is already background.
Now my question: how could you indicate in the data that your image is neither a dog nor a cat without using an extra parameter?
I have one idea: you could create an extra parameter indicating whether a nose or a tail can be seen in the image. Learning to identify a tail or a nose would suffice to classify the image as dog/cat or nothing.
Think about it: if the cat is facing you, you can see the nose; if it is facing away, you can see the tail; from the side, you may see one or both. So I believe predicting tails and noses would suffice to tell an animal apart from the background.
Actually that's a good idea! If I understand correctly, you are talking about using some form of transfer learning? So we would first train a model on dogs/cats to let it recognize the features of these animals. Then we would freeze these layers, add a few more layers at the end, and train it on a dogs/cats/nothing dataset?
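If I understand the idea, it would look roughly like the sketch below in plain PyTorch. This is only my guess at the setup: the tiny `backbone` is a hypothetical stand-in for a network already trained on cats vs. dogs, the layer sizes are arbitrary, and the random tensors stand in for a real dogs/cats/nothing dataset.

```python
import torch
from torch import nn

# Hypothetical stand-in for a feature extractor already trained on cats vs. dogs.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),  # -> (batch, 8)
)

# Freeze the pretrained layers so their weights are not updated.
for p in backbone.parameters():
    p.requires_grad_(False)

# New trainable layers on top, with three outputs: cat, dog, nothing.
head = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))
model = nn.Sequential(backbone, head)

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(head.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# One training step on random stand-in data (4 RGB images of 32x32 pixels).
images = torch.randn(4, 3, 32, 32)
targets = torch.tensor([0, 1, 2, 2])  # 0 = cat, 1 = dog, 2 = nothing
loss = loss_fn(model(images), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Whether the frozen cat/dog features are rich enough to separate "nothing" from the two animal classes is exactly what I would want to test.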
About the YOLO algorithm and bounding boxes: that looks quite interesting, but it is unfamiliar to me. Looks like I have some reading to do. Thanks!
Using images in the "negative set" that are similar to images in the "positive set". (In your example, cat/dog is the positive set, so the post recommends using something similar to cats and dogs, for example tigers, as negatives.)
Using images that contain a single object in the negative set.
The ideal solution for the problem, I think is
Which requires only a positive set. However, I'm not sure how to implement it using deep learning, and fast.ai especially.
If you make any progress, I'd love to hear that.