Suppose I want to train a cat-only classifier that outputs a high probability when given an image of a cat and a low or zero probability when given anything else. I can collect a few hundred cat images as the positive class, but the negative class samples are practically infinite. How do I go about training such a classifier? Please suggest which directions I should explore.
I have not personally worked through such a problem before, but here is a Colab notebook created by Zach Mueller a couple of years ago, as part of his Walk with fastai2 course (which has a video on this topic as well). It walks through an example of training a model so that it returns an empty string when the image you ask it to predict on does not match any of the classes the model was trained on.
Sounds similar in concept to the Not-Hotdog app.
One practical solution is to generate a square and randomly either leave it whole or divide it into four squares. If it stays whole, choose a random colo(u)r for it. If it is divided into four, repeat the process for each of the smaller squares, until the square size is just detectable by the human eye. You now have a collection of things which could exist in the visual universe. You can use these as your not-cat control.
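The recursive splitting described above could be sketched roughly like this. It is only an illustration using the standard library: the function name `random_quadtree_image`, the 50/50 split probability, and the `min_block` stopping size are assumptions, and `size` is assumed to be a power of two so the four quarters tile the square exactly.

```python
import random

def random_quadtree_image(size=64, min_block=4, split_prob=0.5, rng=None):
    """Return a size x size image as a list of rows of (r, g, b) tuples."""
    rng = rng or random.Random()
    image = [[(0, 0, 0)] * size for _ in range(size)]

    def fill(x, y, side):
        # Split into four quarters with probability split_prob,
        # unless the block is already at the smallest allowed size.
        if side > min_block and rng.random() < split_prob:
            half = side // 2
            for dx in (0, half):
                for dy in (0, half):
                    fill(x + dx, y + dy, half)
        else:
            # Leaf square: paint it with one random colour.
            color = (rng.randrange(256), rng.randrange(256), rng.randrange(256))
            for row in range(y, y + side):
                for col in range(x, x + side):
                    image[row][col] = color

    fill(0, 0, size)
    return image

# A small batch of synthetic "not cat" images, one per seed.
negatives = [random_quadtree_image(rng=random.Random(seed)) for seed in range(8)]
```

Seeding the generator per image keeps the negatives reproducible; the pixel lists would still need converting to whatever image format your training pipeline expects.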
Here is one idea. Imagine that “cat” represents a point (or a small neighborhood) in the subspace of all pictures. Near there would lie pictures of tigers and other similar animals. Enough of them should be included as non-cats. In addition, add the randomized pictures described by Conwyn, pictures of several other four-legged animals, and pictures of randomly selected other objects (cars, scenarios, trees, etc.). In the end, the selection of non-cats is based on intuition that would allow the system to learn the boundary of the cat region in the subspace. The above suggestions are aimed at achieving that.
Wow, really creative. I’ll definitely try that. Could you share the theoretical basis for this, when you talk about “… things which could exist in the visual universe”?
This makes sense, thanks a lot.
Thanks for sharing, will definitely look into this.
I would not call it theoretical, but a pixel can take 256 * 256 * 256 possible values, your image has a finite area, and our eyes can only resolve pixels down to a certain size. Therefore (256 * 256 * 256) ^ (picture-width * picture-height), with width and height in pixels, defines a finite set of possibilities which could occur in a single time-frame, depending on the responsiveness of your eyes. The thing you would call a cat is a member of the cat family, and sits within the generality of species, genus, family, order, class, phylum, kingdom and domain. So theoretically a random picture could look like a cat 0.000000…001% of the time, but generally would not look like a cat.
So your CNN kernels look for squares of various sizes as you go through the layers. Therefore the random pictures should be a good example of not-a-cat. The thing you are trying to avoid is confusing a grey elephant with a long trunk and a grey cat with a grey tail. The random squares are different sizes, so you could have a big square of grey which could be a cat, but the adjacent squares prove it is not a cat.
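To make the counting argument about pixel values and picture size concrete, here is a small sketch of the arithmetic. The function names are made up for illustration, and it assumes 3 colour channels with 256 values each.

```python
import math

def count_possible_images(width, height, values_per_channel=256, channels=3):
    """Exact number of distinct images of the given size (a huge integer)."""
    colors_per_pixel = values_per_channel ** channels
    return colors_per_pixel ** (width * height)

def decimal_digits(width, height, values_per_channel=256, channels=3):
    """Number of decimal digits in that count, via logarithms."""
    exponent = width * height * channels * math.log10(values_per_channel)
    return math.floor(exponent) + 1

# A single pixel already has 256 * 256 * 256 possibilities.
one_pixel = count_possible_images(1, 1)
```

The set is finite but grows exponentially with the number of pixels, which is why a random picture would almost never look like a cat.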
I may have an English sense of humor, but I was browsing through the ‘not_hotdog issue’ documentation with a smile, and when I scrolled to the “unfreeze freeze” bit I couldn’t stand it, LOL. Seriously, I agree on the point of forcing the net to recognize patterns that are NOT similar to either class. Why did you go into negatives? Isn’t 0 applicable as NOT? Use binary cross-entropy with from_logits=True as the loss (NOT Angeles) function and a sigmoid as the classifier.
Stay cool Guys
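For anyone unsure what "binary cross-entropy from logits plus a sigmoid" means in practice, here is a minimal pure-Python sketch. It is not the Keras or PyTorch implementation, just the same formula written out; the function names and the 0.5 threshold are assumptions.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bce_with_logits(logit, target):
    """Binary cross-entropy from a raw logit; target is 1 (cat) or 0 (not cat)."""
    # Stable form of -[y*log(s) + (1-y)*log(1-s)] with s = sigmoid(logit).
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def predict_is_cat(logit, threshold=0.5):
    # The sigmoid is applied only at prediction time, not inside the model.
    return sigmoid(logit) >= threshold
```

During training the model outputs a single raw logit and the loss consumes it directly; the label 0 simply stands for "not a cat", so no separate negative class output is needed.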
I have had good experiences using focal loss in such cases of high class imbalance.
Also, heavy use of data augmentation and oversampling of the low-prevalence class.
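As a rough illustration of the focal loss mentioned above, here is the binary form written out in plain Python. The defaults `gamma=2.0` and `alpha=0.25` are the commonly quoted ones and are an assumption here, as is the function name; in a real project you would use your framework's version.

```python
import math

def focal_loss(prob_cat, target, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one sample.

    prob_cat: predicted probability of the positive (cat) class.
    target:   1 for cat, 0 for not cat.
    """
    # p_t is the probability the model assigned to the true class.
    p_t = prob_cat if target == 1 else 1.0 - prob_cat
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # avoid log(0)
    # (1 - p_t) ** gamma shrinks the loss of already well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` this reduces to alpha-weighted cross-entropy; larger `gamma` makes the many easy negatives contribute less, so the rare class is not drowned out.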
I guess you don’t particularly care about “cats” and this is only an example here. Otherwise you could simply train your model on ImageNet, which has loads of cat images and lots of non-cat images.
In general, when you have very little training data, you can also use a pretrained model that has seen similar inputs during training (in the case of cat photos, that could be any standard classification model), then freeze the model and train a small MLP on top of the last or second-to-last latent embedding. This should give you a fairly robust classifier.
Hope that helps
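A toy sketch of the frozen-backbone idea, using only the standard library. Everything here is a simplification: `frozen_embed` is a hypothetical stand-in for a real pretrained model, the "embeddings" are synthetic 2-D points, and the head is a single logistic layer rather than a small MLP.

```python
import math
import random

def frozen_embed(x):
    """Hypothetical stand-in for a frozen pretrained backbone.
    A real one would map an image to a latent embedding; here the
    inputs are already feature vectors and are passed through as-is."""
    return list(x)

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, embedding):
    return sigmoid(sum(w * e for w, e in zip(weights, embedding)) + bias)

def train_head(embeddings, labels, epochs=200, lr=0.5):
    """Fit only the head by gradient descent; the backbone is never updated."""
    dim = len(embeddings[0])
    weights, bias = [0.0] * dim, 0.0
    n = len(embeddings)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for emb, y in zip(embeddings, labels):
            err = predict_proba(weights, bias, emb) - y  # dLoss/dlogit for BCE
            for i, e in enumerate(emb):
                grad_w[i] += err * e
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Synthetic "embeddings": cats cluster near (2, 2), everything else near (-2, -2).
rng = random.Random(0)
cats = [[rng.gauss(2, 0.5), rng.gauss(2, 0.5)] for _ in range(50)]
others = [[rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)] for _ in range(50)]
embeddings = [frozen_embed(x) for x in cats + others]
labels = [1] * 50 + [0] * 50
weights, bias = train_head(embeddings, labels)
```

In practice you would compute the embeddings once with the pretrained model, cache them, and train the head on those, which is cheap even with only a few hundred cat images.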