How to train a binary classifier with infinite examples of the negative class

Suppose I want to train a cat-only classifier that outputs a high probability when given an image of a cat and a low or zero probability when given anything else. I can collect a few hundred cat images as the positive class, but my negative-class samples are practically infinite. How do I go about training such a classifier? Please suggest which directions I should explore.

I have not personally worked through such a problem before, but here is a Colab notebook created by Zach Mueller a couple of years ago (as part of his Walk with fastai2 course, which has a video on this topic as well). It walks through an example of training a model so that it returns an empty string when the image you ask it to predict does not match any of the classes the model has been trained on.
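The underlying trick is usually multi-label classification: a sigmoid per class plus a threshold, so an image that activates no class decodes to an empty label set. A minimal fastai sketch of that idea (my own, not Zach's notebook, using the Pets dataset as a stand-in for your cat images):

```python
from fastai.vision.all import *

path = untar_data(URLs.PETS)  # stand-in dataset; swap in your own images

def label_func(p): return [p.name.rsplit('_', 1)[0]]  # wrap in a list -> multi-label target

dblock = DataBlock(
    blocks=(ImageBlock, MultiCategoryBlock),  # sigmoid per label instead of softmax
    get_items=get_image_files,
    get_y=label_func,
    item_tfms=Resize(224))
dls = dblock.dataloaders(path/'images')

learn = vision_learner(dls, resnet34, metrics=accuracy_multi)
learn.fine_tune(3)

# predict() thresholds each sigmoid output; if nothing clears the threshold,
# the decoded label list comes back empty ("none of the known classes")
decoded, _, probs = learn.predict(get_image_files(path/'images')[0])
```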

Sounds similar in concept to the Not-Hotdog app.

Hi
One practical solution is to generate a square and randomly divide it into one or four sub-squares. If it stays as a single square, choose a random colo(u)r for it; if it is divided into four, repeat the process for each sub-square until the square size is just detectable by the human eye. You now have a collection of things which could exist in the visual universe, and you can use these as your not-cat control.
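A rough sketch of how that could look in code (my interpretation, with arbitrary choices for the split probability and the minimum square size):

```python
import random
from PIL import Image, ImageDraw

def fill_random_squares(draw, x0, y0, size, min_size=8):
    # Either stop and colour this square, or split it into four and recurse
    if size <= min_size or random.random() < 0.5:
        colour = tuple(random.randint(0, 255) for _ in range(3))
        draw.rectangle([x0, y0, x0 + size, y0 + size], fill=colour)
    else:
        half = size // 2
        for dx in (0, half):
            for dy in (0, half):
                fill_random_squares(draw, x0 + dx, y0 + dy, half, min_size)

def make_not_cat(size=224):
    img = Image.new('RGB', (size, size))
    fill_random_squares(ImageDraw.Draw(img), 0, 0, size)
    return img

negatives = [make_not_cat() for _ in range(100)]  # synthetic not-cat images
```
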
Regards Conwyn

Here is one idea. Imagine that "cat" represents a point (or a small neighborhood) in the space of all pictures. Near there would lie pictures of tigers and other similar animals; enough of them should be included as non-cats. In addition, add the randomized pictures described by Conwyn, pictures of several other four-legged animals, and pictures of randomly selected other objects (cars, scenery, trees, etc.). In the end, the selection of non-cats is guided by the intuition that it should allow the system to learn the boundary of the cat region in that space. The above suggestions are aimed at achieving that.

Wow, really creative. I'll definitely try that. Could you share the theoretical basis of this, where you say "you have now a collection of things which could exist in the visual universe"?

Thanks.

this makes sense, thanks a lot :slight_smile:

thanks for the share, will definitely look into this.

I would not say theoretical, but a pixel can take 256 × 256 × 256 possible values, your image is a finite area, and our eyes can only resolve pixels down to a certain size. Therefore (256 × 256 × 256) ^ (picture-width in pixels × picture-height in pixels) defines a finite-sized set of possibilities which could occur in a single time-frame, depending on the responsiveness of your eyes. The thing you would call a cat is a member of the cat family, and sits within the generality of species, genus, family, order, class, phylum, kingdom and domain. So theoretically a random picture could look like a cat 0.000000…001% of the time, but generally it would not look like a cat.
Your CNN kernels look at squares of various sizes as you go through the layers, so the random pictures should be good examples of not-a-cat. The thing you are trying to avoid is confusing a grey elephant with a long trunk with a grey cat with a grey tail. The random squares come in different sizes, so you could have a big square of grey which could be a cat, but the adjacent squares prove it is not a cat.
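To put a rough number on that finite set (a back-of-the-envelope calculation of mine for a 224 × 224 image):

```python
import math

values_per_pixel = 256 ** 3                       # 16,777,216 possible colours per pixel
n_pixels = 224 * 224                              # pixels in a 224x224 image
digits = n_pixels * math.log10(values_per_pixel)  # log10 of (256^3)^(width*height)
print(f"roughly 10^{digits:.0f} distinct 224x224 RGB images")
```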

Regards Conwyn

I may have an English sense of humour, but I was browsing through the 'not_hotdog issue' documentation with a smile, and when I scrolled to the "unfreeze freeze" bit I couldn't stand it, LOL :rofl: Seriously, I agree on the point about forcing the net to recognize patterns that are NOT similar to either class. Why did you go into negatives? Isn't 0 as NOT applicable? Use binary cross-entropy with from_logits=True as the loss (NOT Angeles) function and a sigmoid as the classifier :slight_smile: :slight_smile:
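In other words, something like this single-logit setup (a plain PyTorch sketch of mine, with targets 1.0 for cat and 0.0 for not-cat):

```python
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet34(weights='DEFAULT')
backbone.fc = nn.Linear(backbone.fc.in_features, 1)  # single logit: cat vs. not-cat

criterion = nn.BCEWithLogitsLoss()                   # sigmoid + BCE computed from the raw logit
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)

def train_step(images, targets):
    # targets: float tensor of shape (batch, 1), 1.0 = cat, 0.0 = not-cat
    optimizer.zero_grad()
    loss = criterion(backbone(images), targets)
    loss.backward()
    optimizer.step()
    return loss.item()

# At inference, apply the sigmoid explicitly to get P(cat)
with torch.no_grad():
    p_cat = torch.sigmoid(backbone(torch.randn(1, 3, 224, 224)))
```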

Stay cool Guys

I have had good experience with using Focal Loss in such high-class-imbalance cases.
Also, heavy use of data augmentation and oversampling of the low-prevalence class helps.
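For reference, a small sketch of binary focal loss (my own implementation of the Lin et al. 2017 formulation; the gamma and alpha values are just the usual defaults):

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    # Per-example BCE, then down-weight easy examples by (1 - p_t)^gamma
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)             # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```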

I guess you don't particularly care about "cats" and this is only an example here. Otherwise you could simply train your model on ImageNet, which has loads of cat images and lots of non-cat images.

In general, when you have only very little training data, you can also use a pretrained model that has seen similar inputs during training (in the case of cat photos, that could be any standard classification model), then freeze the model and train a small MLP on top of the last or second-to-last latent embedding. This should give you a fairly robust classifier.
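A minimal sketch of that frozen-backbone setup (my own, with an arbitrary head size):

```python
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet34(weights='DEFAULT')
backbone.fc = nn.Identity()              # expose the 512-d penultimate embedding
for p in backbone.parameters():
    p.requires_grad = False              # keep the pretrained features fixed
backbone.eval()

head = nn.Sequential(                    # the small MLP that actually gets trained
    nn.Linear(512, 128), nn.ReLU(), nn.Dropout(0.25),
    nn.Linear(128, 1))                   # single cat-vs-not-cat logit

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

def train_step(images, targets):
    with torch.no_grad():
        feats = backbone(images)         # frozen embedding
    loss = criterion(head(feats), targets)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```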

Hope that helps :slight_smile:

@abrandl Welcome to the community, I totally agree: a higher gamma in this loss will push the model into 'focusing' on the harder examples.
@gjraza here are the docs:

https://docs.fast.ai/losses.html#bcewithlogitslossflat
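For completeness, a quick look at what these fastai losses do with raw logits (a tiny sketch of mine; the tensors are made up):

```python
from fastai.vision.all import *
import torch

logits  = torch.randn(4, 3)                    # 4 images, 3 classes
onehot  = torch.randint(0, 2, (4, 3)).float()  # multi-label targets
classes = torch.randint(0, 3, (4,))            # single-label targets

print(BCEWithLogitsLossFlat()(logits, onehot))    # sigmoid + BCE from logits
print(FocalLossFlat(gamma=2.0)(logits, classes))  # higher gamma -> focus on hard examples
```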

Regards :slight_smile:
