Open-world cat detection (instead of cat vs. dog)

timanglade · November 1, 2016, 3:35am

What if instead of Cats vs. Dogs you were trying to take an arbitrary picture of the world (not just a picture of a Cat or a Dog, but a random picture of people, furniture, food, etc.), and figure out if there was a Cat in it? (Dropping the detection of Dogs completely…)

I worked on something like this after Lesson 1. On Jeremy’s recommendation I tweaked the Classifier to train two classes: instead of Cats & Dogs, I trained it on Cats & Unknown (the latter containing only Dogs at first, but it could function as a catch-all for other object types that are not Cats).

On my first attempt, I noticed the model I was training was over-eager to recognize cats: a picture of a dog would not be identified as containing a cat, but pictures of random objects (e.g. furniture) were routinely misidentified as having a cat in them.

So I started adding pictures of furniture to my Unknown class, and that seemed to lower the false positive rate on pictures of furniture. But false positives remained high on pictures of food, landscapes, etc.
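To make the "false positives by picture type" comparison concrete, here is a minimal sketch of how it could be measured. It assumes you already have a held-out set of cat-free images, each tagged with a rough category and scored by the model; the function name, the record layout and the 0.5 threshold are all illustrative assumptions, not something from the course notebooks.

```python
from collections import defaultdict

def false_positive_rate_by_category(records, threshold=0.5):
    """Per-category false positive rate on images known to contain NO cat.

    records: iterable of (category, cat_probability) pairs (hypothetical layout).
    threshold: score at or above which the model is counted as saying "cat".
    """
    totals = defaultdict(int)
    false_positives = defaultdict(int)
    for category, cat_probability in records:
        totals[category] += 1
        if cat_probability >= threshold:
            false_positives[category] += 1
    return {c: false_positives[c] / totals[c] for c in totals}

# Made-up scores, for illustration only.
example_records = [
    ("furniture", 0.10), ("furniture", 0.20),
    ("food", 0.90), ("food", 0.30),
]
rates = false_positive_rate_by_category(example_records)
for category, rate in sorted(rates.items()):
    print(category, rate)
```

Categories with the highest rate would be the natural candidates for extra examples in the Unknown class.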

So it looks like it would be very time-consuming to train the model on every other type of picture/object (portraits, foods, landscapes, cartoons, etc.) just to make sure it’s not mis-tagging cats.

Is there an easier way to build an open-world classifier that is great at recognizing cats in random pictures (better than Vgg16), without having to build a general image classifier all over again? For example is there a way to enhance Vgg16’s accuracy in recognizing cats, without destroying all the other ImageNet classes it was capable of recognizing before?

timanglade · November 1, 2016, 4:04am

Looks like I asked this just a few minutes early. I assume what we learned at the end of Lesson 2 (a linear model on top of Vgg, and fine-tuning of that) will help here?
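For reference, the "linear model on top of Vgg" idea can be sketched without any deep learning library: treat the network's output vector for each image as fixed features and fit a logistic regression that maps them to cat / not cat. This is only a toy sketch under stated assumptions. The three-element vectors below stand in for real Vgg16 outputs (which have 1000 entries), and the function names, learning rate and epoch count are made up for illustration; it is not the Lesson 2 notebook code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_linear_head(features, labels, lr=0.5, epochs=500):
    """Fit a logistic regression with plain stochastic gradient descent.

    features: list of equal-length float lists (stand-ins for Vgg outputs).
    labels: 1 for "cat", 0 for "unknown".
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict_cat_probability(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Toy stand-in features: the first entry plays the role of a "cat-like" class.
toy_features = [
    [0.90, 0.05, 0.05], [0.80, 0.10, 0.10],                       # cats
    [0.10, 0.80, 0.10], [0.05, 0.05, 0.90], [0.20, 0.40, 0.40],   # unknown
]
toy_labels = [1, 1, 0, 0, 0]
weights, bias = train_linear_head(toy_features, toy_labels)
print(predict_cat_probability(weights, bias, [0.85, 0.10, 0.05]))
```

Because the pretrained network is left untouched and only the small head is trained, the original ImageNet predictions remain available alongside the new cat score.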

jeremy · November 1, 2016, 4:02pm

@timanglade yeah I think just a standard fine-tuning of a replacement last layer should be perfect, especially if the thing you are looking for is already an ImageNet category. You could download the ImageNet dataset and create a sample that contains all examples of your class of interest, plus a random subset of the other categories. I’d suggest over-sampling the categories that it tends to get wrong the most.
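A rough sketch of that sampling scheme, assuming the images are already grouped by category in a dict of file paths. The dict layout, the function name and the "1 + number of observed false positives" weighting are illustrative assumptions; any weighting that favours the error-prone categories would fit the suggestion.

```python
import random

def build_sample(images_by_class, target, n_negatives, error_counts=None, seed=0):
    """Return (positives, negatives) for fine-tuning a target-vs-rest classifier.

    images_by_class: dict of class name -> list of image paths (hypothetical layout).
    target: the class of interest; all of its images are kept.
    n_negatives: how many non-target images to draw.
    error_counts: optional dict of class name -> number of false positives seen
                  for that class; larger counts get sampled more often.
    """
    rng = random.Random(seed)
    error_counts = error_counts or {}
    positives = list(images_by_class[target])
    other_classes = [c for c in images_by_class
                     if c != target and images_by_class[c]]
    # Assumed weighting: every class gets a base weight of 1, plus its error count.
    weights = [1 + error_counts.get(c, 0) for c in other_classes]
    negatives = []
    # Draws are with replacement, so an over-sampled image can appear more than once.
    for chosen_class in rng.choices(other_classes, weights=weights, k=n_negatives):
        negatives.append(rng.choice(images_by_class[chosen_class]))
    return positives, negatives

# Placeholder paths, for illustration only.
images = {
    "cat": ["cat/1.jpg", "cat/2.jpg"],
    "dog": ["dog/1.jpg"],
    "food": ["food/1.jpg", "food/2.jpg"],
    "sofa": ["sofa/1.jpg"],
}
positives, negatives = build_sample(images, "cat", 10, error_counts={"food": 5})
print(len(positives), len(negatives))
```

Re-running the error count after each round of fine-tuning and rebuilding the sample would keep the over-sampling focused on whatever the model currently gets wrong.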

Could you provide more detail about your exact dataset, application, and use case? I have lots of thoughts about general approaches, but more detail would be helpful. Do you have a dataset you could share with the class, so we can help you?