Best approach for an "it is or it isn't" classification

Hey, I was wondering what's the best way to prepare a training set for a classifier that recognizes only one class, by which I mean it decides whether that class appears in the picture or not. Is it still binary classification? What images should the training set for the "everything_else" class contain?

For example, let's consider a simple task: recognizing a specific person's face. Besides a set of photos of that face, I need some images for the second class. My first thought was to show the classifier some random unrelated images and label them as "something_else", but then it might trigger whenever it sees any human face. Would adding some random faces to the training set be sufficient?

… While writing this post I realized that I can check it myself. I already had 50 photos of my friend's face (yeah, I know, weird), and for the "something_else" class I downloaded 100 random faces plus 300 random images, and by "random images" I mean images that show up when you type "image" into Google image search xD. Because of the low number of photos of the actual face, I decided to enlarge the validation part to 40% of the whole set. The resulting convnet had 98% accuracy; however, only 16 out of 20 of my friend's faces were recognized correctly. If I enlarged the training set, I'm sure the accuracy would go higher. But maybe there is a better approach for this kind of problem?
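The gap between "98% accuracy" and "16 out of 20" is worth spelling out, since the friend's face is only 50 of 450 images. Here is a minimal sketch of the arithmetic. It assumes the 40% validation split is 180 images, that 20 of them are the friend's face, and (an assumption, the post does not say) that none of the 160 other images was mistaken for the friend.

```python
def binary_metrics(tp, fn, fp, tn):
    """Return (accuracy, recall, precision) for the positive class."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, recall, precision

# From the post: 16 of the 20 friend faces were recognized.
tp, fn = 16, 4
# Assumed: 160 "something_else" validation images, none misclassified.
fp, tn = 0, 160

accuracy, recall, precision = binary_metrics(tp, fn, fp, tn)
print(f"accuracy={accuracy:.3f} recall={recall:.3f} precision={precision:.3f}")

# Baseline for comparison: a model that always answers "something_else".
base_accuracy, base_recall, _ = binary_metrics(tp=0, fn=20, fp=0, tn=160)
print(f"baseline accuracy={base_accuracy:.3f} recall={base_recall:.3f}")
```

Under these assumptions the four missed faces alone account for an accuracy of about 98%, while recall on the friend's face is 80%. The always-negative baseline still scores close to 89% accuracy, so recall and precision on the target class are probably the more informative numbers here.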


It sounds like what you're describing might be related to face verification. This Medium post has a description of Andrew Ng's lesson on it from his Coursera course.

I think you could get by with a face verification system if you use a two-step process. First, face detection algorithms (does this photo contain a face or not?) are pretty good, so you could use one as a first pass. Then, if it finds a face, you can proceed with face verification.
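The two-step idea can be sketched roughly as below. This is only an illustration: `detect_faces` and `embed_face` are hypothetical callables standing in for whatever detector and face-embedding model you pick, and comparing embeddings by cosine distance with a threshold of 0.4 is an assumed choice that would need tuning on real data.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def is_target_person(image, detect_faces, embed_face,
                     reference_embeddings, threshold=0.4):
    """Step 1: find faces. Step 2: verify each one against the references."""
    for face in detect_faces(image):      # hypothetical detector
        embedding = embed_face(face)      # hypothetical embedding model
        best = min(cosine_distance(embedding, ref)
                   for ref in reference_embeddings)
        if best < threshold:
            return True
    return False                          # no face, or no face close enough

# Toy stand-ins so the sketch runs without any real model.
fake_detect = lambda image: image["faces"]
fake_embed = lambda face: face
references = [[1.0, 0.0], [0.9, 0.1]]     # pretend embeddings of the friend

friend_photo = {"faces": [[0.95, 0.05]]}
stranger_photo = {"faces": [[0.0, 1.0]]}
landscape = {"faces": []}

for name, img in [("friend", friend_photo),
                  ("stranger", stranger_photo),
                  ("landscape", landscape)]:
    print(name, is_target_person(img, fake_detect, fake_embed, references))
```

With this split, the "everything_else" problem mostly goes away: images without a face never reach the second step, and the second step only has to answer "is this the same face as my reference photos?".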

Are you doing any preprocessing on the face images? If not, that could impact your performance, because you’d likely want to do things like crop and rotate the face.
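For the crop-and-rotate part, here is a small sketch of the geometry, assuming you already have a face bounding box and two eye positions from some detector or landmark model (those inputs, and the 20% margin, are assumptions for illustration). It only computes the numbers; the actual rotating and cropping would be done with your image library of choice.

```python
import math

def roll_angle_degrees(left_eye, right_eye):
    """Angle of the line through the eyes relative to horizontal.

    Points are (x, y) in image coordinates (y grows downward);
    left_eye is the eye that appears further left in the image.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

def square_crop_box(face_box, image_size, margin=0.2):
    """Expand a (left, top, right, bottom) box into a square with a margin.

    The result is clamped to the image, so it may end up non-square
    when the face is close to an edge.
    """
    left, top, right, bottom = face_box
    width, height = image_size
    side = max(right - left, bottom - top) * (1.0 + margin)
    cx = (left + right) / 2.0
    cy = (top + bottom) / 2.0
    return (
        max(0, round(cx - side / 2.0)),
        max(0, round(cy - side / 2.0)),
        min(width, round(cx + side / 2.0)),
        min(height, round(cy + side / 2.0)),
    )

# Example: eyes tilted so the right eye sits lower than the left one.
angle = roll_angle_degrees(left_eye=(50, 60), right_eye=(90, 100))
box = square_crop_box((40, 40, 80, 100), image_size=(200, 200), margin=0.0)
print(angle, box)
```

Rotating the image so that this angle becomes zero levels the eyes, and cropping to the box gives every face roughly the same framing before it reaches the network.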


Thank you for the response. The idea of separating this process into those two tasks is actually pretty clever. I'll read the article you mentioned for sure, and maybe check out Andrew Ng's course. I'm doing his Machine Learning course right now ;]
