Is having background a known requirement for person recognition?

Trying to build a person recognizer. Basically I went to Facebook and pulled pics of 4 of my friends, around 20 each, plus 20 of mine.

I thought that since I am trying to do this with a small amount of data, which I split into train/valid/test (17/2/1 per person), I should remove the background and other people's faces from the pics. I spent time manually cropping out other people and kept as little background as possible, basically trying to get a headshot of each person.
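For reference, a per-person 17/2/1 split like the one above could be sketched like this (the file names and helper are made up for illustration, not my actual setup):

```python
import random

def split_per_person(files, n_train=17, n_valid=2, n_test=1, seed=42):
    """Shuffle one person's image list, then split it 17/2/1."""
    assert len(files) >= n_train + n_valid + n_test
    files = list(files)
    random.Random(seed).shuffle(files)  # fixed seed so the split is reproducible
    return (files[:n_train],
            files[n_train:n_train + n_valid],
            files[n_train + n_valid:n_train + n_valid + n_test])

# e.g. 20 hypothetical images for one person
imgs = [f"personA_{i:02d}.jpg" for i in range(20)]
train, valid, test = split_per_person(imgs)
print(len(train), len(valid), len(test))  # 17 2 1
```

Doing the split per person (rather than over the pooled dataset) keeps every class represented in each subset, which matters with so few images.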

My accuracy never went above 60 to 65% on my test data. I reshuffled the data every time I tried a new batch size or data augmentation setting, so an unlucky split doesn't seem to explain it. I was overfitting, but I thought that was due to the lack of data. So I decided to try a PersonA/PersonB recognizer: building a 2-person recognizer rather than a 5-person one should reduce my data requirement. That did not work out either.

Then I thought let's try cats and dogs, maybe I am doing something wrong. The same code got me 100% accuracy on cats vs. dogs with the same number of images split in the same ratio. I changed the test set to check whether that was a fluke; it was reproducible.

  • I thought about what the difference is between a cats/dogs and a PersonA/PersonB recognizer. Being different species makes cats and dogs look different, but the same should apply to people: different clothes make them look different, no?
  • Then I thought maybe the image dimensions? Turned out that wasn't the case either; the dimensions were random in both cases.
  • Then I realized I never cropped the cats/dogs images, and those images have other things in them. Switching back to the original uncropped images took my accuracy to 75 to 80% between any 2 people. I thought maybe that was due to the different backgrounds or skin color, but nothing like that stood out.
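One way to check whether the background (rather than the face) is what the model relies on is an occlusion-sensitivity test: gray out one patch of the image at a time and measure how much the score drops. A minimal numpy sketch, with a toy stand-in `score` function in place of a real model:

```python
import numpy as np

def occlusion_map(img, score, patch=8):
    """Score drop when each patch x patch region is grayed out.
    Large values mean the model relies on that region."""
    h, w = img.shape[:2]
    base = score(img)
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h - h % patch, patch):
        for j in range(0, w - w % patch, patch):
            occluded = img.copy()
            occluded[i:i + patch, j:j + patch] = img.mean()  # gray out one patch
            heat[i // patch, j // patch] = base - score(occluded)
    return heat

# Toy "model" that only looks at the top-left corner (pretend it's background)
score = lambda im: im[:8, :8].mean()
img = np.zeros((32, 32))
img[:8, :8] = 1.0
heat = occlusion_map(img, score)
print(np.unravel_index(heat.argmax(), heat.shape))  # (0, 0)
```

With a real classifier, hot regions over the background rather than the face would confirm the suspicion above.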

So is background a known useful signal for person recognition? I added all of my thoughts here because maybe I am just going in the wrong direction with this.

Which model are you using? Maybe the model was pretrained to differentiate cats and dogs but wasn't trained to specifically learn the differences between 2 persons. In FaceNet (a model for face recognition) they find that using a triplet loss helps reach good accuracy. Maybe you should go in that direction, but first you will need a lot more data.
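For reference, the triplet loss FaceNet uses pulls an anchor embedding toward a positive (same person) and away from a negative (different person). A minimal numpy sketch of just the loss; the embeddings here are made up, and a real pipeline would produce them with a network:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, ||a-p||^2 - ||a-n||^2 + margin), as in FaceNet."""
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance to same person
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance to other person
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 1.0])   # anchor embedding
p = np.array([0.1, 0.9])   # same person: close to anchor
n = np.array([1.0, 0.0])   # different person: far from anchor
print(triplet_loss(a, p, n))  # 0.0 -- negative is already margin-far away
```

The loss is zero once the negative is at least `margin` farther from the anchor than the positive, so training focuses on the hard triplets that still violate that.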

I was just using the ImageNet-pretrained model from the initial lessons.