Hi all,
I built a gag classification app that tells whether an image shows a politician on stage or a comedian on stage. The purpose isn't actually to classify politicians and comedians; rather, by analogy to our own brains, I want to understand which contextual cues the model uses to make that call. In that sense, I'm most interested in the misclassified images. Maybe those cues are the outfit, the stage setup, or the body language…
However, I'd like to train the model so that it doesn't memorize faces, so that a picture of a disheveled Romney at the Comedy Cellar has a chance of being labeled a comedian, instead of the model immediately recognizing Romney's face from the training set and labeling him a politician. This wouldn't matter so much if I could guarantee that the people in the test images were outside the politicians and comedians used in the training set, but suppose I can't.
It seems to me one (brute?) way of achieving this is to white out the faces in the training and validation sets using a face detection algorithm (or, more brute still, by doing it manually myself); there's a rough sketch of this below. This turns a portion of each image into a white rectangle, but if the locations and sizes of these rectangles are distributed randomly enough, they shouldn't bias the prediction?
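To make the idea concrete, here's a minimal sketch of the whiting-out step, assuming the `face_recognition` package (the library behind Adam Geitgey's post); the folder names are just placeholders:

```python
# Sketch: cover every detected face with a white rectangle,
# writing cleaned copies of the images to a separate folder.
from pathlib import Path

import face_recognition
from PIL import Image

def white_out_faces(src: Path, dst: Path) -> None:
    """Detect faces in src and save a copy with each face whited out."""
    image = face_recognition.load_image_file(src)  # RGB numpy array
    # face_locations returns (top, right, bottom, left) boxes
    for top, right, bottom, left in face_recognition.face_locations(image):
        image[top:bottom, left:right] = 255  # fill the face box with white
    Image.fromarray(image).save(dst)

# Hypothetical folder layout: preprocess train/ into train_no_faces/
src_dir, dst_dir = Path("train"), Path("train_no_faces")
dst_dir.mkdir(exist_ok=True)
for p in src_dir.glob("*.jpg"):
    white_out_faces(p, dst_dir / p.name)
```

Since this runs once as a preprocessing pass, the same cleaned folders could then be fed to any training pipeline, fastai included.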
So my questions are:
- Does this approach make sense? Have you come across this topic of making a CNN not latch onto the specifics of certain objects?
- If so, what face detection package do you recommend? A quick search brings me to Adam Geitgey's Medium post, but it's from 2016. It would also be nice if it played well with fastai.
Thank you for reading and offering advice!