Help needed in DataBunch construction

Sayak · May 14, 2019, 5:29pm

Hello all. I am currently attempting the Recognizing Faces in the Wild challenge on Kaggle. The task is to determine if two people are blood-related based solely on images of their faces.

The labels look like:

F0002/MID1 and F0002/MID3 mean that, in family F0002, MID1 is related to MID3 (the first row in the above figure).

More information about the dataset:

The training set is divided into families (F0123), then individuals (MIDx). Images in the same MIDx folder belong to the same person. Images in the same F0123 folder belong to the same family.
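For anyone who wants to turn that layout into image pairs, here is a minimal sketch. It assumes the images sit under a root folder as `F0123/MIDx/<image file>`; the function names `index_people` and `image_pairs` are made up for illustration and are not part of fastai or the competition kit.

```python
from itertools import product
from pathlib import Path


def index_people(train_root):
    """Map 'F0123/MIDx' -> sorted list of image paths for that person.

    Assumes the layout <train_root>/F0123/MIDx/<image file>.
    """
    root = Path(train_root)
    people = {}
    for person_dir in sorted(root.glob("F*/MID*")):
        if person_dir.is_dir():
            key = f"{person_dir.parent.name}/{person_dir.name}"
            people[key] = sorted(p for p in person_dir.iterdir() if p.is_file())
    return people


def image_pairs(people, person_a, person_b):
    """All (image_a, image_b) combinations for one labelled pair of people."""
    return list(product(people.get(person_a, []), people.get(person_b, [])))
```

A label row such as F0002/MID1, F0002/MID3 then expands to `image_pairs(people, "F0002/MID1", "F0002/MID3")`, which returns an empty list if either person has no images on disk.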

The training images are in a folder covering 470 families, and its structure looks like this (a small snapshot):

Now if you zoom into a folder of a particular family, you get:

Given all this data, you are to build a system that takes image pairs as given in the test set and predicts whether they are blood-related or not. The images from the test set look like:

The prediction file contains image-pairs like:

And accordingly, we have to predict if they are related.

For this problem, I am struggling to understand how to build the DataBunch to feed to a model. Any clues and pointers would be really helpful.

piby4 · July 3, 2019, 11:57am

Hi Sayak,
I had looked into the problem. I am not sure this is a traditional, straightforward classification problem.

This is a find-the-distance-between-face-embeddings problem.

An easier way would be:

Find the face embeddings of the pair of faces (use a library like dlib for this).
Each embedding is a 128-element vector.
Find the cosine distance / L1 distance between the two embeddings.
Fix a threshold for the distance.
If the cosine distance is less than the threshold, the two images are blood-related.
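The steps above can be sketched roughly as follows. This is plain Python operating on embeddings that are assumed to have been computed already (for example, the 128-element vectors that dlib-based libraries return). The default threshold of 0.4 is only a placeholder to be tuned on labelled pairs, and `is_related` is a hypothetical helper name.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def l1_distance(u, v):
    """Sum of absolute element-wise differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def is_related(emb_a, emb_b, threshold=0.4, metric=cosine_distance):
    """Predict 'related' when the embedding distance is below the threshold.

    threshold=0.4 is an arbitrary placeholder, not a recommended value.
    """
    return metric(emb_a, emb_b) < threshold
```

Note that the two metrics live on different scales, so a threshold chosen for cosine distance will not carry over to L1 distance.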

This would give you a decent result on the leaderboard. (But this is not the only way.)

References:
https://github.com/ageitgey/face_recognition (Python library built on top of dlib)
http://dlib.net/ (dlib package, an amazing C++ library with Python bindings)

Regards

Sayak · July 4, 2019, 6:50am

Thank you very much for your suggestion. I had initially constructed the dataset (basically an Image -> Image mapping) from the .csv file. Here’s the Kaggle Kernel for that: https://www.kaggle.com/spsayakpaul/data-exploration-recognising-faces-in-the-wild. After that, I could not figure out how to proceed. You will see a Unet in the kernel, but that was tried without much thought.

ste · July 4, 2019, 5:05pm

You can use a Siamese network approach, training the network to recognize whether the two people pictured are blood-related rather than, as is usual, whether they are the same person.

Take a look at: Siamese Networks
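A Siamese setup needs unrelated pairs as well as the related ones from the labels. Below is a minimal sketch of one way to build such a labelled pair list. It assumes, purely for illustration, that people from different families can be treated as unrelated (the labels themselves only list related pairs). `sample_unrelated_pairs` and `labelled_pairs` are hypothetical helper names.

```python
import random
from itertools import combinations


def family_of(person):
    """'F0002/MID1' -> 'F0002'."""
    return person.split("/")[0]


def sample_unrelated_pairs(related_pairs, n, seed=0):
    """Sample up to n cross-family pairs to use as negatives.

    Assumption: people in different families are not blood-related.
    Enumerates all cross-family pairs, which is simple but not memory-frugal.
    """
    people = sorted({p for pair in related_pairs for p in pair})
    candidates = [
        (a, b)
        for a, b in combinations(people, 2)
        if family_of(a) != family_of(b)
    ]
    rng = random.Random(seed)
    return rng.sample(candidates, min(n, len(candidates)))


def labelled_pairs(related_pairs, seed=0):
    """Return shuffled (person_a, person_b, label) with 1 = related, 0 = not."""
    negatives = sample_unrelated_pairs(related_pairs, len(related_pairs), seed)
    data = [(a, b, 1) for a, b in related_pairs]
    data += [(a, b, 0) for a, b in negatives]
    random.Random(seed).shuffle(data)
    return data
```

Each person-level pair can then be expanded into image pairs and fed to the two branches of the network, with the label as the target.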