Segmentation before image classification?

Hi everyone!

I am currently working on a project around sneaker / footwear classification.
I have obtained quite good results when training and classifying only on official product images, so I decided to take it up a notch.

My aim is to be able to classify random pictures of people wearing sneakers, such as this one, for instance: Link to Instagram, or this one. The latter is probably far more complicated.

I have done some scraping (I have around 40k roughly properly classified images, spread across around 430 models) and trained using resnet50, but could only get down to an error rate of about 0.25. So I have started looking into techniques to improve my dataset.

From reading some papers, I’ve come to the impression that segmenting and cropping the image around the sneaker would be a great way to do this. I believe it would also allow me to do more data augmentation. Do you think this will help?

And, if so, how would you advise me to do the segmentation + crop part?

I have thought of training a sneaker-or-not classifier, then randomly cropping images and using its prediction to tell whether an image is cropped tightly enough / properly. But this feels like quite a dirty way to do it.
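
To make the idea concrete, here is a minimal sketch of that random-crop search. Everything in it is an assumption for illustration: `score_fn` stands in for the sneaker-or-not classifier (it would take a crop box and return a confidence), and `iou` is only a toy score so the snippet runs without a trained model.

```python
import random


def random_box(width, height, min_frac=0.3, rng=random):
    """Sample a random axis-aligned crop as (left, top, right, bottom)."""
    w = max(1, int(width * rng.uniform(min_frac, 1.0)))
    h = max(1, int(height * rng.uniform(min_frac, 1.0)))
    left = rng.randint(0, width - w)
    top = rng.randint(0, height - h)
    return (left, top, left + w, top + h)


def best_crop(width, height, score_fn, n_trials=200, seed=0):
    """Try n_trials random crops and keep the one score_fn likes best.

    score_fn is a hypothetical stand-in for the sneaker-or-not classifier:
    it receives a box and returns a higher-is-better score.
    """
    rng = random.Random(seed)
    best_box = (0, 0, width, height)  # start from the uncropped image
    best_score = score_fn(best_box)
    for _ in range(n_trials):
        box = random_box(width, height, rng=rng)
        score = score_fn(box)
        if score > best_score:
            best_box, best_score = box, score
    return best_box, best_score


def iou(a, b):
    """Toy score: intersection-over-union of two boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

In practice `score_fn` would crop the image to the box and run the classifier on it. A plain classifier score may also favour loose crops that merely contain a sneaker, so some penalty on box area would probably be needed.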

Thanks for reading and helping, I can’t wait to share with you the eventual results!


How about using an existing pose estimator to get information on where the shoes are and crop using that information?
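
A rough sketch of what that could look like, assuming the pose estimator returns named 2D keypoints in pixel coordinates (the `left_knee` / `left_ankle` style names follow the COCO keypoint convention; the function name, the `scale` parameter and the 0.25 offset are made up for illustration). The shin length is used as a size reference so the crop scales with how large the person appears.

```python
import math


def shoe_crop_box(keypoints, image_size, scale=1.0):
    """Return one (left, top, right, bottom) box covering the visible shoes.

    keypoints: dict mapping names such as "left_ankle" to (x, y) pixels,
               as produced by some pose estimator (assumed format).
    image_size: (width, height) of the image, used to clamp the box.
    scale: box side length as a multiple of the knee-to-ankle distance.
    """
    width, height = image_size
    boxes = []
    for side in ("left", "right"):
        knee = keypoints.get(f"{side}_knee")
        ankle = keypoints.get(f"{side}_ankle")
        if knee is None or ankle is None:
            continue  # this leg was not detected
        shin = math.dist(knee, ankle)
        half = scale * shin / 2
        # The shoe sits slightly below the ankle joint (heuristic offset).
        cx, cy = ankle[0], ankle[1] + 0.25 * shin
        boxes.append((cx - half, cy - half, cx + half, cy + half))
    if not boxes:
        return None
    # Union of the per-foot boxes, clamped to the image bounds.
    left = max(0, int(min(b[0] for b in boxes)))
    top = max(0, int(min(b[1] for b in boxes)))
    right = min(width, int(max(b[2] for b in boxes)))
    bottom = min(height, int(max(b[3] for b in boxes)))
    return (left, top, right, bottom)
```

The resulting box can be passed to whatever crop function your image library provides before feeding the image to the classifier.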