Fashion images multi-level classification: Need advice

I’m new to deep learning and even newer to fast.ai. I’m working on a project for my organization and would like advice from people who have done something similar.

Given a fashion image, I am expected to build a model that can extract multiple attributes from it:
Is it clothes or shoes?
Is it a shirt, t-shirt, top, dress, gown, jeans, trousers, leggings, etc.?
Is the color red, white, black, green, blue, yellow, orange, maroon, etc.?
Is it striped, checked, solid, abstract, etc.?
If it’s a shirt, is it half-sleeved or full-sleeved, casual or formal, slim fit or regular fit, etc.?
Is it made of cotton, polyester, linen, rayon, etc.?

To summarise, I have to extract global attributes (valid for all images) and local attributes (valid only for certain categories, e.g. half sleeves or full sleeves if it’s a shirt).
Some attributes can be treated as either global or local (e.g. shirt can be local to clothes -> menswear -> topwear, or it can be modelled as a global attribute).
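
To make the global/local split concrete, here is a rough sketch of how the attributes could be written down as a schema. All the names and value lists are illustrative assumptions, not a real dataset definition. The `loss_mask` helper is a hypothetical idea: if a single model predicts every attribute, a mask like this could be used to ignore the local attributes that don't apply to a given image.

```python
# Hypothetical attribute schema; names and values are illustrative only.
GLOBAL_ATTRIBUTES = {
    "color": ["red", "white", "black", "green", "blue"],
    "pattern": ["striped", "checked", "solid", "abstract"],
    "material": ["cotton", "polyester", "linen", "rayon"],
}

# Local attributes are keyed by the category they apply to.
LOCAL_ATTRIBUTES = {
    "shirt": {
        "sleeve": ["half", "full"],
        "occasion": ["casual", "formal"],
        "fit": ["slim", "regular"],
    },
}


def applicable_attributes(category):
    """Return the attribute names that are meaningful for a category."""
    names = list(GLOBAL_ATTRIBUTES)
    names += list(LOCAL_ATTRIBUTES.get(category, {}))
    return names


def loss_mask(category):
    """Map every attribute name to 1 if it applies to the category, else 0."""
    all_names = list(GLOBAL_ATTRIBUTES)
    for local in LOCAL_ATTRIBUTES.values():
        all_names += [name for name in local if name not in all_names]
    applicable = set(applicable_attributes(category))
    return {name: int(name in applicable) for name in all_names}


print(applicable_attributes("jeans"))  # only the global attributes
print(loss_mask("shirt"))              # every attribute switched on
```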

My question is: should I build a multi-stage classification model, where stage 1 identifies apparel vs. footwear, stage 2 identifies men’s vs. women’s apparel, stage 3 identifies topwear vs. bottomwear, and stage 4 identifies shirt vs. t-shirt vs. pullover? Or should I build a single-stage model that directly classifies men’s shirt, men’s trousers, women’s dress, women’s heeled shoes, etc.?
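
As a rough illustration of the two options, the sketch below writes the categories as a small tree and derives both label sets from it: one label set per stage for the multi-stage setup, and one flat label set (the leaves) for the single-stage setup. The tree itself is a made-up example loosely following the stages above, and the "shoes" level under footwear is an assumption added only to keep every path the same depth.

```python
# Hypothetical category tree; the real taxonomy would come from the project.
TAXONOMY = {
    "apparel": {
        "mens": {
            "topwear": ["shirt", "t-shirt", "pullover"],
            "bottomwear": ["jeans", "trousers"],
        },
        "womens": {
            "topwear": ["top", "dress"],
            "bottomwear": ["jeans", "leggings"],
        },
    },
    "footwear": {
        "mens": {"shoes": ["formal shoes"]},
        "womens": {"shoes": ["heeled shoes"]},
    },
}


def leaf_paths(tree, prefix=()):
    """Yield one tuple per leaf, listing every decision from root to leaf."""
    if isinstance(tree, list):
        for leaf in tree:
            yield prefix + (leaf,)
    else:
        for key, subtree in tree.items():
            yield from leaf_paths(subtree, prefix + (key,))


paths = list(leaf_paths(TAXONOMY))

# Multi-stage: each stage gets its own (small) label set.
stage_labels = [sorted({path[i] for path in paths}) for i in range(4)]

# Single-stage: one classifier over combined labels such as "mens shirt".
flat_labels = [f"{path[1]} {path[3]}" for path in paths]

print(stage_labels[0])   # labels seen by the stage 1 model
print(len(flat_labels))  # number of classes for the single-stage model
```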

Any advice would be appreciated.


You may want to look at the solutions for last year’s iMaterialist Fashion 2018 challenge on Kaggle, won by our own @radek.
This year’s iMaterialist Fashion 2019, launching shortly, is supercharged with segmentation and knowledge graphs! If you have the bandwidth to annotate, that’s an interesting approach. https://github.com/visipedia/imat_comp

Thanks a lot for the link.

Although the problem statement is not identical to what I’m trying to solve for, it does give me a lot of ideas about how to approach the problem.

Both single-stage and multi-stage pipelines have tradeoffs of their own; a lot depends on the dataset and on how the various classes are defined.
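
One tradeoff that is easy to put numbers on: in a multi-stage pipeline, a mistake at an early stage sends the image down the wrong branch, so errors compound. Here is a back-of-the-envelope sketch, assuming (unrealistically) that the stages make independent errors and that a prediction only counts as correct when every stage is correct. The per-stage accuracies are made-up numbers, not measurements.

```python
from math import prod


def cascade_accuracy(stage_accuracies):
    """End-to-end accuracy of a cascade, assuming independent stage errors."""
    return prod(stage_accuracies)


# Hypothetical accuracies for stages 1 to 4; not real results.
stage_accuracies = [0.98, 0.95, 0.96, 0.90]

end_to_end = cascade_accuracy(stage_accuracies)
print(f"{end_to_end:.3f}")  # 0.804
```

So even with every stage at 90% or better, the full path is right only about 80% of the time under these assumptions. In practice errors are correlated, so treat this as a rough intuition rather than a prediction.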

But I would add that your proposed stage 2, classifying men’s vs. women’s apparel, is not straightforward if human models are not present in the image. E.g., trousers are difficult to distinguish.

@Rohitagarwal257, I’m interested in how you were able to solve this problem… I currently have a two-stage classification problem where I need to do something similar…