Multi-level classification of fashion images: need advice

I’m new to deep learning and even newer to … I’m working on a project for my organization and would like advice from people who have done something similar.

Given a fashion image, I need to build a model that can extract multiple attributes from it:
Is it clothing or footwear?
Is it a shirt, t-shirt, top, dress, gown, jeans, trousers, leggings, etc.?
Is the color red, white, black, green, blue, yellow, orange, maroon, etc.?
Is it striped, checked, solid, abstract, etc.?
If it’s a shirt, is it half-sleeved or full-sleeved, casual or formal, slim fit or regular fit, etc.?
Is it made of cotton, polyester, linen, rayon, etc.?

To summarise, I have to extract global attributes (valid for all images) and local attributes (valid only for certain categories, e.g. half sleeves vs. full sleeves only applies to shirts).
Some attributes can be treated as either global or local: "shirt", for example, can be local to clothes -> menswear -> topwear, or it can be modeled as a global attribute.
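One common way to handle global and local attributes in a single network is a shared backbone with one output head per attribute, where a local head's loss is simply skipped when that attribute does not apply to the image's category. The framework-free sketch below shows only that masking logic for a single example; the head names, class lists, and the `masked_loss` helper are hypothetical, and a real model would compute the same thing on batched tensors in PyTorch or a similar framework.

```python
import math

# Hypothetical schema: global heads apply to every image.
GLOBAL_HEADS = {
    "category": ["shirt", "t-shirt", "dress", "jeans", "shoes"],
    "color": ["red", "white", "black", "blue"],
    "pattern": ["striped", "checks", "solid", "abstract"],
}
# Hypothetical local heads: (class list, categories the head applies to).
LOCAL_HEADS = {
    "sleeve": (["half", "full"], {"shirt"}),
    "fit": (["slim", "regular"], {"shirt", "jeans"}),
}

def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one example (log-sum-exp for stability)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target_index]

def masked_loss(outputs, labels):
    """Sum the losses of all global heads plus the local heads that apply.

    outputs: dict head -> list of logits (one list per head)
    labels:  dict head -> class name; local labels may be absent
    """
    total = 0.0
    for head, classes in GLOBAL_HEADS.items():
        total += cross_entropy(outputs[head], classes.index(labels[head]))
    category = labels["category"]
    for head, (classes, valid_for) in LOCAL_HEADS.items():
        # Local heads only contribute when the category makes them meaningful.
        if category in valid_for and head in labels:
            total += cross_entropy(outputs[head], classes.index(labels[head]))
    return total
```

With this setup a shoe image contributes nothing to the sleeve or fit losses, so the local heads only learn from images where those attributes are actually defined.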

My question is: should I build a multi-stage classification model (where stage 1 identifies apparel vs. footwear, stage 2 identifies men’s vs. women’s apparel, stage 3 identifies topwear vs. bottomwear, and stage 4 identifies shirt, t-shirt, or pullover), or should I build a single-stage model that directly classifies men’s shirts, men’s trousers, women’s dresses, women’s heeled shoes, etc.?
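To make the two options concrete, here is a minimal sketch assuming a hypothetical four-level hierarchy. `cascade_predict` chains per-stage classifiers (passed in as plain callables), each committing to its top label before the next stage runs; `marginal` shows that a single-stage classifier over leaf classes can still answer any higher-level question by summing leaf probabilities. All class names and both helpers are illustrative, not taken from any library.

```python
# Hypothetical hierarchy: each single-stage (leaf) class maps to its path.
LEAF_PATHS = {
    "mens_shirt": ("apparel", "mens", "topwear", "shirt"),
    "mens_tshirt": ("apparel", "mens", "topwear", "t-shirt"),
    "mens_trousers": ("apparel", "mens", "bottomwear", "trousers"),
    "womens_dress": ("apparel", "womens", "topwear", "dress"),
    "womens_heels": ("footwear", "womens", "shoes", "heeled shoes"),
}

def marginal(leaf_probs, level):
    """Collapse single-stage leaf probabilities to one level of the hierarchy."""
    out = {}
    for leaf, p in leaf_probs.items():
        key = LEAF_PATHS[leaf][level]
        out[key] = out.get(key, 0.0) + p
    return out

def cascade_predict(stage_models, image):
    """Multi-stage inference: each stage sees the decisions made so far.

    stage_models: list of callables f(image, path_so_far) -> dict label -> prob
    """
    path = ()
    for model in stage_models:
        probs = model(image, path)
        # Commit to the most probable label; later stages cannot undo it.
        path += (max(probs, key=probs.get),)
    return path
```

One trade-off this makes visible: in the cascade, a wrong decision at an early stage cannot be corrected later, while the flat model typically needs enough training examples for every leaf class.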

Any advice would be appreciated.


You may want to look at the solutions for last year’s iMaterialist Fashion 2018 challenge on Kaggle, won by our own @radek.
This year’s iMaterialist Fashion 2019 challenge, launching shortly, is supercharged with segmentation and knowledge graphs. If you have the bandwidth to annotate, that’s an interesting approach.

Thanks a lot for the link.

Although the problem statement is not identical to mine, it gives me a lot of ideas about how to approach it.

Both single-stage and multi-stage pipelines have their own tradeoffs; a lot depends on the dataset and on how the various classes are defined.

But I would add that your proposed stage 2 (classifying men’s vs. women’s apparel) is not straightforward if no human model is present in the image. For example, men’s and women’s trousers are difficult to distinguish.

@Rohitagarwal257, I’m interested in how you were able to solve this problem. I currently have a two-stage classification problem where I need to do something similar.