I’m new to deep learning and even newer to fast.ai. I’m working on a project for my organization and would like to ask for advice from people who have done something similar.
Given a fashion image, I am expected to build a model that can extract multiple attributes from it:
Is it clothes or shoes?
Is it a shirt, t-shirt, top, dress, gown, jeans, trousers, leggings, etc.?
Is the color red, white, black, green, blue, yellow, orange, maroon, etc.?
Is it striped, checked, solid, abstract, etc.?
If it’s a shirt, is it half-sleeved or full-sleeved, casual or formal, slim fit or regular fit, etc.?
Is it made of cotton, polyester, linen, rayon, etc.?
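For reference, here is a rough sketch of how I imagine the attributes above could be turned into a single multi-label target, where each image gets a 0/1 vector over every "group:value" pair. The `SCHEMA` groups and values are a hypothetical subset of my real list, and `encode`/`decode` are my own helper names. As far as I understand, fastai's `MultiCategoryBlock` does a comparable one-hot encoding for multi-label data, but this plain-Python version only illustrates the idea.

```python
# Hypothetical attribute schema (illustrative subset of the real attribute list).
SCHEMA = {
    "category": ["clothes", "shoes"],
    "type": ["shirt", "t-shirt", "top", "dress", "jeans", "trousers"],
    "color": ["red", "white", "black", "blue"],
    "pattern": ["striped", "checks", "solid", "abstract"],
    "material": ["cotton", "polyester", "linen", "rayon"],
}

# Flatten every (group, value) pair into one vocabulary so that a single
# multi-label output layer could predict all attributes at once.
VOCAB = [f"{group}:{value}" for group, values in SCHEMA.items() for value in values]
INDEX = {label: i for i, label in enumerate(VOCAB)}


def encode(labels):
    """Turn a dict like {"color": "red"} into a 0/1 target vector over VOCAB."""
    target = [0] * len(VOCAB)
    for group, value in labels.items():
        target[INDEX[f"{group}:{value}"]] = 1
    return target


def decode(target):
    """Return the set of "group:value" labels whose bit is set."""
    return {VOCAB[i] for i, bit in enumerate(target) if bit}


example = encode({"category": "clothes", "type": "shirt", "color": "blue"})
print(sorted(decode(example)))
```

An image with missing attributes (e.g. unknown material) simply leaves that group all zeros, which is one reason I’m leaning towards multi-label rather than one softmax per group.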
To summarise, I have to extract global attributes (valid for all images) and local attributes (valid only for certain item types, e.g. half sleeves or full sleeves only apply if it’s a shirt).
Some attributes can be treated as either global or local. For example, "shirt" can be local to clothes -> menswear -> topwear, or it can be modelled as a global attribute.
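One idea I’ve been considering for the local attributes is to keep them in the same output vector but mask them out of the loss whenever they don’t apply, so a dress is never penalised for its "sleeve" prediction. A minimal sketch, assuming hypothetical `LOCAL_GROUPS` rules and hand-written helpers `local_mask` and `masked_bce` (in practice this would be done on tensors, e.g. with PyTorch’s per-element BCE loss, rather than plain Python lists):

```python
import math

# Hypothetical rules: each local attribute group applies only to some item types.
LOCAL_GROUPS = {
    "sleeve": {"applies_to": {"shirt"}, "values": ["half-sleeve", "full-sleeve"]},
    "fit": {"applies_to": {"shirt", "trousers", "jeans"}, "values": ["slim", "regular"]},
}


def local_mask(item_type):
    """Return 1 for every local label whose group applies to item_type, else 0."""
    mask = []
    for group in LOCAL_GROUPS.values():
        applies = 1 if item_type in group["applies_to"] else 0
        mask.extend([applies] * len(group["values"]))
    return mask


def masked_bce(probs, targets, mask, eps=1e-7):
    """Mean binary cross-entropy over positions where mask == 1 only."""
    total, count = 0.0, 0
    for p, t, m in zip(probs, targets, mask):
        if m:
            p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            count += 1
    return total / count if count else 0.0


print(local_mask("shirt"), local_mask("jeans"), local_mask("dress"))
```

With this setup, "shirt" could stay a global type label while sleeve and fit are only trained on the images where they make sense.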
My question is: should I build a multi-stage classification model (stage 1 identifies apparel vs. footwear, stage 2 men’s vs. women’s apparel, stage 3 topwear vs. bottomwear, stage 4 shirt vs. t-shirt vs. pullover), or should I build a single-stage model that directly classifies men’s shirts, men’s trousers, women’s dresses, women’s heeled shoes, etc.?
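To make the comparison concrete: if I go single-stage, I think the coarser stages can still be recovered by mapping each leaf class to its path in the hierarchy and summing the leaf probabilities. The `TAXONOMY` below is a hypothetical example and `stage_probs` is my own helper, not a fastai function:

```python
from collections import defaultdict

# Hypothetical taxonomy: each single-stage leaf class maps to its full path
# (stage 1, stage 2, stage 3, stage 4) in the multi-stage design.
TAXONOMY = {
    "mens_shirt": ("apparel", "mens", "topwear", "shirt"),
    "mens_tshirt": ("apparel", "mens", "topwear", "t-shirt"),
    "mens_trousers": ("apparel", "mens", "bottomwear", "trousers"),
    "womens_dress": ("apparel", "womens", "fullbody", "dress"),
    "womens_heels": ("footwear", "womens", "shoes", "heeled shoes"),
}


def stage_probs(leaf_probs, stage):
    """Sum leaf-class probabilities into the coarser classes at `stage` (0-based)."""
    totals = defaultdict(float)
    for leaf, p in leaf_probs.items():
        totals[TAXONOMY[leaf][stage]] += p
    return dict(totals)


# Example softmax output of a single-stage model over the leaf classes.
leaf_probs = {
    "mens_shirt": 0.5,
    "mens_tshirt": 0.2,
    "mens_trousers": 0.1,
    "womens_dress": 0.1,
    "womens_heels": 0.1,
}
print(stage_probs(leaf_probs, 0))  # apparel vs. footwear
print(stage_probs(leaf_probs, 1))  # mens vs. womens
```

My understanding is that this keeps all stages consistent with each other, whereas a chained multi-stage pipeline passes any stage-1 mistake on to every later stage. The trade-off I’m unsure about is that the leaf classes get fewer training images each.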
Any advice would be appreciated.