First of all, happy new year and thanks for the great courses! I liked the top-down approach a lot and all the good practices integrated in the fast.ai library. It keeps things interesting and you get a really good intuition.
I am starting a new project and would like to ask for feedback and advice on my ideas for solving the problem. What I want to build is an application that takes an image of a piece of clothing (say a floral T-shirt) from the user and suggests whole outfits containing the same or similar clothing.
I found the DeepFashion dataset, which contains images and annotations for categories and attributes of clothes, their bounding boxes, landmarks and other information for matching clothes from stores to user-taken photos. This is the paper that introduces the dataset:
So the approach I am thinking of:
- When the user uploads a photo, it is fed into a clothes-detection model so I can crop the image to where the item is. I can train that detection model using the bounding-box and category annotations in the dataset.
- I am going to keep a database of pre-calculated gram matrices for all the items. Whenever I need to add new photos, I just add their gram matrices.
- I am going to use k-nearest neighbours to find items of the same category with similar gram matrices (and hence similar styles).
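To make the gram-matrix step concrete, this is roughly what I have in mind, as a minimal NumPy sketch. The random array just stands in for real conv-layer activations, so the shapes and the normalisation are illustrative, not final:

```python
import numpy as np

def gram_matrix(feature_map):
    # feature_map: (C, H, W) activations from a conv layer.
    # The gram matrix captures channel-to-channel correlations,
    # which is the usual "style" signal from the style-transfer literature.
    c, h, w = feature_map.shape
    flat = feature_map.reshape(c, h * w)
    return flat @ flat.T / (c * h * w)  # normalise by map size

# toy example with random "activations" in place of real features
fm = np.random.rand(64, 28, 28).astype(np.float32)
g = gram_matrix(fm)
print(g.shape)  # (64, 64), symmetric
```

The resulting (C, C) matrix per item is what I would flatten and store in the database.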
Things I am wondering about:
- What is a good way to retrieve similar items from the database? What kind of database is used for models in production, or are all items simply kept in memory all the time?
- The dataset provides landmarks, which are basically handles on the clothes that we could use to extract their style. What do you think of using these versus just the gram matrices of the whole images?
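For the retrieval question, the simplest baseline I can picture is brute-force in-memory k-NN over the flattened gram matrices. Here is a sketch with random data (names and sizes are hypothetical) just to show the lookup I have in mind:

```python
import numpy as np

def knn_similar(query_gram, db_grams, k=3):
    # Flatten gram matrices and rank database items by Euclidean distance
    q = query_gram.ravel()
    db = db_grams.reshape(len(db_grams), -1)
    dists = np.linalg.norm(db - q, axis=1)
    return np.argsort(dists)[:k]  # indices of the k closest items

rng = np.random.default_rng(0)
db = rng.random((100, 64, 64)).astype(np.float32)   # 100 stored items
query = db[7] + 0.01 * rng.random((64, 64)).astype(np.float32)
print(knn_similar(query, db))  # item 7 ranks first
```

This obviously will not scale past a certain database size, which is exactly why I am asking about production setups and approximate nearest-neighbour indexes.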
I am looking forward to hearing your thoughts, any feedback and comments will be useful!