Exploring my own style and likes with DL

michael-pru · September 9, 2022, 6:39am

I have a large collection of images and videos I love. Each has some tags denoting what specifically touched me (colors, specific pose, item of clothing, etc.), and a rating (0-5 stars).
My goal is to use that to train a model to understand my personal preferences and predict my ratings.

Do you know of anything similar, or what I need to learn to achieve this? It’s somewhat similar to collaborative filtering, except without other users - each item can have its own embedding. Also, I know it’s possible to learn an approximation of ‘style’ - like in the style transfer works.

I’ve been able to achieve 96% accuracy in predicting the most frequent tags - a simple multi-label classification problem. But regression from tags to rating, or from raw pixels to rating, doesn’t seem to learn anything, and I get better accuracy by just predicting 2 stars for everything (the baseline solution).
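For reference, the constant baseline mentioned above can be measured with a few lines. This is only a minimal sketch: the function name `constant_baselines` and the sample ratings are made up for illustration and are not taken from the actual dataset.

```python
from collections import Counter
from statistics import median


def constant_baselines(ratings):
    """Score two constant predictors on a list of integer star ratings."""
    # Most frequent rating: the best constant guess when measuring accuracy.
    mode_value, mode_count = Counter(ratings).most_common(1)[0]
    # Median rating: the best constant guess when measuring mean absolute error.
    med = median(ratings)
    return {
        "mode": mode_value,
        "mode_accuracy": mode_count / len(ratings),
        "median": med,
        "median_mae": sum(abs(r - med) for r in ratings) / len(ratings),
    }


# Hypothetical ratings on the 0-5 star scale, for illustration only.
ratings = [2, 2, 2, 3, 1, 2, 4, 0, 2, 5]
stats = constant_baselines(ratings)
print(stats)
```

Any trained model should beat both numbers on a held-out split before it can be said to have learned something.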

Any ideas/pointers would be appreciated.

joshiharshit5077 · September 9, 2022, 7:16am

Can you share the dataset link?

michael-pru · September 9, 2022, 9:52am

It’s offline. But think of it as PASCAL_2007 with an additional ‘rating’ column.

zonkyo · September 9, 2022, 10:39am

As a start, I would consider training two different models: one that suggests tags (your 96% accuracy network) and a second one that is either fairly simple (image in, rating out) or a bit more complicated, i.e. learning segmentations and piping these into a secondary model that learns (segmented image → rating).

But it sounds as if you feed in either the tags OR the image data, but not both. If that is so, a good first step would be feeding both inputs to your network at once, so that it learns from the combination and gets a better idea of how you rate images.
As a starting point, see Keras: Multiple Inputs and Mixed Data - PyImageSearch. It is built with Keras, but the approach should give you a hint about what to do in the fastai/torch case.
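In plain PyTorch, the two-input idea could look roughly like the sketch below. Everything in it is an assumption for illustration: the class name `TagImageRater`, the layer sizes, the number of tags (20), and the choice to squash the output into the 0-5 range with a sigmoid. In practice the small convolutional branch would likely be replaced by a pretrained backbone.

```python
import torch
import torch.nn as nn


class TagImageRater(nn.Module):
    """Toy two-branch regressor: image features + multi-hot tags -> rating."""

    def __init__(self, num_tags, tag_dim=32, img_dim=64):
        super().__init__()
        # Image branch: a tiny CNN pooled down to one vector per image.
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, img_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Tag branch: multi-hot tag vector projected to a dense vector.
        self.tag_branch = nn.Sequential(nn.Linear(num_tags, tag_dim), nn.ReLU())
        # Head: operates on the concatenated features of both branches.
        self.head = nn.Sequential(
            nn.Linear(img_dim + tag_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, image, tags):
        features = torch.cat(
            [self.image_branch(image), self.tag_branch(tags)], dim=1
        )
        # Assumed design choice: scale a sigmoid so outputs stay within 0-5 stars.
        return 5.0 * torch.sigmoid(self.head(features)).squeeze(1)


# Random stand-in data: 4 RGB images of 64x64 and 20 hypothetical tags each.
model = TagImageRater(num_tags=20)
images = torch.randn(4, 3, 64, 64)
tags = torch.randint(0, 2, (4, 20)).float()
predictions = model(images, tags)
print(predictions.shape)
```

Training would then use a regression loss such as `nn.MSELoss()` or `nn.L1Loss()` between `predictions` and the star ratings.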

Hope this helps!