Image clusterization by similarity of texture or style


First of all, many thanks, Jeremy and Rachel, for this great course!

About the problem we’re trying to solve. We have a big library of furniture object images (high-quality renders, perspective view), and the goal is to find similar items for a given piece of furniture. But the definition of ‘similarity’ in our case is a bit different - we’d rather have similarity by style of texture or colour, and not so much by shape. For instance, finding an armchair with a fabric texture similar to that of a given L-shaped sofa.

So far we’ve managed to get nice clusters by shape. For instance, in a set of bar stools (570 pieces), they get grouped nicely by shape. The approach we used is applying PCA, then t-SNE, to the output of the last conv layer of VGG16. But if another type of furniture (such as tables) is added, I get separate clusters for tables and for chairs.
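In case it helps others, the pipeline is roughly the following (a minimal sketch using scikit-learn; the random array just stands in for the real flattened VGG16 activations, and the sizes are made small for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Stand-in for last-conv-layer activations of VGG16:
# 7x7x512 = 25088 features per image, here for 60 images.
feats = np.random.rand(60, 25088).astype(np.float32)

# PCA first, to reduce dimensionality before the (expensive) t-SNE step.
reduced = PCA(n_components=50).fit_transform(feats)

# t-SNE down to 2D; the 2D embedding is then clustered / plotted.
emb = TSNE(n_components=2, perplexity=15, init="pca",
           random_state=0).fit_transform(reduced)
```

On real data, `feats` would come from running the renders through VGG16 (without the top classifier layers) and flattening the output.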

I was also thinking about using embeddings, but then the ‘vocabulary’ would be fixed to my furniture set. And I’d rather have something I can run on any image (say, a photo of a sofa) to get similarly textured images back.

What approach would you use for such a task?

PS. Btw, this question is also a part of my assignment for Lesson 3 :slight_smile:

Hi @valentin.perret

Have you tried to apply your cluster approach on the first layers of VGG16?

I was considering doing so, but rejected the approach when I realized there would be hundreds of thousands of features to apply PCA to. For instance, 112x112x64 (800K) or 56x56x128 (400K) is much bigger than the 7x7x512 (25K) we get at the end. Although, in combination with some aggressive max pooling, it might work.
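To put a number on the pooling idea: a hypothetical 16x16 max pool over a 112x112x64 activation map brings it down to 7x7x64, i.e. about 3K features, which is well within PCA range. A pure-NumPy sketch (the random array stands in for a real early-layer activation map):

```python
import numpy as np

# Stand-in for an early VGG16 activation map: 112x112 spatial, 64 channels.
a = np.random.rand(112, 112, 64)

# Non-overlapping 16x16 max pooling via reshape: split each 112-long
# spatial axis into 7 blocks of 16, then take the max within each block.
pooled = a.reshape(7, 16, 7, 16, 64).max(axis=(1, 3))  # -> (7, 7, 64)

# Flatten to a feature vector: 7*7*64 = 3136 dims instead of 802816.
feat = pooled.reshape(-1)
```

The same trick works for the 56x56x128 layer with an 8x8 pool, also landing at 7x7 spatially.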

But still, even in the first conv layers, as we saw in Lessons 0 and 3, there are both features we would like to group on (gradients, regular patterns) and features we’d rather skip (edges).

I wonder if there is any trick in the way the training data is set up for such a task, to make the net prefer one kind of feature over another.