How to deal with autoencoder (or other) encodings for similarity inference and clustering

I trained an autoencoder that accepts an image and produces an encoding from the encoder. The encoding is 64x64x64 so flattening it into a row vector of size 262144.

I have 5000 images, and given a test image( which goes through the encoder) , I need to find n similar encodings and corresponding images, and maybe later cluster the whole dataset.

one GitHub repo owner seems to have done it by concatenating the image encoding into a 5000,262144 matrix, and then running a knn on it with the test encoding.

I can’t do this on Colab cuz the instance crashes when RAM fills up.

Tried to use a regular for loop to convert each picture and save it as a .npy, but that was extremely painful since Colab just freezes when the loop is more than 500 big.

Even if I do have the .npy files, not sure how to classify, or cluster since I can’t put the whole thing on the RAM.

This probably won’t help with the issue you’re facing, but the encoding size you mention seems huge compared to what I’ve seen. I usually use vectors of size 512 - 2048 as an encoding.

I definetly agree. Let me try and reduce the size and see where it goes

The easiest way is to use nmslib. (Just Google for examples).

You Can also use the tensorboard projector callback to explore your Encodings visually.

And i agree with darek … that are hughe vectors.

Ok I reduced the size and it worked like a breeze. The encoder seems to be doing a bad job though. Seems to suck at inference and even recognizes a different species as “similar”. Not sure why this happened. As the plot shows, loss has consistently decreased (ignoring the last bump, I used the model at the lowest loss)

Am I messing up in inference? Is the model too weak? I doubt it cuz googling “autoencoder similarity pytorch” brings a github repo up, and his inferences seem better than mine.

Some help here would be great!

imho auto encoders just don’t work well for similarity. I’d suggest manually label some data, train a classifier (fine tune a pretrained model) on that data and then du the feature extraction on the whole dataset.

1 Like

But encoders are made just for this purpose right? To reduce the image to an encoding, on which work like similarity can be done?

it’s said that auto encoders are made for that … but in my experience the don’t find similarities in a human sense. I never was satisfied with the results. So I completely switched to the classifier -> feature extraction approach.


Ok so I went back to this today, and completely forgot to mention. My data isn’t labelled! I just have 4800 pictures from 0.jpg to 4799.jpg. So I can’t switch to a classifier.