How to find similar images based on final embedding layer?

First post on these forums. I hope this is the right place to ask this; if not, please tell me where I should go :slight_smile:

I’m now three lessons in and I have the following question:

Suppose I’ve trained a classification model for an image data set.
And suppose I want to use this trained model to find the most similar images in the dataset to a given image.

My approach would be to get the activations of the last embedding layer in the model for each image and to compare the (cosine?) similarity of these vectors.
I’ve heard from other people this is quite a common approach.
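For the comparison step I imagine something along these lines (just a sketch; emb would be an (n_images, n_features) tensor of embeddings and query_idx the index of the query image):

    import torch.nn.functional as F

    emb_n = F.normalize(emb, dim=1)        # unit-length rows
    sims = emb_n @ emb_n[query_idx]        # cosine similarity of every image against the query
    closest = sims.topk(6).indices[1:]     # index 0 is the query itself (similarity 1.0)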

However, I’m curious to learn how I can actually apply this with fastai.
How can I get the feature values of each image in the last embedding layer of the model?


Welcome!

There is a bit about that in the second part of the course, at the end of lesson 11 (though it’s mixed with text). As for getting the result of a specific layer, I think we see it the first time in the lesson about style transfer (lesson 12). You need to use a forward hook on the specific layer you want the results from, for instance with a class like this:

class SaveFeatures():
    features = None
    def __init__(self, m): self.hook = m.register_forward_hook(self.hook_fn)  # register a forward hook on layer m
    def hook_fn(self, module, input, output): self.features = output          # save the layer's output on each forward pass
    def remove(self): self.hook.remove()                                      # detach the hook when you are done

When you init this class with an object m (which should be the layer you target), it registers a function (hook_fn) that will be called every time an input goes through the model. Here we save the layer's output in self.features.

You should print your model (just type learn.model) to see the index of the layer you want; then you can create an instance of this class with sf = SaveFeatures(my_layer). Afterward, if you want the result for an input x, just pass it through the model and you will find the output of your layer in sf.features.
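For instance, a minimal sketch (the layer choice is just an example; use whichever layer you found by printing learn.model):

    # the index here is only an example: pick the layer you identified
    my_layer = list(learn.model.children())[-1]
    sf = SaveFeatures(my_layer)
    learn.model.eval()          # no dropout while extracting features
    preds = learn.model(x)      # forward pass on a batch x: the hook fires here
    emb = sf.features           # activations of my_layer for this batch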


Thank you so much!
Both for the reference to lesson 12 and the explanation :slight_smile:


Quick question: in PyTorch, how do you access the layer? For VGG, would it be something like this:
Layer16 = vgg.features[16]

where vgg = vgg16(pretrained=True)?

In PyTorch, you can find the layers that compose a model in model.children(). Since it's a generator, you have to convert it into a list, so something like Layer = list(vgg.children())[idx] should work.
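For example, a quick sketch with torchvision's vgg16 (the index is illustrative, and SaveFeatures is the class from earlier in the thread):

    import torchvision.models as models

    vgg = models.vgg16(pretrained=True)
    # children() returns a generator, so wrap it in a list before indexing
    layer = list(vgg.children())[0][16]   # same layer as vgg.features[16]
    sf = SaveFeatures(layer)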

vgg.children(). I’ll try that. I have been doing vgg.features and vgg._modules.get().

Thanks

@sgugger

Hi,

So what happens if we unfreeze the whole network? Are these features still the initial ones from when the model was set up? Apologies if I’ve misunderstood this - I’m trying to get my head around this.

Mark

Not sure I understand your question: the initial topic wasn’t talking about any kind of training, so there is no freezing/unfreezing and the weights of the model don’t change.


Hi @sgugger,

Sorry if I wasn’t clear, it definitely stems from my own misunderstanding. I know it’s slightly off-topic from the thread but it’s practically the only place I can see SaveFeatures() discussed.

As an example take the unet from lesson 14 where we save the activations on the downward path to concatenate later on in the model. If we then use learn.unfreeze() (which is done in the lesson) are these activations updated as the encoder’s weights now change? I would expect/hope so but just wanted to check as it looks as though they are just created in the init().

Thanks,

Mark

P.S. Definitely off-topic but how do I format code nicely in a forum post like you did above?

Yup, if the weights change, the activations will be computed with the new weights. Note that the activations change at every iteration since the input of the model changes as well, which is the point of those hooks: to get the values of the intermediate activations in the model.

For your other question, you want to type this in your reply:
``` python
your code here
```
You’ll be able to see a preview in the window on the right to check that everything is OK.

Brilliant, thank you.

I have a follow-up if you don’t mind (though you’ve already been more than helpful). I wanted to design a unet decoder that didn’t use x (the data input to forward()) on the way up but simply used the intermediate activations from the way down.

But for some reason I need to call x before I can access the saved features:

def forward(self, x):
    tmp = self.rn(x)                 # not used in the decoder, as its output is too small spatially
    center = self.sfs[6].features    # activations saved by the forward hook during self.rn(x)

Do you happen to know why this is?

Mark

P.S. I should say I lost about 3 hours figuring this out last night. :persevere:

Ah yes, you’re missing the point of hooks. They will hook the values as the input passes through the model, but if you don’t pass anything, they don’t get anything to hook, so they can’t have the values you want.
In short, save_features won’t contain the activations you want if x hasn’t gone through the model to produce those activations.

Thank you for this.

I understand what the hook is meant to do, but I just don’t see how it does it. For instance, the sfs features are defined during the model’s init() call yet are actually updated on every forward pass. :thinking:

For what it’s worth I’d never heard of hooks until I saw this so it’s something I need to read more about.

Thanks for your help!

Very interesting, thanks! :slight_smile:

Allow me a question: should we disable the dropout by putting the model in evaluation mode?

For this task? Probably, yes.
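Something like this before running the images through:

    learn.model.eval()   # switches dropout (and BatchNorm) to inference behaviour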


Also, some fellas on the PyTorch forums recommend putting the inference code in a context where you instruct torch not to use autograd, while others don’t. What do you think? It should reduce inference time, particularly on the CPU.
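What I mean is something like:

    import torch

    with torch.no_grad():          # tell autograd not to track anything during inference
        preds = learn.model(x)
        feats = sf.features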

Another question: where would you extract the features in order to do a bit of t-SNE? I picked the zeroth layer of the last block (the adaptive pooling), but it turned out to be a very bad-shaped tensor, so I picked the output of the lambda layer (see this question: Lambda Layer - #2 by jeremy). It seems the lambda does some kind of flattening. If so, would it be the best point to collect the features? Thanks!
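For reference, the step I have in mind is something like this (just a sketch, assuming sf is the hook on the pooling layer):

    import torch
    from sklearn.manifold import TSNE

    feats = sf.features                     # e.g. (batch, channels, h, w) from a conv/pooling layer
    feats = feats.view(feats.size(0), -1)   # flatten to (batch, features), which is what the Lambda layer does
    emb_2d = TSNE(n_components=2).fit_transform(feats.cpu().numpy())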

Last but not least, I would like to know your opinion about the method proposed by Jeremy: Record more in Recorder - #2 by jeremy. How does it compare with the class you wrote above (which works fine), if you are not interested in stats?


Here is my notebook to do it: I wrote an image-similarity example on the Caltech 101 dataset. Here is the output (the top-left image is the one used to find the other 5 similar images):


Another example -

Here is the link to notebook on my Github. Will write a blog about this later this month.


Here is the link to the blog.


I tried this approach to recommend similar images in the DeepFashion dataset, and the results were better than I expected.
Here is a link to my blog post explaining the process - https://towardsdatascience.com/similar-images-recommendations-using-fastai-and-annoy-16d6ceb3b809

Here are a few examples,


Hey, I’ve found something curious in your work. How did you handle batches while saving features?
When you are using feature_dict = dict(zip(img_path, sf.features)), you are only using part of your embedding features, because sf only contains the last batch of your vectors.
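To cover every image, I think you would need to accumulate the hook’s output over all batches, something like this (a rough sketch; my_layer and data.valid_dl are placeholders for your own layer and DataLoader):

    import torch

    sf = SaveFeatures(my_layer)
    learn.model.eval()
    all_feats = []
    with torch.no_grad():
        for xb, yb in data.valid_dl:        # whichever DataLoader holds the images
            learn.model(xb)                 # forward pass fires the hook
            all_feats.append(sf.features)   # this batch's activations
    feats = torch.cat(all_feats)            # one feature row per image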