I would like to fine-tune a vision model on my image dataset and then extract feature vectors from the network after the flatten layer (for example, to plot them with t-SNE).

Is there something like PyTorch's feature extraction for fastai too? Or can one use this function from the timm library when using timm models in fastai? (timm has the same or a similar feature extraction function as PyTorch.)

Using hooks seems like considerably more effort for all images, so I would prefer another option.

I tried using the fastai “cut_model” function, but (at least with default settings) it does not cut the model after the flatten layer: with resnet18, for example, I end up with 512 feature maps of size 7x7. Is there another option to extract activations from the network after the flatten layer?


Short answer:
If it’s a resnet18 with vision_learner, then you can cut like this.

# Keep only the first 2 layers of the head: AdaptiveConcatPool2d and Flatten
new_head = cut_model(learner.model[-1], 2)
learner.model[-1] = new_head

If you now pass data through the network, the feature vectors come out, like this:

x, y = dls.one_batch()
x.shape  # torch.Size([64, 3, 224, 224])
feature_vectors = learner.model(x)
feature_vectors.shape  # torch.Size([64, 1024])
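
For plotting (t-SNE and the like) you will probably want the vectors for every image, not just one batch. Below is a minimal sketch of that loop. It deliberately uses a tiny stand-in model and a plain list of random batches instead of a real learner, so the names `collect_features`, `model` and `batches` are my own assumptions and not fastai API. With a real learner you would presumably pass the cut `learner.model` and iterate over something like `dls.valid` instead.

```python
import torch
from torch import nn

# Stand-in for learner.model after the cut: a tiny "body" plus a head that
# only pools and flattens. This is NOT a real resnet18, just the same layout.
body = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU())
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten())
model = nn.Sequential(body, head)

# Stand-in for a DataLoader: three batches of 4 random "images" with labels.
batches = [(torch.randn(4, 3, 32, 32), torch.randint(0, 2, (4,))) for _ in range(3)]

def collect_features(model, batches):
    """Run every batch through the model and stack the outputs (hypothetical helper)."""
    model.eval()                      # put layers such as dropout/batchnorm in inference mode
    feats, labels = [], []
    with torch.no_grad():             # no gradients needed for feature extraction
        for xb, yb in batches:
            feats.append(model(xb).cpu())
            labels.append(yb.cpu())
    return torch.cat(feats), torch.cat(labels)

features, labels = collect_features(model, batches)
print(features.shape)  # torch.Size([12, 8])
print(labels.shape)    # torch.Size([12])
```

The resulting `features` tensor has one row per image, so it can be converted with `features.numpy()` and handed to whatever t-SNE implementation you use.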

Longer answer:
So you have something like this (it’s just an example based on your text):

learner = vision_learner(dls, resnet18, metrics=accuracy)

It doesn’t matter if it’s a resnet18 or a fancier timm model, at the end of the day these are just pytorch models, so usually they have a bunch of nn.Sequential layers and even more layers sandwiched between the sequential ones.

A common convention is that people call the beginning of the network “body” and the end “head”.
The outermost sequential layer contains “body”, which is a sequential layer with an index of 0, and “head”, which is also a sequential layer with an index of 1.
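
To make the indexing concrete, here is a small sketch with a toy two-part model. The layers are made up for illustration (much smaller than a real resnet18), but the nesting is the same: an outer nn.Sequential holding a "body" at index 0 and a "head" at index 1.

```python
from torch import nn

# Toy stand-in with the same body/head nesting as a vision_learner model.
body = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU())
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))
model = nn.Sequential(body, head)

print(model[0] is body)   # True
print(model[-1] is head)  # True, index -1 and index 1 are the same here

# List the inner layers of the head together with their indices.
layer_names = [type(layer).__name__ for layer in model[-1].children()]
for idx, name in enumerate(layer_names):
    print(idx, name)
# 0 AdaptiveAvgPool2d
# 1 Flatten
# 2 Linear
```

On a real learner the same indexing should work on `learner.model`, so `learner.model[-1]` is the head.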

You can check the exact structure of your model by evaluating learner.model in a notebook cell.
This prints a very long list of layers, so I show only the relevant part for this resnet18:

  (0): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (5): Sequential(
    (6): Sequential(
    (7): Sequential(
  (1): Sequential(
    (0): AdaptiveConcatPool2d(
      (ap): AdaptiveAvgPool2d(output_size=1)
      (mp): AdaptiveMaxPool2d(output_size=1)
    (1): fastai.layers.Flatten(full=False)
    (2): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Dropout(p=0.25, inplace=False)
    (4): Linear(in_features=1024, out_features=512, bias=False)
    (5): ReLU(inplace=True)
    (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): Dropout(p=0.5, inplace=False)
    (8): Linear(in_features=512, out_features=37, bias=False)

So your problem was that with cut_model(learner.model, -1) you lost the whole “head” (it’s a complicated head :)). What you actually want is to drop the last 7 inner layers of the “head” and keep its first 2 inner layers.

That’s why we use cut_model(learner.model[-1], 2): we pass the “head” as the model to cut_model (it doesn’t care about submodels and models; any nn layer is a model in itself) and tell it to keep the first two layers of this model, which are the AdaptiveConcatPool2d and the fastai.layers.Flatten in the “head” :wink:
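
If you want to see the difference between the two cuts without a trained learner, here is a rough sketch on a toy model. `cut_children` is a hypothetical helper I wrote for this example. As far as I know it does the same thing as cut_model with an integer argument (keep the first `cut` children), but treat that as an assumption and not as the library's source.

```python
import torch
from torch import nn

def cut_children(model, cut):
    """Keep the children selected by [:cut] (hypothetical stand-in for an integer cut)."""
    return nn.Sequential(*list(model.children())[:cut])

# Toy stand-in: small body, head with pooling, flatten and a classifier part.
body = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU())
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                     nn.BatchNorm1d(8), nn.Linear(8, 2))
model = nn.Sequential(body, head)

x = torch.randn(4, 3, 32, 32)
model.eval()
with torch.no_grad():
    # Cutting the outer model at -1 drops the whole head: feature maps come out.
    maps = cut_children(model, -1)(x)
    # Cutting inside the head keeps pooling + flatten: flat vectors come out.
    model[-1] = cut_children(model[-1], 2)
    vectors = model(x)

print(maps.shape)     # torch.Size([4, 8, 32, 32])
print(vectors.shape)  # torch.Size([4, 8])
```

Note that assigning to `model[-1]` changes the model in place, so the classifier layers are gone afterwards. If you still need predictions, work on a copy (for example via `copy.deepcopy`) of the model.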


thank you!! :slight_smile:

that helped a lot for understanding.