Setting Layer Groups for a Nested Model

I’m creating a Siamese-style network for comparing text using a pretrained model. The general structure is as follows:

import torch.nn as nn

class SiameseNet(nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder  # shared, pretrained encoder

    def forward(self, x1, x2):
        # Both inputs go through the same encoder (shared weights)
        output1 = self.encoder(x1)
        output2 = self.encoder(x2)

        # SomeFunction is a placeholder for whatever comparison is used,
        # e.g. a distance measure or a small head on the two outputs
        return SomeFunction(output1, output2)

I’d like to train this model using discriminative learning rates and gradual unfreezing on the encoder. The issue is that the main model, SiameseNet, has a length of 1, so the fastai library gives it a single layer group, and discriminative learning rates and gradual unfreezing don’t work out of the box.

To access the layers in the encoder, you have to index into it explicitly via model.encoder, so the standard way of indexing into a model’s layers, like model[:n], won’t work.

Does anyone know a way to set layer groups/discriminative learning rates for a model like this, or a way to make this model structure sequential?


Figured it out. You can write a function that explicitly defines the layer groups and pass it to the learner as the split_func parameter.
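
For anyone landing here later, here is roughly what that can look like, assuming a fastai v1 Learner wrapping the SiameseNet above. The function name siamese_split and the split point 6 are just illustrative, not from the original post; Learner.split also accepts a callable like this:

import torch.nn as nn

def siamese_split(model):
    # Split the shared encoder into two layer groups so gradual unfreezing
    # and discriminative learning rates can treat early and late layers
    # differently. The index 6 is an arbitrary, illustrative split point.
    enc_layers = list(model.encoder.children())
    return [enc_layers[:6], enc_layers[6:]]

learn.split(siamese_split)  # `learn` is the fastai Learner for SiameseNet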


Hi Karl. Glad that you have figured it out.

Maybe I’m a little late to the game, but have you seen Radek’s tweet and the whole thread?

Radek showed us how he implemented a Siamese-style network using fastai 1.0 and PyTorch. I think one of his tweets addresses your question:

The line below gives me the ability to freeze parts of the model easily and to train with differential learning rates.

learn.split([learn.model.cnn[:6], learn.model.cnn[6:], learn.model.head])

I think it’s good to share that here as it has some great tips that will probably be useful for future readers.
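
For future readers: once the layer groups are set (via learn.split as above, or a custom split function), the usual fastai v1 calls apply. A quick sketch, where the epoch counts and learning rates are placeholders rather than recommendations:

learn.freeze()                              # train only the last layer group
learn.fit_one_cycle(1, 1e-3)

learn.unfreeze()                            # train all groups...
learn.fit_one_cycle(1, slice(1e-5, 1e-3))   # ...with per-group (discriminative) LRs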


Just tagging this with siamese, triplet and one shot because there are going to be people searching for those terms 😉

Following your work on this whole thing is a real joy @radek

Might as well explain a bit about the tags 🙂 I am not trying to be technically precise; this stuff is super new to me.

Zero-shot, one-shot and few-shot learning are approaches to the “learning to learn” problem (the meta-learning field). Siamese networks are one approach to one-shot learning. They consist of twin networks sharing the same set of parameters (a symmetric architecture). The basic idea is to predict the category of a given test example by computing a kind of ‘similarity’ with each of the training examples.
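
To make the “similarity with the training examples” idea concrete, here is an illustrative sketch; the helper one_shot_predict and the choice of cosine similarity are my own assumptions, not something from the thread:

import torch
import torch.nn.functional as F

def one_shot_predict(encoder, support_x, support_y, test_x):
    # support_x holds one example per class; test_x is a single test example
    with torch.no_grad():
        support_emb = encoder(support_x)  # (n_classes, d)
        test_emb = encoder(test_x)        # (1, d)
    # Cosine similarity broadcasts (1, d) against (n_classes, d) -> (n_classes,)
    sims = F.cosine_similarity(test_emb, support_emb)
    return support_y[sims.argmax()]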

Triplet loss is a type of distance metric learning?
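
As far as I understand, yes, it is usually classed under distance metric learning. PyTorch even ships an implementation, nn.TripletMarginLoss; here is a tiny illustrative sketch with arbitrary shapes and margin:

import torch
import torch.nn as nn

# Triplet loss pulls an anchor toward a same-class "positive" and pushes it
# away from a different-class "negative" by at least `margin`:
#   L = max(d(anchor, positive) - d(anchor, negative) + margin, 0)
loss_fn = nn.TripletMarginLoss(margin=1.0, p=2)

anchor, positive, negative = torch.randn(3, 8, 128)  # illustrative embeddings
loss = loss_fn(anchor, positive, negative)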