Siamese Network Architecture using the fastai library

The objective of Kaggle's Quora Question Pairs competition is to figure out whether two questions have the same meaning. This should help users find similar questions and reduce duplicate content on Quora.

One solution could be to create a language model using the dataset, and then build a Siamese network (a reference to Siamese twins, image below from a Medium article) that takes in two questions and compares the output activations using cross entropy (or Manhattan distance).
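For concreteness, the comparison step I have in mind is something like the following rough sketch. The function name and variable names are placeholders of mine, not fastai API; it just shows the Manhattan-distance similarity applied to the two encodings:

import torch

# h1, h2: final hidden activations from the same (shared) encoder,
# one per question, each of shape (batch_size, hidden_size)
def manhattan_similarity(h1, h2):
    # exp(-||h1 - h2||_1) maps the L1 distance into (0, 1]:
    # identical activations -> 1, very different activations -> near 0
    return torch.exp(-torch.norm(h1 - h2, p=1, dim=1))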

How can this Siamese network architecture be implemented using the fastai library?
I have trained the language model on the Quora dataset.
Would I need to implement this architecture directly in PyTorch, or can I use existing modules to create it?


@parth_hehe I’m looking into doing something similar. What did you end up doing?

This is as far as I got towards a fully working model. I can't seem to get the same accuracy as stated in the blog post, so I must have missed something in my implementation. Feel free to run it and see if it works for your dataset.

So far I'm getting ~80% accuracy.

import torch
import torch.nn as nn

class SiameseSentence(nn.Module):
    def __init__(self, ntoken, emb_sz, n_hid, n_layers, pad_token, bidir=False, dropouth=0.3, wdrop=0.5):
        super().__init__()
        self.ndir = 2 if bidir else 1
        self.initrange = 0.1
        # Shared embedding and LSTM encoder: both questions go through the same weights
        self.encoder = nn.Embedding(ntoken, emb_sz, padding_idx=pad_token)
        self.encoder.weight.data.uniform_(-self.initrange, self.initrange)
        self.rnns = nn.LSTM(emb_sz, n_hid, n_layers, bidirectional=bidir)
        # dropouth / wdrop are kept for interface compatibility but not applied here
        self.emb_sz, self.n_hid, self.n_layers, self.dropouth = emb_sz, n_hid, n_layers, dropouth

    def forward(self, inputs):
        # inputs: (seq_len, 2, batch_size) - the two questions stacked along dim 1
        sl, _, bs = inputs.size()

        emb_0 = self.encoder(inputs[:, 0, :])
        emb_1 = self.encoder(inputs[:, 1, :])
        # Encode both questions with the same (shared) LSTM
        outputs0, hiddens0 = self.rnns(emb_0)
        outputs1, hiddens1 = self.rnns(emb_1)
        # Compare the last time-step activations of the two encodings
        distance = self.distance(outputs0[-1], outputs1[-1])
        return distance

    def distance(self, x1, x2):
        # Manhattan-distance similarity: exp(-||x1 - x2||_1), lies in (0, 1]
        return torch.exp(-torch.norm(x1 - x2, 1, 1))
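For reference, here's a rough sketch of how the module above could be instantiated and trained. The vocabulary size, shapes, hyperparameters, and the choice of BCELoss are my assumptions, not taken from the blog post; BCELoss works here because the output of distance() lies in (0, 1] and can be treated as a duplicate probability:

import torch
import torch.nn as nn

model = SiameseSentence(ntoken=30000, emb_sz=300, n_hid=256, n_layers=1, pad_token=1)
criterion = nn.BCELoss()              # distance() output lies in (0, 1]
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Fake batch: 40 tokens per question, 2 questions, batch size 32
inputs = torch.randint(0, 30000, (40, 2, 32))
labels = torch.randint(0, 2, (32,)).float()   # 1 = duplicate, 0 = not

pred = model(inputs)                  # shape: (32,)
loss = criterion(pred, labels)
loss.backward()
optimizer.step()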

I got a Siamese MobileNet working with the fastai version here. However, I couldn't get contrastive loss to work for some reason, so I instead used a simple cross-entropy loss on the final distance. The experiments were run on CIFAR-10.
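In case anyone wants to try contrastive loss themselves, here is a minimal sketch of the standard formulation; the margin value and mean reduction are my own choices, not what's in the linked notebook:

import torch
import torch.nn as nn

class ContrastiveLoss(nn.Module):
    """Standard contrastive loss: pull matching pairs together,
    push non-matching pairs apart up to a margin."""
    def __init__(self, margin=1.0):
        super().__init__()
        self.margin = margin

    def forward(self, dist, label):
        # dist: distance between the two embeddings, shape (batch_size,)
        # label: 1 for a matching pair, 0 for a non-matching pair
        pos = label * dist.pow(2)
        neg = (1 - label) * torch.clamp(self.margin - dist, min=0).pow(2)
        return (pos + neg).mean()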


It’s pretty close! One thing they do in the blog post is to freeze the weights on the embeddings - have you done that? Are you using the same optim and hyperparams? Have you checked whether your weight initialization is the same?
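If it helps, a common way to freeze the embedding weights in plain PyTorch looks like this (assuming the SiameseSentence module above; the optimizer setup is just an illustration):

# Freeze the embedding layer so its weights are not updated during training
model.encoder.weight.requires_grad = False

# Only pass trainable parameters to the optimizer
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)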


@jeremy In my case, I didn't use any pretrained embeddings. After training for around 10 epochs while decreasing the learning rate, I got around 82.8%, which is more or less the same result. Thanks! Really enjoying PyTorch so far.
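In case it's useful to others, decreasing the learning rate across epochs can be done with a scheduler; the step size and decay factor below are illustrative, not the exact schedule I used:

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Halve the learning rate every 3 epochs (illustrative values)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.5)

for epoch in range(10):
    # ... run one epoch of training here ...
    scheduler.step()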


Which one is better: extending Learner to create a Siamese learner, or using the example given by @javiersuweijie?