(I can’t imagine this is a new question but my searches haven’t turned up what I’m looking for.)

I love how the Siamese network example shows how to extend existing DataLoader infrastructure, and then scores such high accuracy so quickly using “merely” the CrossEntropy loss, without any discussion of contrastive/hinge/triplet losses, or mining for “hard examples”.

…I still would like to try some metric learning with fastai (so I can visualize the embeddings), so I’ve been looking at resources such as the humpback whale Kaggle competition and PyTorch Metric Learning, but I’m not seeing any posts where people share working code for doing this with fastai v2.

So below is my “minimal” work-in-progress code so far, and sadly it doesn’t work (yet), which is why I’m posting – looking for help!

My Colab notebook is exactly the original Siamese network tutorial except that I’ve modified the model a bit. The original head flattens and applies BatchNorm across both samples at once, and the subsequent Linear layer mixes features from both images before any individual embedding vectors are produced. That doesn’t seem to be what you’d want for metric learning. So in my model, instead of attaching the “head” to merge the two encoder outputs directly (as in the original tutorial), I name the (ordinary, non-merging) head “mid” and attach it onto the end of each encoder, then I use a final “head” to merge the vectors produced by the two “mid” sections. Here are the key parts:

```
from fastai.vision.all import *  # same import as the tutorial

class SiameseModel(Module):
    def __init__(self, encoder, mid, head):
        self.encoder, self.mid, self.head = encoder, mid, head

    def forward(self, x1, x2):
        # Each image goes through encoder + mid on its own; only `head` sees both embeddings.
        ftrs = torch.cat([self.mid(self.encoder(x1)), self.mid(self.encoder(x2))], dim=1)
        return self.head(ftrs)

encoder = create_body(resnet34, cut=-2)
embed_dim = 512  # I'd like to use only 3 dims, but keeping it large for now
mid = create_head(512, embed_dim, ps=0.5)  # not the true head
# head = nn.Sequential(nn.Linear(embed_dim*2, 2))  # that doesn't work well
head = nn.Sequential(  # OK, try giving it a bit more nonlinearity on the final end:
    nn.ReLU(inplace=True),
    nn.BatchNorm1d(embed_dim*2),
    nn.Dropout(),
    nn.Linear(embed_dim*2, 2))
model = SiameseModel(encoder, mid, head)

def siamese_splitter(model):
    # Three parameter groups: encoder, mid, head
    return [params(model.encoder), params(model.mid), params(model.head)]

# `learn` is the Learner, built as in the tutorial (not shown here)
learn.freeze_to(-2)  # freeze just encoder, but train mid and head
```
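For anyone who wants to poke at the structure without the notebook, here is a rough plain-PyTorch sketch of the same encoder → mid → head layout. It is only an illustration: `TinySiamese`, its toy conv encoder, the `embed` helper and the small `embed_dim` are stand-ins I made up so that it runs without fastai or pretrained weights. It just confirms that each image gets its own embedding vector before the head merges the pair.

```python
import torch
from torch import nn

class TinySiamese(nn.Module):
    """Toy stand-in: shared encoder + per-image `mid`, then a merging `head`."""
    def __init__(self, embed_dim=8):
        super().__init__()
        # Hypothetical small encoder; the post uses a resnet34 body instead.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Simplified `mid`: maps one image's features to its embedding.
        self.mid = nn.Linear(16, embed_dim)
        # `head` is the only part that sees both embeddings.
        self.head = nn.Sequential(
            nn.ReLU(),
            nn.BatchNorm1d(embed_dim * 2),
            nn.Linear(embed_dim * 2, 2),
        )

    def embed(self, x):
        # Per-image embedding, i.e. what one would plot later.
        return self.mid(self.encoder(x))

    def forward(self, x1, x2):
        return self.head(torch.cat([self.embed(x1), self.embed(x2)], dim=1))

model = TinySiamese(embed_dim=8).eval()
x1, x2 = torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32)
with torch.no_grad():
    emb = model.embed(x1)    # one vector per image
    logits = model(x1, x2)   # same / not-same scores per pair
print(emb.shape, logits.shape)
```

The `embed` method is the part that matters for visualization: it returns the per-image vectors without going through the merging head.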

…that’s it. I haven’t even introduced the ContrastiveLoss, metric learning, etc. The code so far *should* be very similar to the original tutorial…except that mine doesn’t work! What I find is that the loss never goes below 0.7, and the accuracy never improves above 50% – i.e., random guessing.
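As a quick reference for that 0.7 figure: the cross-entropy of a two-class prediction that is always 50/50 is ln 2 ≈ 0.693, so a loss stuck near 0.7 is what chance-level output looks like. A tiny pure-Python check (the `cross_entropy` helper is just for illustration, not a fastai function):

```python
import math

def cross_entropy(probs, target):
    """Cross-entropy for one sample: negative log of the probability given to the true class."""
    return -math.log(probs[target])

uniform = [0.5, 0.5]                      # a model that cannot tell the two classes apart
loss_chance = cross_entropy(uniform, 0)   # same value whichever class is the target
print(round(loss_chance, 4))              # 0.6931
```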

It didn’t help when I tried various sizes of `embed_dim` (from 512 down to 3) and/or varied the makeup of the final `head` layer (as you see from the comments above). And unfreezing doesn’t help.

Can anyone offer ideas on why this doesn’t work? (And maybe even how to fix it!)