Classify Pairs of Images

I have been going through the forums for days and can’t seem to find a straightforward way to classify a pair of images instead of a single image. I can find examples related to tabular data, NLP, images with more than 3 channels, etc., but I’m assuming all of these are (necessarily) more complicated than what I am trying to achieve… I was thinking I could combine the output layers of two CNNs into a single output, but I’m not sure what the best way to achieve that would be. There is also the issue of the data source that would feed pairs of images into this model. Does anyone know of a simple example that might demonstrate this?


Can you describe what you’re trying to do in a bit more detail?


You may want Siamese networks? There are a few examples on the forums using them.

I’m trying to take a pair of images and classify that pair as either Normal or Abnormal.

I assume you mean to modify the code to turn it into a classification problem. Do you know of a specific example using a recent version of fastai that could be adapted in some way? I have seen a few implementations as well, but they were written using a really old version of fastai.

What you are describing is a Siamese model, where the pair either is or is not a match. There are a few examples if you look hard enough, and if you feel like moving to fastai2 there’s a Siamese example in the newer DataBlock API.
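To make the idea concrete, here is a rough sketch of a Siamese classifier in plain PyTorch (the fastai2 DataBlock tutorial wraps something along these lines). The layer sizes and class count here are illustrative, not from any tutorial:

```python
import torch
import torch.nn as nn

class SiameseClassifier(nn.Module):
    """Pair classifier: one shared encoder processes both images."""

    def __init__(self, feat_dim: int = 64, n_classes: int = 2):
        super().__init__()
        # A single encoder: the SAME weights are applied to both inputs,
        # which is what makes the network "Siamese".
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, feat_dim),
        )
        # The head sees both embeddings and predicts a label for the pair
        # (e.g. match / no-match, or Normal / Abnormal).
        self.head = nn.Linear(2 * feat_dim, n_classes)

    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([self.encoder(x1), self.encoder(x2)], dim=1)
        return self.head(feats)
```

In practice you would swap the toy encoder for a pretrained backbone and train with an ordinary cross-entropy loss on the pair labels.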

I’m totally cool with moving to the latest version of Fastai. Let me check it out. Thanks!

There are some implementations of Siamese networks in fastai v1. You can take a look here: , and here:


I can’t seem to find a Siamese example in the latest version of Fastai. Do you know which notebook it is in?


I understand a Siamese network is for comparing pairs of images, i.e. for similarity. I think what @rbunn80130 is describing is different (as is my use case). I have a pair of images captured with different modalities (e.g. CT & MRI scans) and I want to classify whether they contain a tumour or not. Can a Siamese network be adapted for this case? Can I simply stitch the two images into one image and then train as usual for the single-modality case?

To get a starting point benchmark you could train 2 independent models and essentially just have them vote (if one or both says yes, then consider it a yes). If the images have the same number of channels and are a similar dimension then stitching them together, one next to the other, would also be a pretty simple way to get started. You could do the stitching ahead of time in a pre-processing script so you don’t have to deal with a more complex dataloader.
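The two benchmark ideas above can be sketched in a few lines. This is a minimal illustration, assuming both modalities are already loaded as same-height `(H, W, C)` arrays; the function names are made up for this example:

```python
import numpy as np

def stitch_pair(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Paste two equal-height image arrays next to each other."""
    assert img_a.shape[0] == img_b.shape[0], "heights must match"
    return np.concatenate([img_a, img_b], axis=1)

def or_vote(pred_a: bool, pred_b: bool) -> bool:
    """Voting rule from above: if one or both models say yes, it's a yes."""
    return pred_a or pred_b
```

Running `stitch_pair` over the whole dataset in a pre-processing script leaves you with ordinary single images, so the standard dataloaders work unchanged.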

Those methods are not likely to give you the best performance, but they should be pretty easy to implement and give you a good benchmark to compare a more sophisticated model against. It’s also possible that the results of one of these two methods gives you a good enough result for what you’re trying to do.

A Siamese network might work as well if the number of channels and the image sizes are the same, and I suspect it would give better accuracy than the simpler models I proposed earlier. Siamese networks use the same weights for both inputs, though, and since each of your modalities probably looks quite different, it might be worth modifying the network so each input gets its own model/weights and then combining them at the end before the final prediction, just like a Siamese network does. Sort of a hybrid Siamese model.
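A hedged sketch of that hybrid idea in plain PyTorch (this is my illustration, not fastai code; the encoder architecture and sizes are placeholders you’d replace with pretrained backbones):

```python
import torch
import torch.nn as nn

def make_encoder(feat_dim: int) -> nn.Module:
    """Toy per-modality encoder; in practice use a pretrained backbone."""
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, feat_dim),
    )

class HybridSiamese(nn.Module):
    """Like a Siamese network, but each modality gets its own weights."""

    def __init__(self, feat_dim: int = 64, n_classes: int = 2):
        super().__init__()
        # Unlike a true Siamese network, the two encoders do NOT share
        # weights, since e.g. CT and MRI images look quite different.
        self.enc_ct = make_encoder(feat_dim)
        self.enc_mri = make_encoder(feat_dim)
        # Features are combined at the end before the final prediction.
        self.head = nn.Linear(2 * feat_dim, n_classes)

    def forward(self, ct: torch.Tensor, mri: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([self.enc_ct(ct), self.enc_mri(mri)], dim=1)
        return self.head(feats)
```

Because each branch has its own weights, the two inputs don’t even need the same channel count; you’d just give each `make_encoder` call the right input channels.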


Thanks for your advice @matdmiller
