Using fastai on a custom new task (tutorial, Siamese network) - modify for inference with precalculated features


I’ve read and implemented this great tutorial (24_tutorial.siamese.ipynb) and ended up with a network with astonishing performance.

Now I’d like to modify/use it in a way so I can precalculate the features for images and store them in a DB so that during inference, I only have to pass one image through the model. I am aware that with the current architecture, as presented in the notebook, I will still have to pass both images through the head of the model. But I thought I could try and avoid having to calculate the body part each and every time.

Being new to fastai I tried the following which, alas, didn’t work as I had hoped… :wink:

# prepare images using the same transforms that we pass to the dataloaders (as after_item)
i1 = files[0]
im1 = PILImage.create(i1)
im1_final = Resize(224)(im1)
im1_final = ToTensor()(im1_final)
im1_final = IntToFloatTensor()(im1_final)

i2 = files[2]
im2 = PILImage.create(i2)
im2_final = Resize(224)(im2)
im2_final = ToTensor()(im2_final)
im2_final = IntToFloatTensor()(im2_final)

# do what the forward method does
# my goal is to store the result of learn.model.encoder in a DB and just retrieve it when doing inference
# reason: I will usually want to compare one image with all that are in the DB to recognize objects I’ve seen before

# this should come from the DB for one image
enc_body1 = learn.model.encoder(im1_final.cuda().unsqueeze(0))
enc_body2 = learn.model.encoder(im2_final.cuda().unsqueeze(0))

# this will always have to be done
ftrs = torch.cat([enc_body1, enc_body2], dim=1)
ftrs_final = learn.model.head(ftrs)


Unfortunately, this doesn’t yield the same result as learn.predict(). Could someone please point me in the right direction? What would be the (or one) right way to achieve such a “shortened” inference?

Thanks in advance & best regards,

The third item returned by learn.predict() is the probabilities. Your learn.model.head(ftrs) is returning the final activations output, not yet with softmax taken. If you do F.softmax(ftrs_final, dim=1), I think you will get the same probabilities as learn.predict(). Give it a try, and let us know if it works : )
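To make the difference between raw activations and probabilities concrete, here is a minimal pure-Python sketch of what softmax does to a pair of outputs. The activation values are made up for illustration and do not come from the model in this thread.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the max improves numerical stability and does not change the result.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw activations for the two classes of one image pair.
raw_activations = [3.1, -3.0]
probs = softmax(raw_activations)
print(probs)
```

The raw activations can be any real numbers, while the softmax output is non-negative and sums to 1, which is the form of the third item returned by learn.predict().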



Hi Yijin,

thanks for your reply! =)

I tried as you suggested and it got me much further: the results now add up to 1 and are pretty close to those of predict(). E.g.:
predict: (tensor(0), tensor(0), tensor([0.9978, 0.0022]))
manual: tensor([[0.9585, 0.0415]], device='cuda:0', grad_fn=<SoftmaxBackward>)

The latter is still slightly different every time I run it, though (while predict is stable). A few examples:
tensor([[0.8557, 0.1443]], device='cuda:0', grad_fn=<SoftmaxBackward>)
tensor([[0.8861, 0.1139]], device='cuda:0', grad_fn=<SoftmaxBackward>)

So I guess I must still be doing something a little different… Perhaps I’m missing some transformation that happens automatically?

I’m using the code from the tutorial:
splits = RandomSplitter()(files)
tfm = SiameseTransform(files, splits, lbl2files=lbl2files, labels=labels)
tls = TfmdLists(files, tfm, splits=splits)
dls = tls.dataloaders(after_item=[Resize(224), ToTensor], after_batch=[IntToFloatTensor, Normalize.from_stats(*imagenet_stats)])

What’s included here that I’m not doing in my code is the normalization by imagenet stats, as I wouldn’t know how to apply them to my “manually loaded” images.

Here’s my current code again, as modified by your suggestion:
im1_final = Resize(224)(im1)
im1_final = ToTensor()(im1_final)
im1_final = im1_final.cuda()
im1_final_t = IntToFloatTensor()(im1_final)

im2_final = Resize(224)(im2)
im2_final = ToTensor()(im2_final)
im2_final = im2_final.cuda()
im2_final_t = IntToFloatTensor()(im2_final)

def calc_features(i1, i2):
    # run each image through the shared body, then the head on the concatenated features
    enc_body1 = learn.model.encoder(i1.unsqueeze(0))
    enc_body2 = learn.model.encoder(i2.unsqueeze(0))
    ftrs = torch.cat([enc_body1, enc_body2], dim=1)
    ftrs_final = learn.model.head(ftrs)
    return F.softmax(ftrs_final, dim=1)

calc_features(im1_final_t, im2_final_t)

BR, Chris

I might be wrong but did you normalise?


Thanks for your reply!

I think you’re right, but I actually haven’t found out how to apply the Normalization correctly without a DataLoader and I’d like to avoid using one for such a “simple” task.

Anyway, I tried using one to apply normalization:

edit: forgot to add the batch transform definition
imagenet_stat_tf = Normalize.from_stats(*imagenet_stats)
batch_tfms = [IntToFloatTensor, imagenet_stat_tf]

test_files = [im1, im2]
test_ds = PathDataset(test_files)
test_tls = TfmdDL(test_files,
                  after_item=[Resize(224), ToTensor],
                  after_batch=batch_tfms,
                  # bs=2
                  )
b = test_tls.one_batch()
calc_features(b[0], b[1])

That’s giving me results really close to the one from predict() and once even exactly the same one. I don’t get why they are still changing every time I run the code though, even if just slightly.

predict: (tensor(0), tensor(0), tensor([0.9978, 0.0022]))
manual: tensor([[0.9938, 0.0062]], device='cuda:0', grad_fn=<SoftmaxBackward>)
manual: tensor([[0.9950, 0.0050]], device='cuda:0', grad_fn=<SoftmaxBackward>)
manual: tensor([[0.9668, 0.0332]], device='cuda:0', grad_fn=<SoftmaxBackward>)
manual: tensor([[0.9667, 0.0333]], device='cuda:0', grad_fn=<SoftmaxBackward>)
manual: tensor([[0.9978, 0.0022]], device='cuda:0', grad_fn=<SoftmaxBackward>)
manual: tensor([[0.9838, 0.0162]], device='cuda:0', grad_fn=<SoftmaxBackward>)

BR, Chris

Not sure if it’s to do with normalisation, but if it is, I think you can normalize using Normalize.from_stats(*imagenet_stats)

Also, I wonder if calling the model parts separately on your images is causing the params to change because they are still trainable? Try doing it in eval() mode. My post here shows what I did, though it’s in the process of me trying to hook in GradCAM.
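As a rough illustration of why eval() mode can matter for repeatability, here is a toy sketch, assuming the head contains a dropout-style layer (fastai's default heads usually do, but that is an assumption about this particular model). ToyDropout is a hypothetical stand-in written for this example, not a fastai or PyTorch class.

```python
import random

class ToyDropout:
    """Toy inverted dropout with a train/eval switch (illustration only)."""

    def __init__(self, p=0.5):
        self.p = p
        self.training = True

    def eval(self):
        self.training = False
        return self

    def __call__(self, xs):
        if not self.training:
            # eval mode: identity, fully deterministic
            return list(xs)
        keep = 1.0 - self.p
        # train mode: zero each value with probability p and rescale the survivors
        return [x / keep if random.random() < keep else 0.0 for x in xs]

layer = ToyDropout(p=0.5)
features = [1.0, 2.0, 3.0, 4.0]
train_out = layer(features)   # varies from run to run
layer.eval()
eval_out = layer(features)    # always equal to the input
print(train_out, eval_out)
```

With a real model the equivalent switch would be learn.model.eval(); whether that is the cause of the variation here is a separate question.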

Let us know how you get on. Thanks.



Just subtract the mean from the image tensor and divide by the std; both can be found in imagenet_stats.
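A minimal pure-Python sketch of that per-channel arithmetic, shown on a single RGB pixel rather than a full tensor. The constants are the standard ImageNet mean and std; normalize_pixel is a hypothetical helper name used only here, and the pixel values are assumed to be already scaled to [0, 1].

```python
# Standard ImageNet per-channel statistics (RGB order).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Normalize one RGB pixel whose values are already scaled to [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))

# A pixel equal to the channel means maps to zero in every channel.
at_mean = normalize_pixel(IMAGENET_MEAN)
# A mid-gray pixel ends up slightly above zero in every channel.
mid_gray = normalize_pixel((0.5, 0.5, 0.5))
print(at_mean, mid_gray)
```

On a real image tensor of shape (3, H, W) the same operation is applied to every pixel, channel by channel.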


After some (hours of … ;)) debugging and trying things, I finally found out what went wrong.
fastai’s Resize() transform actually accepts a parameter named split_idx. It’s usually set internally by the DataLoader/TfmdLists which is why you never see it in any tutorials, and I had somehow managed to completely miss it in the documentation. :slightly_frowning_face:

When this parameter is not passed (it is then None, which is falsy), funny things happen in the before_call method of class Resize:
self.pcts = (0.5,0.5) if split_idx else (random.random(),random.random())

A bit further down the code, in encodes(), self.pcts affects the cropping of the image!
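To show why this makes the output change from run to run, here is a simplified sketch of how such fractional positions can translate into a crop offset. It is not fastai's actual implementation: crop_offset and pick_pcts are hypothetical helpers, and the image and crop sizes are made up.

```python
import random

def crop_offset(pcts, image_size, crop_size):
    """Top-left corner of a crop, given fractional positions in [0, 1]."""
    (w, h), (cw, ch) = image_size, crop_size
    return int(pcts[0] * (w - cw)), int(pcts[1] * (h - ch))

def pick_pcts(split_idx):
    # Same expression as the line quoted above from Resize.before_call.
    return (0.5, 0.5) if split_idx else (random.random(), random.random())

image_size, crop_size = (500, 375), (375, 375)

# split_idx=1 (validation): always the center crop.
center = [crop_offset(pick_pcts(1), image_size, crop_size) for _ in range(3)]
# split_idx=None (not passed): a new random crop on every call.
rand = [crop_offset(pick_pcts(None), image_size, crop_size) for _ in range(3)]
print(center)
print(rand)
```

A different crop means a different input tensor, so the encoder output and the final probabilities shift slightly each time.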

When I finally noticed this during debugging I wasn’t sure whether to laugh or cry :stuck_out_tongue_winking_eye:

So, to conclude, this is how to correctly load both images and apply the transforms without the learner:
t1 = PILImage.create(i1)
t1 = Resize(224)(t1, split_idx=1)
t1 = ToTensor()(t1)
t1 = IntToFloatTensor()(t1)
t1 = torchvision.transforms.Normalize(*imagenet_stats)(t1)

t2 = PILImage.create(i2)
t2 = Resize(224)(t2, split_idx=1)
t2 = ToTensor()(t2)
t2 = IntToFloatTensor()(t2)
t2 = torchvision.transforms.Normalize(*imagenet_stats)(t2)

Then pass it through body and head to get the same result as with predict:
enc_body1 = learn.model.encoder(t1.unsqueeze(0))
enc_body2 = learn.model.encoder(t2.unsqueeze(0))
ftrs = torch.cat([enc_body1, enc_body2], dim=1)
ftrs_final = learn.model.head(ftrs)
ftrs_final = torch.nn.functional.softmax(ftrs_final, dim=1)
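With that in place, the original goal (encode each stored image once, then run only the head at query time) could be organized roughly as below. This is a pure-Python sketch under loud assumptions: toy_encoder and toy_head are hypothetical stand-ins for learn.model.encoder and learn.model.head, a dict plays the role of the DB, and treating index 1 as the "same" class is an assumption of this toy.

```python
import math

encoder_calls = 0

def toy_encoder(image):
    """Hypothetical stand-in for learn.model.encoder: image -> feature list."""
    global encoder_calls
    encoder_calls += 1
    return [sum(image) / len(image), max(image) - min(image)]

def toy_head(features):
    """Hypothetical stand-in for learn.model.head: concatenated features -> 2 logits."""
    a, b = features[:2], features[2:]
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return [dist, 1.0 - dist]  # toy logits: [different, same]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    return [e / sum(exps) for e in exps]

# One-time step: encode every known image and store the result.
known_images = {"cat": [0.1, 0.2, 0.3], "dog": [0.7, 0.8, 0.9]}
feature_db = {name: toy_encoder(img) for name, img in known_images.items()}

def match(query_image):
    """Encode the query once, then run only the head against each stored entry."""
    q = toy_encoder(query_image)
    return {name: softmax(toy_head(q + stored))[1]
            for name, stored in feature_db.items()}

scores = match([0.1, 0.2, 0.3])
print(scores, encoder_calls)
```

The encoder runs once per stored image plus once per query, while the head still runs once per comparison. With the real model the concatenation order should probably match the order used in training, since the head is not guaranteed to be symmetric.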

Thanks again for the input to both of you, much appreciated!