Solved: Help with implementing Grad-CAM at inference time


I’m trying to implement Grad-CAM as nicely shown by @henripal and @MicPie here:

@henripal is using the learner to access an image batch


Now, in my case I don’t have that anymore; I load the model and run inference on a single image (so far with predict()) like so:

MODEL = 'stage-2.pth'
path = Path("/tmp")
data = ImageDataBunch.single_from_classes(path, labels, tfms=get_transforms(max_warp=0.0), size=299).normalize(imagenet_stats)
learn = create_cnn(data, models.resnet50)

learn.model.load_state_dict(torch.load("models/%s" % MODEL, map_location="cpu"))

I set up the hooks as suggested in the post, but I can’t make it work with my setup… How do I go from my image file to a tensor that fits the model (ResNet-50, 299px)?

In the example, out = learn.model(img_tensor) is used…

I somehow need to go from my image to a tensor in the right format… Does any kind soul know where I should look?
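For reference, what I’m after boils down to something like this plain-PyTorch sketch (the helper name and the dummy input are made up for illustration; fastai’s .normalize(imagenet_stats) uses these same channel statistics):

```python
import numpy as np
import torch

# ImageNet channel statistics, the same ones applied by .normalize(imagenet_stats)
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def to_model_tensor(img_uint8_hwc):
    """uint8 HxWxC image array -> normalized 1xCxHxW float tensor."""
    t = torch.from_numpy(img_uint8_hwc).permute(2, 0, 1).float() / 255.0
    return ((t - mean) / std).unsqueeze(0)  # add the batch dimension

# demo on a dummy 299x299 RGB image; in practice this would come from
# e.g. PIL.Image.open(...).resize((299, 299)) converted to a numpy array
dummy = np.zeros((299, 299, 3), dtype=np.uint8)
x = to_model_tensor(dummy)  # shape (1, 3, 299, 299)
```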


Can you share a minimal working example notebook?


I’ll clean up my mess a bit and link it…


Here it is…

I’m not sure about the image loading (I did a hack, but I don’t think it’s the right way).
Also, it’s failing with a dimensionality error at the end… Not sure if my hook is wrong, if I’m targeting the wrong layer, or if it’s another issue altogether?

The model and test image are at this dropbox link:

Here’s what I did to make it work:

  1. Making sure that the tensor transformation is the correct one by adding your image to the dataset,
     then grabbing it from the dataloader:

     tensor_img = list(learn.dl())[0][0]
     out = learn.model(tensor_img)

  2. I’m not entirely sure, but I think the adaptive pooling is messing with the hardcoded dimensions, so I corrected the reshapes as follows:

     _, n, w, h = gradients.shape
     fmaps = fmap_hook.stored.cpu().numpy().reshape(n, w, h)  # reshape activations
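For readers without the notebook, the hook mechanics behind this can be sketched on a toy model (the toy network and its layer sizes are stand-ins, not the actual ResNet-50; register_full_backward_hook needs PyTorch ≥ 1.8, older versions used register_backward_hook):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for the conv trunk + classification head (11 classes as in the thread)
torch.manual_seed(0)
body = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 11))
model = nn.Sequential(body, head).eval()

# Hooks store the activations and gradients of the last conv block before pooling
fmaps, grads = [], []
h1 = body.register_forward_hook(lambda m, i, o: fmaps.append(o))
h2 = body.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))

x = torch.randn(1, 3, 32, 32)
out = model(x)
out[0, out.argmax()].backward()  # backprop from the predicted class logit

# Grad-CAM: channel weights = spatially pooled gradients; CAM = ReLU of weighted fmaps
w = grads[0].mean(dim=(2, 3), keepdim=True)         # (1, n, 1, 1)
cam = F.relu((w * fmaps[0]).sum(dim=1)).squeeze(0)  # (H, W) heatmap

h1.remove(); h2.remove()  # clean up the hooks
```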

Here’s the full working example:


Oh wow. Super cool. Thanks for that…

Another silly question if you don’t mind. I try to get the original image to overlay it with the grad-cam result…

However, I’m not sure if I calculate it correctly from the tensor:

t = np.transpose(tensor_img.squeeze(), (1, 2, 0))
print(t.min(), t.max())
plt.imshow((t - t.min())/t.max())

The colors seem to be off. I’m not sure if there’s actually a transformation applied, or if I’m not de-normalizing correctly? Is there a more elegant solution for this?

Your max is different after doing t - t.min(), I think - you probably have some pixels with values > 1.0


Sure thing. This is correct:

t = (t - t.min()) / (t.max() - t.min())
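If you’d rather undo the normalization exactly instead of min-max rescaling, something like this should work, assuming imagenet stats were applied (NumPy sketch, helper name is made up):

```python
import numpy as np

# ImageNet channel statistics used at normalization time
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

def denorm(t_hwc):
    """Invert imagenet normalization on an HxWxC float array for plotting."""
    return np.clip(t_hwc * std + mean, 0, 1)

# normalized zeros map back to the per-channel means
img = denorm(np.zeros((4, 4, 3)))
```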

Thanks again for helping out


As a followup, here is the updated version:

Unfortunately I am stuck again. In the second gist I try to get grad-CAMs for all 11 classes…

However, the CAMs I get look wrong. I guess something is up with the way I store the backprop results?
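One plausible culprit (a guess, not confirmed from the gist): backward() accumulates gradients across calls, so looping over classes needs a fresh forward/backward per class with gradients cleared in between, and the stored tensors should be snapshots rather than live references. A toy sketch of that loop (the linear model is a stand-in for the real network):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_classes = 11
model = nn.Linear(5, n_classes)          # stand-in for the real network
x = torch.randn(1, 5, requires_grad=True)

grads_per_class = []
for c in range(n_classes):
    model.zero_grad()                    # clear parameter grads from the previous class
    x.grad = None                        # clear the input grad as well
    out = model(x)                       # fresh forward pass per class
    out[0, c].backward()                 # backprop from this class's logit only
    grads_per_class.append(x.grad.clone())  # snapshot, not a live reference
```

For a linear model the gradient w.r.t. the input for class c is exactly that class’s weight row, which makes the isolation between classes easy to check.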

I think I got it now…

Updated Gist