Solved: Help with implementing Grad-CAM at inference time

Hi.

I’m trying to implement Grad-CAM as nicely shown by @henripal and @MicPie here:
https://forums.fast.ai/t/visualizing-intermediate-layers-a-la-zeiler-and-fergus/28140/25?u=cwerner

@henripal is using the learner to access an image batch


next(iter(learn.data.train_dl))

Now, in my case I don’t have that anymore: I load the model and run inference on a single image (so far with predict()), like so:

from fastai.vision import *
import torch

MODEL = 'stage-2.pth'
path = Path("/tmp")
data = ImageDataBunch.single_from_classes(path, labels, tfms=get_transforms(max_warp=0.0), size=299).normalize(imagenet_stats)
learn = create_cnn(data, models.resnet50)

# load the saved weights directly into the model
learn.model.load_state_dict(
    torch.load("models/%s" % MODEL, map_location="cpu")
)

I set up the hooks as suggested in the post, but I can’t get them to work with my setup… How do I go from my image file to a tensor that fits the model (ResNet-50, 299 px)?

In the example, out = learn.model(img_tensor) is used…

I somehow need to go from my image to a tensor in the right format… Does any kind soul know where I should look?
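The best I’ve come up with so far is replicating the preprocessing by hand with torchvision, but I’m not sure it matches fastai’s pipeline (test.jpg is just a placeholder):

from PIL import Image
from torchvision import transforms

# hack attempt: redo the training-time preprocessing by hand
# (resize to 299 px, convert to a tensor, normalize with the ImageNet stats)
preprocess = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("test.jpg").convert("RGB")   # "test.jpg" is a placeholder
img_tensor = preprocess(img).unsqueeze(0)     # add batch dim -> [1, 3, 299, 299]
out = learn.model.eval()(img_tensor)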


Can you share a minimal working example notebook?

Yeah.

I’ll clean up my mess a bit and link it…

C

Here it is…

I’m not sure about the image loading (I used a hack, but I don’t think it’s the right approach).
Also, it’s failing with a dimensionality error at the end… Not sure if my hook is wrong, if I’m targeting the wrong layer, or if it’s another issue altogether…?

The model and test image are at this dropbox link:
https://www.dropbox.com/sh/wy5uhhxmxghxci0/AADbPnM5-6PcJ1GvRlI6nbJOa?dl=0

Here’s what I did to make it work:

  1. Making sure the tensor transformation is the correct one by just adding your image to the dataset,
     then grabbing it from the dataloader:

learn.data.valid_ds.set_item(img)
tensor_img = list(learn.dl())[0][0]
out = learn.model(tensor_img)

  2. I’m not super sure, but I think the adaptive pooling is messing with my hardcoded dimensions, so I corrected the reshapes as follows (an end-to-end sketch follows below):

_, n, w, h = gradients.shape
fmaps = fmap_hook.stored.cpu().numpy().reshape(n, w, h)  # reshape the stored activations
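Putting the two pieces together, a minimal end-to-end sketch in plain PyTorch could look like this (the layer choice is an assumption: it takes learn.model[0] to be the resnet conv body, as create_cnn builds it, and reuses tensor_img from step 1):

import torch

model = learn.model.eval()
target_layer = model[0][-1]   # assumption: the last block of the conv body

fmaps, grads = [], []
def hook(module, inp, outp):
    fmaps.append(outp.detach())                                   # feature maps from the forward pass
    outp.register_hook(lambda grad: grads.append(grad.detach()))  # gradients w.r.t. those maps

handle = target_layer.register_forward_hook(hook)

out = model(tensor_img)
out[0, out.argmax()].backward()   # backprop the score of the top predicted class
handle.remove()

g = grads[0][0]                           # [n, w, h]
weights = g.view(g.shape[0], -1).mean(1)  # average gradient per feature map
cam = torch.relu((weights[:, None, None] * fmaps[0][0]).sum(0))
cam = (cam / cam.max()).cpu().numpy()     # scale to [0, 1] (assumes cam.max() > 0)

From there you’d upsample cam to the input size (299×299) and overlay it on the image.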

Here’s the full working example:


Oh wow. Super cool. Thanks for that…

Another silly question, if you don’t mind. I’m trying to recover the original image so I can overlay it with the Grad-CAM result…

However, I’m not sure if I calculate it correctly from the tensor:

t = np.transpose(tensor_img.squeeze(), (1, 2, 0))
print(t.min(), t.max())
plt.imshow((t - t.min())/t.max())

The colors seem to be off. I’m not sure if there’s actually a transform being applied or if I’m just not de-normalizing correctly? Is there a more elegant solution for this?

Your max is different after doing t - t.min(), I think: you probably end up with some pixel values > 1.0.


Sure thing. This is correct:

t = (t - t.min()) / (t.max() - t.min())
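Undoing the ImageNet normalization directly should also give the original colors back; a rough sketch, assuming tensor_img was normalized with imagenet_stats:

import numpy as np
import matplotlib.pyplot as plt

# undo Normalize(mean, std): x * std + mean, with the ImageNet stats
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
t = tensor_img.squeeze().cpu().numpy().transpose(1, 2, 0)  # [h, w, 3]
plt.imshow(np.clip(t * std + mean, 0, 1))                  # clip stray values for imshow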

Thanks again for helping out


As a follow-up, here is the updated version:

Unfortunately, I am stuck again. In the second gist I try to get Grad-CAMs for all 11 classes…

However, the CAMs I get look wrong. I guess something is up with the way I store the backprop results?
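My hunch is that gradients from one class leak into the next, so the per-class loop probably needs to clear the stored gradients and use retain_graph=True. A rough sketch of what I mean (reusing model, target_layer, and tensor_img from the earlier sketch):

import torch

fmaps, grads = [], []
def hook(module, inp, outp):
    fmaps.append(outp.detach())
    outp.register_hook(lambda grad: grads.append(grad.detach()))

handle = target_layer.register_forward_hook(hook)

out = model(tensor_img)                    # single forward pass
cams = []
for c in range(out.shape[1]):              # one CAM per class
    model.zero_grad()
    grads.clear()                          # drop the previous class's gradients
    out[0, c].backward(retain_graph=True)  # keep the graph alive for the next backward
    g = grads[0][0]                        # [n, w, h]
    weights = g.view(g.shape[0], -1).mean(1)
    cams.append(torch.relu((weights[:, None, None] * fmaps[0][0]).sum(0)).cpu().numpy())

handle.remove()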

I think I got it now…

Updated Gist
