I was following CNN Interpretation with CAM, but using activation maps with ResNet-50 rather than ResNet-34 as in the tutorial. Images have shape 32x32.
When I go through this step:
cam_map = torch.einsum('ck,kij->cij', learn.model[-1].weight, act)
I get this error:
RuntimeError: einsum(): operands do not broadcast with remapped shapes [original->remapped]: [10, 512]->[10, 1, 1, 512] [2048, 1, 1]->[1, 1, 1, 2048]
How could I calculate the correct shapes in this case?
Facing the same problem
This is because with ResNet-50 the convolutional part outputs 2048 channels, which cannot be used as is in the einsum: the weight of the final layer expects an input of length 512. There are two ways I can think of for you to solve the problem.
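To make the mismatch concrete, here is a minimal sketch that reproduces it with random tensors. The sizes come from the error message above; `weight` and `act` are only stand-ins for `learn.model[-1].weight` and the hooked activations, not the real objects.

```python
import torch

# Sizes read off the error message: 10 classes, 512 inputs to the last
# linear layer, 2048 channels coming out of the ResNet-50 body.
n_classes, head_in, body_out = 10, 512, 2048

weight = torch.randn(n_classes, head_in)  # stand-in for learn.model[-1].weight
act = torch.randn(body_out, 1, 1)         # stand-in for the hooked activations

try:
    torch.einsum('ck,kij->cij', weight, act)
    mismatch = False
except RuntimeError:
    # The index k has size 512 in one operand and 2048 in the other.
    mismatch = True

# With a matching channel count the same expression goes through.
act_ok = torch.randn(head_in, 1, 1)
cam_map = torch.einsum('ck,kij->cij', weight, act_ok)
print(mismatch, tuple(cam_map.shape))
```

The einsum only works when the shared index `k` has the same size in both operands, which is why the two options below either change where the activations come from or drop the einsum.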
Either you change the position of the hook to another convolutional layer, one that has 512 output channels. This will probably not be as accurate as using the final layer, as you now rely on lower-level features to plot the CAM.
Or you don't use einsum at all and instead plot the average of the activations given by the hook. The main advantage of the einsum method is that you get as many cam_map as you have classes, so you can plot them separately (i.e. which part of the image makes my model think it is from class X). With an average, the resulting image shows which part of the image made my model predict what it predicted.
The second option is what I would advise. To do it, just replace cam_map in the plotting function by the average of the activations over the channel dimension.
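A rough sketch of that averaging, assuming the hook stores the activations for one image as a tensor of shape (channels, height, width). The helper name `average_activation_map` is made up for this example, and the 7x7 spatial size is only an illustration.

```python
import torch

def average_activation_map(act: torch.Tensor) -> torch.Tensor:
    """Average hooked activations over the channel dimension.

    act: tensor of shape (channels, height, width) for a single image.
    Returns a (height, width) map that is not tied to any class.
    """
    return act.mean(dim=0)

# Stand-in for the hooked ResNet-50 activations (2048 channels).
act = torch.randn(2048, 7, 7)
avg_map = average_activation_map(act)
print(tuple(avg_map.shape))
```

`avg_map` would then be passed to the plotting call in place of the per-class `cam_map` slice. Note that the activations in the error message above have shape [2048, 1, 1], so with 32x32 images the map is a single value; the input would need to be larger for the plot to show any spatial detail.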
Hope it helps!