Lesson 6 Advanced Discussion ✅

Would it be correct to say, based on the fast.ai v2 lessons last year, that a hook is like an interface in a language like Java, or a protocol in a language like Swift, whereby different classes implement them for different behaviour, like in this case cleaning up activations or maybe doing something different? Or am I missing the point?
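(For reference, in PyTorch a hook is less an interface you implement and more a callback you register on a module. A minimal sketch, with made-up layer and variable names, of a forward hook that stashes activations:)

```python
import torch
import torch.nn as nn

stored = {}

def save_activations(module, inp, outp):
    # called automatically after each forward pass of the hooked module;
    # detach so we don't keep the autograd graph alive longer than needed
    stored['act'] = outp.detach()

layer = nn.Conv2d(3, 16, kernel_size=3, padding=1)
handle = layer.register_forward_hook(save_activations)

x = torch.randn(1, 3, 8, 8)
_ = layer(x)
print(stored['act'].shape)   # torch.Size([1, 16, 8, 8])

handle.remove()              # clean up the hook when done
```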

preds[0, int(cat)].backward() calculates the gradients of the score for category cat with respect to the activations of the backbone’s last convolutional layer. That gives you an idea of how much each part of that activation map influences that particular category’s output score.

Those gradients are then global average pooled over the spatial dimensions, giving one weight per channel, and the weights are multiplied by the activation map of the last convolutional layer, which you get as a result of the forward pass. Please see the Grad-CAM paper, section 3, for the implementation details. The paper is actually not that hard to digest.
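A rough sketch of that computation, assuming acts and grads are the last conv layer’s activations and gradients captured by forward and backward hooks (the names and shapes here are assumptions, not the lesson’s exact code):

```python
import torch
import torch.nn.functional as F

def grad_cam_map(acts, grads):
    # global-average-pool the gradients over the spatial dims: one weight per channel
    weights = grads.mean(dim=(2, 3), keepdim=True)   # (1, C, 1, 1)
    # weighted sum of the activation maps, then ReLU, as in the Grad-CAM paper
    return F.relu((weights * acts).sum(dim=1))       # (1, h, w)

# after preds[0, int(cat)].backward(), with acts/grads saved by the hooks:
acts  = torch.randn(1, 512, 11, 11)   # stand-ins for the hooked tensors
grads = torch.randn(1, 512, 11, 11)
cam = grad_cam_map(acts, grads)
print(cam.shape)                      # torch.Size([1, 11, 11])
```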

Trying to create heat maps for my classification results, I ran the code below using the default parameters from Lesson 6 and got the following error message:

[screenshots of the code and the error]

What could be the problem?
Thanks
Moran

Hi,

As per the error message, the function int() cannot take a MultiCategory object as input. I don’t think grad heatmaps work with multi-category classification anyway, since there wouldn’t be an obvious way to pick or combine activations from different categories to show.

Yijin

Thanks

The issue was coming from the cat=y part; make sure the y value is generated the same way the notebook does.
Hope it helps you.
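As an illustration of the difference (a minimal sketch using plain tensors as stand-ins for the fastai types):

```python
import torch

# single-label classification: the target is one integer index, so int(cat) works
single_label_y = torch.tensor(5)                 # e.g. a TensorCategory
idx = int(single_label_y)                        # -> 5

# multi-label classification: the target is a one-hot vector (MultiCategory),
# so int() fails ("only one element tensors can be converted to Python scalars")
multi_label_y = torch.tensor([0., 1., 0., 1.])
# you would have to pick a single class yourself before calling backward(), e.g.
idx = int(multi_label_y.argmax())                # or whichever class you want to explain
print(idx)
```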

I walked through the Rossmann notebook in Lesson 6 and got an RMSPE of 0.116 on the validation set, which is not bad given that the best on the leaderboard is around 0.10. However, when I submitted the results I only got 0.17, which is far from the result I calculated on the validation set.

Was this also a type of overfitting? How do you deal with this problem when you get a good result on the validation set but bad performance on the test set?
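For reference, a minimal sketch of the RMSPE metric used on the Rossmann leaderboard (this sketch assumes, per the competition’s evaluation description, that rows with zero actual sales are ignored):

```python
import numpy as np

def rmspe(y_true, y_pred):
    # root mean squared percentage error, skipping rows with zero actual sales
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mask = y_true != 0
    pct_err = (y_true[mask] - y_pred[mask]) / y_true[mask]
    return np.sqrt(np.mean(pct_err ** 2))

print(rmspe([200, 150, 0], [180, 160, 10]))  # the zero-sales row is skipped
```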

Hi Edward, this is a great explanation. Many thanks!

Hi Sango, follow-up question… the 11x11 is just a toned-down version of the whole picture, not a part of the picture, correct? Thanks!

I want to ask the same question! In a previous part of the course (Lesson 5, I believe) we saw thumbnails of pictures, and it seemed to me at the time that the model was breaking images down into small bits that could be visually picked out.

After the convolution material, and especially the heatmap lesson, I understood that the channels actually always reflect the whole input picture: a “toned down” version which, when averaged over all channels, creates the heatmap.

Can anyone confirm this insight, or point to resources that clarify it?

Cheers

The deeper you go in your network, the higher the level of the features learnt by the network, as explained by Jeremy. This wouldn’t be possible if these activations were “a small/broken down part” of the input images, because they need to represent features learnt from the entire image.

So yes, the activations are a “toned down” representation of the input. This “toned down” activation (although it has a lower pixel resolution) knows a great deal about what’s actually present in the image, which is why you can generate heat maps etc. from it.
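A small sketch of that idea with assumed shapes: take the last conv layer’s activations, average over the channel dimension, and upsample the low-resolution result back to image size to get a simple heatmap.

```python
import torch
import torch.nn.functional as F

acts = torch.randn(1, 512, 11, 11)        # e.g. last conv layer activations
heatmap = acts.mean(dim=1, keepdim=True)  # average over channels -> (1, 1, 11, 11)
heatmap = F.interpolate(heatmap, size=(224, 224),
                        mode='bilinear', align_corners=False)
print(heatmap.shape)                      # torch.Size([1, 1, 224, 224])
```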