Visualizing intermediate layers a la Zeiler and Fergus

It’s attached in the resources section, and some people did plot the intermediate activations for a particular class… Check the Share your work thread.

Check out this repo, even other projects have visualisation in them:


Sure it’s here:

Thanks to @aakashns . I haven’t watched any of them myself yet, BTW. If you check them out, let me know whether you find them useful.


So I took a crack at visualizing activations by optimization. I didn’t get the pretty pictures that the Google folks got, but I did get a cool implementation of deep dream going. Check it out if you’re interested.


This lecture covers the different approaches described in your link.


Edit: See the new post below for a working single image prediction notebook.

After a coding session with @ramon about getting the activations with hooks I hacked together a small notebook (L1-stonefly_activations.ipynb) to visualize the different network layer activations:

To get the activations I used the following hook function (adapted from Jeremy):

class StoreHook(HookCallback):
    def on_train_begin(self, **kwargs):
        self.acts = []
    def hook(self, m, i, o): return o
    def on_train_end(self, train, **kwargs): self.acts = self.hooks.stored

I am not sure if I used the hook correctly?
The image dimension in the results is strange, as I only have 34 images (the dataset has 3000+)?
I also could not figure out how to get the original image from the data loader to compare them to the activations.
Maybe there is a much easier way to get the activations?

The notebook above is based on my previous post and was inspired by the notebook from @KarlH (thank you, learned a lot!).

Kind regards


What is the purpose of m and i, since the function never uses them? Thanks!

Glad you found the notebook helpful.

The activations you get from the forward hook are generated every time you run something through the model, so you only have the activations for a single batch. When you run a new batch, the previously stored activations are overwritten. I think that since you’re running the hook function as a callback, the activations you actually get out are the ones from the final batch of your validation dataset, which likely has 34 images in it.

I think you’ll find getting activations for specific images is easier if you do it outside the training loop. You can load a specific image or images of interest and just pass those. If you want multiple batches worth of activations you’ll have to loop through a dataloader and save activations for each batch. If you do this, remember to dump them to the CPU or you’ll run out of GPU memory real fast.
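
A minimal plain-PyTorch sketch of the loop described above: run a dataloader through the model, keep one tensor of activations per batch, and move each to the CPU. The toy model and the random "images" are placeholders, not the notebook's data. Note how the last batch is smaller than the others, which is the same effect as the 34 images.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins (hypothetical): a small conv net and 10 random 3x16x16 "images".
model = nn.Sequential(
    nn.Conv2d(3, 4, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(4 * 16 * 16, 2),
)
loader = DataLoader(TensorDataset(torch.randn(10, 3, 16, 16)), batch_size=4)

acts = []  # one tensor per batch

def store(module, inp, out):
    # Detach from the graph and move to the CPU so GPU memory is not held across batches.
    acts.append(out.detach().cpu())

handle = model[0].register_forward_hook(store)
model.eval()
with torch.no_grad():
    for (xb,) in loader:
        model(xb)
handle.remove()  # stop recording once done

print([a.shape[0] for a in acts])  # batch sizes seen by the hook
all_acts = torch.cat(acts)         # activations for the whole dataset
print(all_acts.shape)
```

Without the list, only the last (smallest) batch would still be available after the loop.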


@MicPie, @KarlH . I’m also reading the Hooks callbacks, but they are quite hard to understand. Do you know where this part is in the dev nb?

More specifically, in the class Hooks.

class Hook():
    "Create a hook."
    def __init__(self, m:nn.Module, hook_func:HookFunc, is_forward:bool=True):
        self.hook_func,self.stored = hook_func,None
        f = m.register_forward_hook if is_forward else m.register_backward_hook
        self.hook = f(self.hook_fn)
        self.removed = False

    def hook_fn(self, module:nn.Module, input:Tensors, output:Tensors):
        input  = (o.detach() for o in input ) if is_listy(input ) else input.detach()
        output = (o.detach() for o in output) if is_listy(output) else output.detach()
        self.stored = self.hook_func(module, input, output)

    def remove(self):
        if not self.removed:
            self.hook.remove()
            self.removed = True

Why do we have self.hook_func = hook_func and also define def hook_fn()? What is the purpose of these two? They have nearly the same name. Sorry, I have been struggling to understand this part for a while, so I would appreciate it if someone could help me with it.

Thank you in advance,

After asking the question, I searched a little bit and have a simple answer.

  • The dev nb for hooks is 005a_interpretation, but it doesn’t have much information.
  • To understand how hooks work in pure PyTorch, please see the example here - hook
  • hook_fn(self, module:nn.Module, input:Tensors, output:Tensors) is just the signature PyTorch requires for a hook. The real function we define is the one passed in as hook_func:HookFunc.
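
The split between the two can be sketched in plain PyTorch as follows. SimpleHook is a hypothetical, stripped-down stand-in for the Hook class quoted above, not the fastai code: hook_fn is the fixed plumbing that PyTorch calls, and hook_func is whatever the user wants computed from (module, input, output).

```python
import torch
import torch.nn as nn

class SimpleHook:
    """Hypothetical, simplified stand-in for fastai's Hook (not the library code)."""
    def __init__(self, module, hook_func):
        self.hook_func, self.stored = hook_func, None
        # PyTorch calls self.hook_fn(module, input, output) after every forward pass.
        self.handle = module.register_forward_hook(self.hook_fn)

    def hook_fn(self, module, input, output):
        # Fixed plumbing: detach, delegate to the user's function, keep its result.
        self.stored = self.hook_func(module, input, output.detach())

    def remove(self):
        self.handle.remove()

layer = nn.Linear(3, 2)
# Two different user functions reuse the same plumbing.
keep_output = SimpleHook(layer, lambda m, i, o: o)
keep_mean = SimpleHook(layer, lambda m, i, o: o.mean().item())

layer(torch.ones(5, 3))
print(keep_output.stored.shape, keep_mean.stored)
keep_output.remove()
keep_mean.remove()
```

A user function is free to ignore m and i, as the first lambda does; they are only there because PyTorch always passes all three arguments.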

I will continue to read about this part, and I would always appreciate it if someone could point me to good resources about it :smiley: Thank you in advance

P.S. Actually, after reading about the PCA technique for explaining the last-layer nodes (in Share your work), I thought of an experiment: I will try setting each node in the last layers to zero and see which categories in the results are affected. Then we can understand what each node represents. What do you think about this?
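
One way this zeroing experiment could be sketched in plain PyTorch, assuming a version where a forward hook's non-None return value replaces the layer's output. The toy classifier, the random input, and the ablate helper are all hypothetical; a real run would use the trained model and validation images.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical toy classifier: 8 input features -> 6 hidden units -> 3 classes.
model = nn.Sequential(nn.Linear(8, 6), nn.ReLU(), nn.Linear(6, 3)).eval()
x = torch.randn(4, 8)

def ablate(unit):
    """Return the mean change in each class logit when one hidden unit is forced to zero."""
    def zero_unit(module, inp, out):
        out = out.clone()
        out[:, unit] = 0.0
        return out  # a non-None return value replaces the layer's output

    with torch.no_grad():
        base = model(x)
        handle = model[1].register_forward_hook(zero_unit)
        ablated = model(x)
        handle.remove()  # restore normal behaviour before returning
    return (ablated - base).mean(dim=0)

for unit in range(6):
    print(unit, ablate(unit))
```

Units whose removal shifts one class logit much more than the others are candidates for "representing" that class.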


I don’t fully understand the hook class used in v1.0 yet. I’m still using stuff I learned from a previous iteration of the course. I use

class SaveFeatures():
    def __init__(self, m): self.hook = m.register_forward_hook(self.hook_fn)
    def hook_fn(self, module, input, output): self.features = output
    def remove(self): self.hook.remove()

Then I populate a list with SaveFeatures objects for each layer in the model I want activations from

sfs = [SaveFeatures(m[i]) for i in range(len(m))]

Then when you run something through m like p = m(x), each element in sfs is populated with activations from its corresponding layer.

One thing to note is that sometimes you need to get fancy with indexing because models are not always structured linearly. For example, the model used in Lesson 1 has two layer groups accessible by indexing: one for the resnet34 backbone and one for the custom head. If you want to get activations from the layers inside the resnet backbone, you need to specifically index into it.

sfs = [SaveFeatures(children(m)[0][i]) for i in range(len(children(m)[0]))]
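
The same indexing issue on a toy model, as a self-contained sketch. The backbone and head here are hypothetical placeholders for the Lesson 1 model, and the hook stores a detached copy of the output.

```python
import torch
import torch.nn as nn

class SaveFeatures:
    def __init__(self, m): self.hook = m.register_forward_hook(self.hook_fn)
    def hook_fn(self, module, input, output): self.features = output.detach()
    def remove(self): self.hook.remove()

# Hypothetical toy model shaped like "backbone + head".
backbone = nn.Sequential(nn.Conv2d(3, 4, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
head = nn.Sequential(nn.Flatten(), nn.Linear(4, 2))
m = nn.Sequential(backbone, head)

# len(m) is 2, so hooking m[i] only sees the two groups;
# index into m[0] to reach the layers inside the backbone.
sfs = [SaveFeatures(layer) for layer in m[0]]
m(torch.randn(1, 3, 8, 8))
shapes = [tuple(sf.features.shape) for sf in sfs]
print(shapes)
for sf in sfs:
    sf.remove()
```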

It is the signature of a hook in PyTorch. I found this in the description:

The hook will be called every time after :func:`forward` has computed an output.
It should have the following signature::

    hook(module, input, output) -> None

So we don’t call it directly; PyTorch calls the hook function. On each forward pass, it passes the module, its input, and its output to this function.


Dear @dhoa,

I have now created a notebook with single image prediction and activation visualization based on the code from your post:

With the flatten_model function it is easy to get the layers of interest and the hook gets installed by calling hook_outputs(layers).

Where did you find the flatten_model and the other parts of the code snippet so I can dive a little deeper into this topic?
I guess the callback is not needed for getting the activations and is for more advanced operations or am I wrong?
If somebody has more information/sample code/etc. on this topic I would be very interested. :smiley:



Very happy to know that my code is useful.

The source code of flatten_model is : flatten_model = lambda m: sum(map(flatten_model,m.children()),[]) if num_children(m) else [m]

And I found it by looking at the class HookCallback(LearnerCallback) in /fastai:

def on_train_begin(self, **kwargs):
    if not self.modules:
        self.modules = [m for m in flatten_model(self.learn.model)
                        if hasattr(m, 'weight')]
    self.hooks = Hooks(self.modules, self.hook)

This means that if modules are not specified, all the modules in the model that have a weight are chosen. So for your purposes, you just need to choose the last linear layer. I got familiar with these things by playing around with the ActivationStats and model_sizes in

I think that for getting just the activations, you don’t need the callback; hook_outputs is enough. Actually, I haven’t played with the callback yet :smiley: and I’m still not familiar with its methods.
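
For anyone who wants to try this without fastai, here is a plain-PyTorch sketch of the same idea. flatten_model below is a rewrite of the one-liner above that uses list(m.children()) instead of num_children, and the toy model is a hypothetical placeholder. It flattens the model, keeps the modules that have a weight (the same filter as in HookCallback), and hooks only the last linear layer.

```python
import torch
import torch.nn as nn

def flatten_model(m):
    """Recursively collect leaf modules (plain-PyTorch rewrite of the one-liner)."""
    children = list(m.children())
    return sum(map(flatten_model, children), []) if children else [m]

# Hypothetical toy model with two nested groups.
model = nn.Sequential(
    nn.Sequential(nn.Conv2d(3, 4, 3), nn.BatchNorm2d(4), nn.ReLU()),
    nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(4, 2)),
)
leaves = flatten_model(model)
with_weight = [l for l in leaves if hasattr(l, 'weight')]  # same filter as HookCallback
names = [type(l).__name__ for l in with_weight]
print(names)

# Hook only the last linear layer and read its output for one input.
stored = {}
handle = with_weight[-1].register_forward_hook(
    lambda m, i, o: stored.update(out=o.detach())
)
model.eval()
model(torch.randn(1, 3, 8, 8))
handle.remove()
print(stored['out'].shape)
```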

I would really appreciate it if someone could give more examples on this topic too.


Using the callback should make your code a little simpler. Amongst other things, it can automatically remove your hook for you when done.
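
The "removes your hook for you" part can be imitated in plain PyTorch with a context manager. This is only an analogy to what the callback does, not the fastai implementation, and recording is a hypothetical helper name.

```python
from contextlib import contextmanager
import torch
import torch.nn as nn

@contextmanager
def recording(module):
    """Hypothetical helper: record a module's outputs, always removing the hook on exit."""
    outputs = []
    handle = module.register_forward_hook(lambda m, i, o: outputs.append(o.detach()))
    try:
        yield outputs
    finally:
        handle.remove()  # runs even if the forward pass raises

layer = nn.Linear(3, 2)
with recording(layer) as outs:
    layer(torch.ones(1, 3))
layer(torch.ones(1, 3))  # not recorded: the hook is already gone
print(len(outs))
```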


Thanks Jeremy. Can I add the callback after the model has already been trained? In this case I train the model first and then try to get the activations on the validation set.

Sure - there are params for callbacks in get_preds and validate, amongst others.


Hey, in case it’s useful: I implemented Grad-CAM using fastai 1.0. It lets you see which parts of the image are “used” by the network to make a prediction. It’s a variation on CAM that weights the activations by averaged gradients.

This might also be interesting to some of you because it has gradient/backward hooks as well as the output_hooks.
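
For reference, a minimal Grad-CAM sketch in plain PyTorch, separate from the fastai implementation mentioned above. It assumes a hypothetical toy CNN and random input, and it captures the gradients with a tensor hook registered inside the forward hook rather than with a module backward hook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
# Hypothetical toy CNN; in practice `target` would be the last conv block of a real model.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 5),
).eval()
target = model[1]
saved = {}

def fwd_hook(module, inp, out):
    saved['acts'] = out                                 # forward activations
    out.register_hook(lambda g: saved.update(grads=g))  # gradients w.r.t. those activations

handle = target.register_forward_hook(fwd_hook)
x = torch.randn(1, 3, 16, 16)
logits = model(x)
cls = logits.argmax(dim=1).item()
logits[0, cls].backward()  # backprop the score of the predicted class
handle.remove()

# Channel weights: gradients averaged over the spatial dimensions.
weights = saved['grads'].mean(dim=(2, 3), keepdim=True)
# Weighted sum of activation maps over channels, keeping positive evidence only.
cam = F.relu((weights * saved['acts']).sum(dim=1)).detach()
# Upsample the coarse map to the input resolution for overlaying on the image.
cam = F.interpolate(cam[None], size=x.shape[-2:], mode='bilinear', align_corners=False)[0, 0]
print(cam.shape)
```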


Example result:


Very cool. Any chance we can tackle the improved Grad-CAM++?

They have a TensorFlow implementation on github…

Yes! I’d love to try that out (also the guided variants, because Grad-CAM’s resolution is awful for the application I tried, satellite imagery).