Things I don't understand while visualizing intermediate layers

When I google for visualizing intermediate layers in convolutional networks, I find lots of posts showing and explaining layers of models that are pre-trained on ImageNet or a similarly huge dataset. After seeing this trend, I started to wonder whether we need to train a model to that level to get visualizations of the layers like the ones in Matt Zeiler's paper.

I simply want to know if that is the case. I created my own conv model in Keras with a few conv layers and a couple of dense layers, and trained it on 2000 images. My visualizations are not all that fascinating: they are mostly blank, with only some dots and stripes scattered across them.

I followed this notebook to visualize the layers. It uses MNIST, but in my case I used normal images with 3 channels.

Another thing that got me thinking: the Keras blog describes another way to visualize the layers, similar to the technique above, but it also uses gradient ascent. What are the differences between the two techniques?

https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html

So, what should I do to get fascinating visualizations like the ones in Matt Zeiler's paper, or the heatmap visualization Jeremy showed in Part 1? What criteria should the model meet? Does it need to be trained on a huge dataset, or for many epochs, or both?

These two visualizations (the heatmap and the Matt Zeiler visualizations) also look quite different from each other. Why is that, and what exactly is each of them showing? From what I understood, they are showing the activations of each layer for the input image, so why do they look different? Even from the links above, the MNIST visualization is fairly understandable and intuitive, whereas the Keras blog visualization is interesting but left me wondering what is going on inside those layers.

If this is too basic and I am missing simple stuff, please forgive me and help me understand.

You can visualize the layers of any model, but the features will be much more basic if you train on a small dataset and/or for a short time. For example, I’m pretty sure there is no way you will get a beautiful ‘cat’ filter when training a small net from scratch on a few images.

The difference between the two methods you mentioned is that in one case you are doing gradient ascent in the image space to create images that maximally activate a filter, while in the other (Zeiler & Fergus) you are looking at which existing validation images most strongly activate a filter.
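To make that contrast concrete, here is a minimal sketch of both approaches in tf.keras. It is not the exact code from the notebook or the blog post: the variables `model` and `some_image`, the layer name `'conv2d_1'`, and the 224×224 input size are all placeholder assumptions you would swap for your own.

```python
import tensorflow as tf
from tensorflow import keras

# Sub-model that outputs the activations of one intermediate layer.
# 'conv2d_1' is a hypothetical layer name; check model.summary() for yours.
layer = model.get_layer('conv2d_1')
activation_model = keras.Model(inputs=model.inputs, outputs=layer.output)

# 1) Activation visualization (the MNIST-notebook style):
#    run a real image through the network and plot each filter's feature map.
feature_maps = activation_model.predict(some_image[None, ...])
# feature_maps has shape (1, height, width, n_filters); plot each channel.

# 2) Gradient ascent in image space (the Keras blog style):
#    start from noise and repeatedly adjust the *input pixels* so that one
#    filter's mean activation increases.
filter_index = 0
img = tf.random.uniform((1, 224, 224, 3))  # assumed input size
for _ in range(30):
    with tf.GradientTape() as tape:
        tape.watch(img)
        activations = activation_model(img)
        loss = tf.reduce_mean(activations[..., filter_index])
    grads = tape.gradient(loss, img)
    grads = grads / (tf.math.reduce_std(grads) + 1e-8)  # normalize the step
    img = img + 10.0 * grads                             # ascend, not descend
# `img` is now a synthetic image that strongly activates that filter.
```

In the first case you are inspecting what the layer computes for a real input; in the second you are synthesizing an input from scratch to show what a filter "wants to see".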


One of the above methods uses gradient ascent, whereas the other (the MNIST example) does not… what is the difference between the two?