How to understand “individual neurons are the basis directions of activation space”?

In a recent article at Distill about visualizing the internal representations of convolutional neural networks, there is the following passage (bold is mine):

If neurons are not the right way to understand neural nets, what is? In real life, combinations of neurons work together to represent images in neural networks. Individual neurons are the basis directions of activation space, and it is not clear that these should be any more special than any other direction.

Szegedy et al.[11] found that random directions seem just as meaningful as the basis directions. More recently Bau, Zhou et al.[12] found basis directions to be interpretable more often than random directions. Our experience is broadly consistent with both results; we find that random directions often seem interpretable, but at a lower rate than basis directions.

I feel like they are talking about linear algebra representations, but struggle to understand how one neuron can represent a basis vector.

So at this point I have 2 main questions:

  1. A neuron has only a scalar output, so how can that be a basis direction?

  2. What is an activation space, and how should I think about it intuitively?

I feel like understanding these ideas could really broaden my intuition about the internal geometry of neural nets. Can someone please help by explaining, or point me toward resources for understanding the internal workings of neural nets from a linear-algebra point of view?


I got the following answer here:

My intuition would be: If you have a hidden layer with e.g. 10 neurons, then the activations of these 10 neurons span a 10-dimensional space. “Individual neurons are the basis directions of activation space” then means something like “the 10 states where exactly one neuron is 1 and the others are 0 are unit vectors that span this ‘activation space’”. But obviously, any linearly independent set of 10 vectors spans the same space. And since a fully-connected layer is basically just a matrix product with the output of the previous layer, there’s no obvious reason why these unit vectors should be special in any way.
This is important if you try to visualize what this hidden layer represents: Who says that “neuron 3”, or the state “neuron 3 is active and the other neurons are 0”, even represents anything? It’s equally possible that “neurons 2, 3 and 5 are 1, neuron 7 is -2 and the others are 0” has a visual representation, but the unit vectors do not.
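The answer's point can be sketched numerically. In this minimal NumPy example (the layer size and activation values are made up for illustration), reading off a single neuron's activation and projecting onto a random direction are the same algebraic operation, and a rotated basis spans the same space:

```python
import numpy as np

rng = np.random.default_rng(0)

# Activations of a hypothetical 10-neuron hidden layer for 5 inputs.
acts = rng.random((5, 10))          # each row is a point in activation space

# The basis direction for "neuron 3": a unit vector along axis 3.
e3 = np.zeros(10)
e3[3] = 1.0

# Reading off neuron 3's activation is just a dot product with e3 ...
proj_basis = acts @ e3              # identical to acts[:, 3]
assert np.allclose(proj_basis, acts[:, 3])

# ... and projecting onto a random unit direction works the exact same way.
v = rng.normal(size=10)
v /= np.linalg.norm(v)
proj_random = acts @ v              # nothing algebraically special about e3

# Any rotation of the standard basis spans the same 10-D activation space.
Q, _ = np.linalg.qr(rng.normal(size=(10, 10)))  # random orthonormal basis
assert np.linalg.matrix_rank(Q) == 10
```

This is why the article can compare "basis directions" and "random directions" on equal footing: both are just unit vectors you dot the activations with.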

I think this answer clarifies things and makes for another interesting linear-algebraic point of view on neural nets. We can think of them as geometric spaces flowing through transformations to arrive at the final prediction.

Now I have the following comments to the answer:

  1. Let’s say we have 3 neurons; this means our activation space is the 3D space of real numbers. Then each neuron’s output represents one dimension of that space, and therefore we can say it represents a direction, like x, y and z. This makes sense now. Because the neurons’ outputs are then weighted and summed, this is really a linear combination. Then we can say all these individual directions span the activation space.

  2. Can we also say these directions are orthogonal, in the sense that the dot product of two of them is zero? Does it make any practical sense to think about the orthogonality of different activations?
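Both comments can be checked directly. A small NumPy sketch (the activation values are invented for illustration): an activation vector is literally a linear combination of the basis directions, the basis directions themselves are mutually orthogonal, but two arbitrary activation vectors generally are not:

```python
import numpy as np

# A hypothetical activation vector for a 3-neuron layer.
a = np.array([0.7, 0.0, 2.1])

# It is literally a linear combination of the basis directions:
e = np.eye(3)                       # rows are the unit vectors e_x, e_y, e_z
recombined = a[0] * e[0] + a[1] * e[1] + a[2] * e[2]
assert np.allclose(recombined, a)

# The basis directions themselves are mutually orthogonal ...
assert e[0] @ e[1] == 0.0

# ... but two activation vectors usually are not:
b = np.array([1.0, 0.5, 0.3])
print(a @ b)                        # 1.33, nonzero: a and b are not orthogonal
```

So the basis *directions* are orthogonal by construction, but the *activations* of two different inputs need not be, and in practice rarely are.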


Please read this wonderful post that provides additional intuition.
