TL;DR
The idea of the colorful dimension is to express with colors the mean and standard deviation of the activations for each batch during training. The vertical axis represents a group (bin) of activation values, and each column along the horizontal axis is a batch. The colors represent how many activations in that batch fall into each bin.
NB: The actual plots shown by Jeremy in the lesson are much more curated and focused. This post is about the original idea and my attempt to implement it.
DISCLAIMER: in the chart above I’ve manually superimposed the mean on the colorful dimension, to give an intuition of how the chart works.
The final plot for each layer is made by stacking the histograms of the activations from each batch along the horizontal axis, so each vertical slice in the visualization represents the histogram of activations for a single batch. The color intensity corresponds to the height of the histogram, in other words the number of activations in each histogram bin (see the sketch below).
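To make this concrete, here is a minimal sketch of how such a chart could be built. This is illustrative, not the notebook’s actual code: `record_batch`, `hists`, `hMin`, `hMax` and `nBins` are my own names.

```python
import torch
import matplotlib.pyplot as plt

hMin, hMax, nBins = -10, 10, 200
hists = []  # one histogram per batch, collected during training

def record_batch(acts):
    # acts: activations of one layer for one batch (any shape)
    flat = acts.detach().cpu().float().flatten()
    hists.append(torch.histc(flat, bins=nBins, min=hMin, max=hMax))

# Demo: fake activations whose spread grows over 100 "batches"
for i in range(100):
    record_batch(torch.randn(64, 32) * (1 + i / 25))

# Stack the histograms so that each column is one batch
img = torch.stack(hists, dim=1)  # shape: [nBins, n_batches]
plt.imshow(img.log1p().numpy(), origin='lower', aspect='auto',
           cmap='viridis', extent=[0, len(hists), hMin, hMax])
plt.xlabel('batch'); plt.ylabel('activation value')
plt.show()
```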
The notebook is a first attempt to solve the problem.
FUTURE WORK
Try to build an understanding of what is going to happen in training just by looking at the charts.
Adapt the original callback to the current fast.ai version to make it available.
Publish all the flash cards.
CREDITS
Thanks again to @jeremy for the credit, and to @simonjhb, who helped me a lot in understanding the problem and transforming sketches into code.
Thanks Stefano for bringing a ‘colorful’ dimension to our deep learning study! Your drawings are fantastic and it would be great if you could find a way to publish all of them as a learning resource.
Small suggestion: it would be great if you could add to the right of each plot a vertical “color bar” that shows the numerical value of the counts corresponding to each color bin. That way we could have an idea of how many activations are zero, etc.
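For reference, here is a sketch of how such a color bar could be added with matplotlib, reusing the `img` tensor from the earlier sketch (the label text is illustrative):

```python
im = plt.imshow(img.log1p().numpy(), origin='lower', aspect='auto', cmap='viridis')
cbar = plt.colorbar(im)  # maps each color back to its (log-scaled) count
cbar.set_label('log(1 + count) of activations per bin')
plt.show()
```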
The following picture shows the steps involved in the computation of the histograms. All these steps are needed to create a **single chart column**: for example, the first column of the chart is the histogram of the first batch, and so on.
GET A BATCH OF ACTIVATIONS: we usually analyze the activations after a specific layer/transformation. The generic shape is [BS, ...] (batch size first). For example, the result of a 2d convolution is [BS,NF,XX,YY], and the result of a Linear layer is [BS,NF], where:
BS: batch size.
NF: number of features/filters/channels.
XX,YY: spatial size of the “image” (feature map).
COMPUTE HISTOGRAM: we compute the histogram of the whole batch of activations. The parameters of our histogram are:
hMin,hMax: minimum and maximum values reported in our histogram.
nBins: number of bins, the “resolution” of the histogram.
LOG THE HISTOGRAM: note that this transformation is made for visualization purposes only and is not applied to the stored histogram values. A sketch of all three steps follows.
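Putting the three steps together, a single chart column could be computed like this. This is a sketch under the parameters above; `batch_to_column` is a hypothetical helper, not the notebook’s actual API.

```python
import torch

hMin, hMax, nBins = -10, 10, 200

def batch_to_column(acts):
    # (1) GET A BATCH OF ACTIVATIONS, e.g. shape [BS, NF, XX, YY] after a Conv2d
    flat = acts.detach().cpu().float().flatten()
    # (2) COMPUTE HISTOGRAM over the whole batch of activations.
    # (3) The LOG is applied only at display time (hist.log1p());
    #     the raw counts are what gets stored.
    return torch.histc(flat, bins=nBins, min=hMin, max=hMax)

# Example: conv activations with BS=64, NF=32 and 28x28 feature maps
col = batch_to_column(torch.randn(64, 32, 28, 28))
print(col.shape)  # torch.Size([200]) -- one count per bin
```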
Original Idea:
This was my very first sketch made to visualize the initial idea of the “colorful dimension”.
I’m working on a complete rewrite/improvement of the colorful dimension chart.
Here is a preview (this picture usually means bad training, but it’s cool):
```python
from fastai.vision import *   # fastai v1 API
from functools import partial

data = ImageDataBunch.from_folder(untar_data(URLs.MNIST_SAMPLE), bs=1024)
# (1) Create a custom ActivationsHistogram according to your needs
# (ActivationsHistogram is the custom callback from the accompanying notebook)
actsh = partial(ActivationsHistogram, modulesId=None, hMin=-10, hMax=10, nBins=200)
# Add it to the callback_fns
learn = cnn_learner(data, models.resnet18, callback_fns=actsh, metrics=[accuracy])
# Fit: and see the Twin Peaks chart in action
learn.fit_one_cycle(4)
# (2) Customize and plot the colorful chart!
learn.activations_histogram.plotActsHist(cols=20, figsize=(30,15), showEpochs=False)
```
What do black and yellow lines mean in each layer?
For instance, in the L2 Conv2d for epochs 2 and 3, the magnitude distribution of the activations extends far beyond the specified display range of -12 to 12. Does this mean that such activations can cause gradient explosion during the backpropagation stage?
In the L4 Conv2d, the magnitude distribution of the activations grows from iteration to iteration; does this mean that the network started to learn in the later iterations?
What kind of insights should I get from this histogram? What should a healthy chart look like for each layer? How should this diagram be analyzed in general?
Is there a list of similar diagrams and tools for analyzing neural nets’ performance that are commonly used in production environments with the PyTorch / fast.ai infrastructure?