Batch Norm Tutorial and Demo

As part of the new London Fast.ai study group, I prepared a tutorial about Batch Normalisation:
Batch Norm Tutorial + Demo

The tutorial covers the main points that Jeremy discusses, as well as points taken from Aurélien Géron's book and from Andrew Ng.

Motivated to learn ipywidgets, I also added a simple interactive demo, based on Jeremy's update function in his SGD tutorial.
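For anyone new to ipywidgets, a toy sketch of the interact pattern that such a demo builds on might look like the following. This is purely illustrative (the names and sliders are my own assumptions), not the notebook's actual demo:

```python
# Illustrative ipywidgets sketch (not the tutorial's demo): slide gamma/beta
# and watch how a standardised batch of toy activations is scaled and shifted.
import numpy as np
from ipywidgets import interact, FloatSlider

x = np.random.randn(8) * 3 + 5  # toy "activations" for one feature

def show(gamma=1.0, beta=0.0):
    x_hat = (x - x.mean()) / (x.std() + 1e-5)  # standardise
    y = gamma * x_hat + beta                   # scale and shift
    print(f"mean={y.mean():.3f}  std={y.std():.3f}")

interact(show,
         gamma=FloatSlider(min=0.1, max=3.0, step=0.1, value=1.0),
         beta=FloatSlider(min=-2.0, max=2.0, step=0.1, value=0.0))
```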

Feel free to comment, as I am sure I have much more to learn on the topic and on how best to present it.

Just highlighting the Tutorial TL;DR here:
Batch Normalisation is a method used predominantly to stabilise activations while training a neural network. It accomplishes this with three operations: standardising, scaling, and shifting.

This is done by introducing a layer that has four parameters.
Two trained parameters: \gamma (scaling) and \beta (shifting)
Two estimated parameters: \mu (mean) and \sigma (standard deviation), both used for standardising
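To make the three operations concrete, here is a minimal NumPy sketch of a batch-norm forward pass in training mode (the names and shapes are illustrative assumptions, not code from the tutorial). During training, \mu and \sigma are estimated from the current mini-batch; at inference, running estimates are typically used instead.

```python
# Minimal batch-norm forward pass (training mode) for a batch of activations
# with shape (batch_size, features).
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=0)                         # estimated per-feature mean
    var = x.var(axis=0)                         # estimated per-feature variance
    x_hat = (x - mu) / np.sqrt(var + eps)       # standardise
    return gamma * x_hat + beta                 # scale by gamma, shift by beta

x = np.random.randn(32, 4) * 10 + 3             # toy batch with large mean/variance
gamma, beta = np.ones(4), np.zeros(4)           # trained parameters (initial values)
out = batch_norm_forward(x, gamma, beta)
print(out.mean(axis=0), out.std(axis=0))        # roughly 0 mean, 1 std per feature
```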

Other resulting benefits are faster training and a regularising effect that helps reduce overfitting.

The faster training comes from the layers being less dependent on one another, leaving each layer freer to learn more quickly.