As part of the new London Fast.ai study group, I prepared a tutorial about Batch Normalisation:
Batch Norm Tutorial + Demo
The tutorial covers the main points that Jeremy discusses, as well as points learned from Aurélien Géron’s book and from Andrew Ng.
Motivated to learn ipywidgets, I also added a simple interactive demo, based on Jeremy’s update function in his SGD tutorial.
Feel free to comment, as I am sure I have much more to learn about the topic and how best to present it.
Just highlighting the Tutorial TL;DR here:
Batch Normalisation is a method used predominantly to stabilise activations while training a neural network. It does so through three operations: standardising, scaling and shifting.
This is done by introducing a layer that has four parameters.
Two trained parameters: \gamma (scaling) \beta (shifting)
Two estimated parameters: \mu (mean), \sigma (standard deviation) (both used for standardising)
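The three operations above can be sketched in a few lines of NumPy. This is a minimal illustration of the training-time forward pass only (the running estimates of \mu and \sigma kept for inference, and the backward pass, are omitted); the function name and shapes are my own choices, not from the tutorial:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-time batch norm over a batch of shape (batch, features)."""
    # Standardise: per-feature mean and std estimated from the batch
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)
    x_hat = (x - mu) / (sigma + eps)   # eps guards against division by zero
    # Scale by gamma and shift by beta (the two trained parameters)
    return gamma * x_hat + beta

# Toy batch: 32 samples, 4 features, deliberately off-centre and spread out
x = np.random.randn(32, 4) * 5 + 3
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
# With gamma=1, beta=0 the output has roughly zero mean and unit std per feature
```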
Other resulting benefits are faster learning and use as a regulariser to reduce overfitting.
The faster speed results from layers being less dependent on one another, leaving each free to learn more quickly.