In the 07a_lsuv.ipynb notebook (lesson 11), the LSUV initialization technique is implemented as two loops: the first normalizes the mean and the second the standard deviation, like this:
```python
while mdl(xb) is not None and abs(h.mean) > 1e-3: m.bias -= h.mean
while mdl(xb) is not None and abs(h.std-1) > 1e-3: m.weight.data /= h.std
```
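For anyone who wants to poke at this outside the notebook, here's a minimal self-contained sketch of the same two loops in plain PyTorch. Everything in it (`xb`, `m`, `mdl`, `append_stat`, the `stats` dict) is a hypothetical stand-in for the notebook's `Hook`/`mdl`/`xb` helpers, and it normalizes the linear layer's pre-activation stats rather than the notebook's post-activation ones, but it reproduces the same effect:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
xb  = torch.randn(512, 100) + 1      # stand-in batch with a non-zero mean
m   = nn.Linear(100, 100)            # the layer being initialized
mdl = nn.Sequential(m, nn.ReLU())    # stand-in for the notebook's model

stats = {}
def append_stat(module, inp, outp):
    # record the layer's output statistics on every forward pass
    stats['mean'], stats['std'] = outp.mean().item(), outp.std().item()

# hook the linear layer itself (the notebook hooks its ConvLayer modules);
# left registered so the later snippets can reuse it
hook = m.register_forward_hook(append_stat)

with torch.no_grad():
    # loop 1: shift the bias until the output mean is ~0
    while mdl(xb) is not None and abs(stats['mean']) > 1e-3:
        m.bias -= stats['mean']
    # loop 2: rescale the weights until the output std is ~1
    while mdl(xb) is not None and abs(stats['std'] - 1) > 1e-3:
        m.weight.data /= stats['std']

print(stats['mean'], stats['std'])   # std ≈ 1, but the mean typically drifts from 0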
When the mean and std of the layers are examined afterwards, the stds are very close to 1 but the means aren't quite 0, as the per-layer statistics printed in the notebook for the example network show. That's because the std loop rescales the weights, and rescaling the weights shifts the output mean again after the mean loop has already zeroed it.
I've also played a bit with that awesome tweak, and figured out that you can run those loops twice in a row and also get near-zero means:
```python
# pass 1
while mdl(xb) is not None and abs(h.mean) > 1e-3: m.bias -= h.mean
while mdl(xb) is not None and abs(h.std-1) > 1e-3: m.weight.data /= h.std
# pass 2: the mean loop gets a second chance after the std loop has run
while mdl(xb) is not None and abs(h.mean) > 1e-3: m.bias -= h.mean
while mdl(xb) is not None and abs(h.std-1) > 1e-3: m.weight.data /= h.std
```
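Taking that one step further (not something the notebook does, just a sketch reusing the stand-ins from the snippet above), you can repeat the pair of loops until both statistics are inside the tolerance, rather than hard-coding two passes:

```python
m.reset_parameters()                 # start again from a fresh init

with torch.no_grad():
    while True:
        while mdl(xb) is not None and abs(stats['mean']) > 1e-3:
            m.bias -= stats['mean']
        while mdl(xb) is not None and abs(stats['std'] - 1) > 1e-3:
            m.weight.data /= stats['std']
        mdl(xb)                      # refresh the stats before re-checking
        if abs(stats['mean']) <= 1e-3 and abs(stats['std'] - 1) <= 1e-3:
            break

print(stats['mean'], stats['std'])   # both statistics on target now
```

In practice this settles after the second pass, since the mean fix-up doesn't disturb the std.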
Or, if you just switch the order of the normalizations and normalize the std first and then the mean, you also get means close to zero and stds close to one. This works because subtracting a constant from the bias doesn't change the std, while rescaling the weights does change the mean, so running the std loop first leaves nothing for the mean loop to spoil afterwards:
```python
# while mdl(xb) is not None and abs(h.mean) > 1e-3: m.bias -= h.mean   # moved below
while mdl(xb) is not None and abs(h.std-1) > 1e-3: m.weight.data /= h.std
while mdl(xb) is not None and abs(h.mean) > 1e-3: m.bias -= h.mean
```
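And the reordered version with the same stand-ins, for comparison:

```python
m.reset_parameters()                 # fresh init once more

with torch.no_grad():
    # std first: rescaling the weights is the only step that moves the mean
    while mdl(xb) is not None and abs(stats['std'] - 1) > 1e-3:
        m.weight.data /= stats['std']
    # mean second: shifting the bias leaves the std untouched
    while mdl(xb) is not None and abs(stats['mean']) > 1e-3:
        m.bias -= stats['mean']

hook.remove()
print(stats['mean'], stats['std'])   # mean ≈ 0 and std ≈ 1 in a single pass
```

A single pass lands both statistics on target, which makes the std-first ordering the cheapest of the three variants.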