Unit Variance

SamJoel · June 5, 2020, 1:52pm

Kaiming initialisation comes with two parameters, ‘a’ and ‘b’, which default to sqrt(5) and sqrt(3) respectively. The formula seems to be

std = b*sqrt(2/((1 + a**2)*fan_in))
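
For reference, here is a minimal sketch of that calculation in plain Python. It assumes the formula mirrors what PyTorch's `torch.nn.init.kaiming_uniform_` computes with the `leaky_relu` gain, with fan_in in the denominator. The helper name `kaiming_uniform_bound` is made up for illustration. In that function sqrt(3) is the factor that turns a standard deviation into the bound of a uniform distribution, so the sketch reports the two quantities separately.

```python
import math

def kaiming_uniform_bound(fan_in, a=math.sqrt(5), b=math.sqrt(3)):
    """Return (std, bound) for the formula above (hypothetical helper)."""
    gain = math.sqrt(2.0 / (1.0 + a ** 2))   # leaky_relu-style gain
    std = gain / math.sqrt(fan_in)           # target std of the weights
    bound = b * std                          # weights ~ U(-bound, bound)
    return std, bound

std, bound = kaiming_uniform_bound(fan_in=100)
print(f"std={std:.4f}, bound={bound:.4f}")
```

With a = sqrt(5) and b = sqrt(3) the factors cancel and the bound works out to 1/sqrt(fan_in), which is 0.1 for fan_in = 100.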

But if we need a std of 1.0 then ‘a’ in the above equation would be

a = sqrt((2/fan_in) - 1) (obtained by solving the above equation with std = 1)

which means our std would become 1, but then this term

sqrt(2/((1 + a**2)*fan_in)) would also become 1…

This implies that the weights are simply initialised uniformly between the limits (-b, b). So all my layers are uniformly initialised within (-b, b), which leads to neural network symmetry, which in turn leads to worse convergence.
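
A quick way to check what sampling from (-b, b) does, sketched under the assumptions that b = sqrt(3) and that the inputs are unit-variance Gaussians: a uniform distribution on (-b, b) has standard deviation b/sqrt(3), so the weights themselves have unit std, while the pre-activation of a unit that sums fan_in weighted inputs has a variance of roughly fan_in. The values of `fan_in` and `n_samples`, the helper `variance`, and the Gaussian inputs are illustrative choices, not taken from the experiment described here.

```python
import math
import random

def variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

random.seed(0)
b = math.sqrt(3)
fan_in = 256          # illustrative layer width
n_samples = 2000      # illustrative number of input vectors

# Std of weights drawn uniformly from (-b, b); analytically b / sqrt(3).
many_weights = [random.uniform(-b, b) for _ in range(100_000)]
weight_std = math.sqrt(variance(many_weights))

# One unit with fixed weights: y = sum_i w_i * x_i for unit-variance inputs.
weights = [random.uniform(-b, b) for _ in range(fan_in)]
outputs = []
for _ in range(n_samples):
    x = [random.gauss(0.0, 1.0) for _ in range(fan_in)]
    outputs.append(sum(w * xi for w, xi in zip(weights, x)))
output_var = variance(outputs)

print(f"weight std ~ {weight_std:.3f}")
print(f"output variance ~ {output_var:.1f} (fan_in = {fan_in})")
```

If this sketch is representative, unit variance of the weights is not the same thing as unit variance of the layer outputs, which may be worth keeping apart when reading the formula.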

I tried this experimentally and it converges extremely badly. So DOES UNIT VARIANCE REALLY LEAD TO BETTER CONVERGENCE? Or am I wrong somewhere?