Kaiming initialisation (as used by PyTorch's `kaiming_uniform_`, the default for `nn.Linear`) involves two constants: the negative-slope parameter `a`, which defaults to `sqrt(5)`, and the uniform-bound factor `b = sqrt(3)`. The formula seems to be

`std = sqrt(2 / ((1 + a**2) * fan_in))`

`bound = b * std`
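For concreteness, here is that formula as a small standalone sketch (plain Python mirroring what the PyTorch docs describe, not calling torch; the helper name is my own):

```python
import math

def kaiming_uniform_bound(fan_in, a=math.sqrt(5), b=math.sqrt(3)):
    """Bound of the Uniform(-bound, bound) distribution used by
    Kaiming-uniform init, per the formula above."""
    std = math.sqrt(2.0 / ((1 + a**2) * fan_in))
    return b * std

# With the default a = sqrt(5): 1 + a**2 = 6, so the gain is sqrt(1/3)
# and the bound simplifies to sqrt(3) * sqrt(1/(3*fan_in)) = 1/sqrt(fan_in).
print(kaiming_uniform_bound(512))  # ≈ 0.0442
```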

But if we want a std of 1.0, then substituting std = 1 into the equation above and solving for `a` gives

`a = sqrt((2 / fan_in) - 1)`

(note this is only real-valued for fan_in ≤ 2)
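Checking that algebra in a quick standalone sketch (hypothetical helper name; the square root only stays real for fan_in ≤ 2):

```python
import math

def std_factor(a, fan_in):
    # The fan_in-dependent part of the Kaiming formula.
    return math.sqrt(2.0 / ((1 + a**2) * fan_in))

for fan_in in (1, 2):
    a = math.sqrt(2.0 / fan_in - 1)   # real only for fan_in <= 2
    print(fan_in, a, std_factor(a, fan_in))  # std factor comes out as 1.0
```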

With that choice of `a`, the term

`sqrt(2 / ((1 + a**2) * fan_in))`

becomes 1, so the bound reduces to just `b = sqrt(3)`…

which implies the weights are drawn from Uniform(-b, b) with the same limits for every layer, regardless of fan_in. The fan-in scaling that Kaiming initialisation is supposed to provide is gone, which I'd expect to hurt convergence.
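A quick numpy sketch (my own toy illustration, not the original experiment) shows why losing the fan-in scaling hurts: with per-weight std 1 the activation magnitudes grow roughly by fan_in/2 per ReLU layer, while the ReLU-scaled Kaiming bound `sqrt(6/fan_in)` keeps them stable:

```python
import numpy as np

rng = np.random.default_rng(0)

def final_var(bound_fn, fan=256, depth=5):
    """Push unit-variance inputs through `depth` ReLU layers whose
    weights are Uniform(-bound, bound); return the final variance."""
    x = rng.standard_normal((1000, fan))
    for _ in range(depth):
        W = rng.uniform(-bound_fn(fan), bound_fn(fan), size=(fan, fan))
        x = np.maximum(x @ W, 0.0)  # ReLU
    return x.var()

unit_std = final_var(lambda fan: np.sqrt(3.0))        # per-weight std 1, no fan-in scaling
kaiming  = final_var(lambda fan: np.sqrt(6.0 / fan))  # Kaiming-uniform bound for ReLU (a=0)
print(unit_std, kaiming)  # the first explodes; the second stays O(1)
```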

I tried this experimentally and it converges extremely badly. So does unit variance really lead to better convergence, or am I wrong somewhere?