Kaiming initialisation comes with two parameters, ‘a’ and ‘b’, which are sqrt(5) and sqrt(3) by default, respectively. But the formula seems to be
std = b * sqrt(2 / ((1 + a**2) * fan_in))
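To make the formula concrete, here is a minimal Python sketch of it as written above. It is not PyTorch's own code: the helper names `formula_value` and `uniform_std` and the layer width of 100 are made up for illustration. The second helper assumes the formula's value ends up being used as the half-width of a uniform range, in which case the standard deviation of the sampled weights is that half-width divided by sqrt(3).

```python
import math


def formula_value(a, fan_in, b=math.sqrt(3.0)):
    """Evaluate b * sqrt(2 / ((1 + a**2) * fan_in)), the formula quoted above."""
    return b * math.sqrt(2.0 / ((1.0 + a ** 2) * fan_in))


def uniform_std(half_width):
    """Standard deviation of a uniform distribution on (-half_width, half_width)."""
    return half_width / math.sqrt(3.0)


fan_in = 100  # hypothetical layer width, chosen only for illustration
value = formula_value(a=math.sqrt(5.0), fan_in=fan_in)

print(value)               # close to 1 / sqrt(fan_in)
print(uniform_std(value))  # close to 1 / sqrt(3 * fan_in), if value is a half-width
```

With a = sqrt(5) and b = sqrt(3) the two constants cancel, since sqrt(3) * sqrt(2/6) = 1, so the formula reduces to 1/sqrt(fan_in).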
But if we need a std of 1.0, then ‘a’ in the above equation would be
a = sqrt((2/fan_in) - 1) (obtained by solving the above equation after substituting std = 1)
which means our std would become 1, and this term
sqrt(2 / ((1 + a**2) * fan_in)) would become 1 as well…
which implies that our weights are initialised uniformly between the limits (-b, b). That means all my layers are initialised uniformly within the same (-b, b) range, which leads to neural network symmetry, which in turn leads to worse convergence.
I tried this experimentally and it does converge extremely badly. So does unit variance really lead to better convergence, or am I wrong somewhere?
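One way to probe this without training anything is to push a random input through a stack of layers and watch the standard deviation of the activations. The sketch below is a simplification under stated assumptions, not the experiment described above: it uses purely linear layers with no bias or nonlinearity, plain Python instead of PyTorch, and a hypothetical width of 64 and depth of 5. It compares weights drawn from (-sqrt(3), sqrt(3)), which have unit variance, with weights drawn from a range scaled by 1/sqrt(fan_in).

```python
import math
import random


def uniform_matrix(rows, cols, bound, rng):
    """Matrix with entries drawn uniformly from (-bound, bound)."""
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]


def matvec(weights, x):
    """Plain matrix-vector product: one linear layer without bias."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def std(values):
    """Population standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def activation_stds(bound, width=64, depth=5, seed=0):
    """Std of the activations after each of `depth` linear layers."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]  # unit-variance input
    stds = []
    for _ in range(depth):
        x = matvec(uniform_matrix(width, width, bound, rng), x)
        stds.append(std(x))
    return stds


width = 64  # hypothetical fan_in, for illustration only
unit_weight_stds = activation_stds(bound=math.sqrt(3.0), width=width)
scaled_weight_stds = activation_stds(bound=math.sqrt(3.0 / width), width=width)

print("unit-variance weights:  ", unit_weight_stds)
print("1/fan_in-variance weights:", scaled_weight_stds)
```

In this linear setting each layer multiplies the activation variance by roughly fan_in times the weight variance, so unit-variance weights make the activations grow layer after layer, while the scaled range keeps them at a similar magnitude. That suggests the "unit variance" goal may be about the activations rather than the weights themselves, though this toy setup says nothing definitive about convergence in a real network.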