The formula for the standard deviation used by Kaiming normal initialization is
(2 / ((1 + a ** 2) * fan_in)) ** .5
I understand why ‘a’ (the negative slope) is needed for the leaky ReLU case, but why use a ** 2 rather than just abs(a)?
It seems to me that a ** 2 would be less accurate than abs(a) at scaling the weights so that the layer's output has a standard deviation of about 1.
Am I missing something?
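
One way to test this numerically is a rough sketch like the one below. It assumes the previous layer's pre-activations are independent N(0, 1) values, passes them through a leaky ReLU, multiplies by freshly drawn weights, and estimates the mean square of the result for each scaling. The helper names (init_std, mean_square_output, use_abs) are made up for this illustration and are not part of any library. Under these assumptions, a value near 1 means the scaling preserves the signal's magnitude.

```python
import math
import random


def leaky_relu(x, a):
    # Leaky ReLU with negative slope a
    return x if x >= 0 else a * x


def init_std(fan_in, a, use_abs=False):
    # use_abs=False: the formula quoted above, with a ** 2
    # use_abs=True: the hypothetical abs(a) variant asked about
    slope_term = abs(a) if use_abs else a ** 2
    return math.sqrt(2 / ((1 + slope_term) * fan_in))


def mean_square_output(fan_in, a, use_abs, n_samples=5000, seed=0):
    # Estimate E[y ** 2] for y = sum_i w_i * leaky_relu(z_i), where
    # z_i ~ N(0, 1) stands in for the previous layer's pre-activations
    # (an assumption) and w_i ~ N(0, std ** 2) are the initial weights.
    rng = random.Random(seed)
    std = init_std(fan_in, a, use_abs)
    total = 0.0
    for _ in range(n_samples):
        y = sum(rng.gauss(0, std) * leaky_relu(rng.gauss(0, 1), a)
                for _ in range(fan_in))
        total += y * y
    return total / n_samples


if __name__ == "__main__":
    a, fan_in = 0.5, 64
    print("a ** 2:", mean_square_output(fan_in, a, use_abs=False))
    print("abs(a):", mean_square_output(fan_in, a, use_abs=True))
```

The relevant quantity here is the mean square of the leaky ReLU output: the negative half of the distribution contributes (a * z) ** 2 = a ** 2 * z ** 2, so the slope enters squared rather than as abs(a).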