The formula for the standard deviation used by Kaiming normal initialization is

`(2 / ((1 + a ** 2) * fan_in)) ** .5`

I understand why `a` is needed for the leaky ReLU case, but why use `a ** 2` rather than `abs(a)`?

It seems to me that `a ** 2` would be less accurate than `abs(a)` at scaling the weights so that the output of a layer has a standard deviation of about 1.
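One way to test this numerically is the rough sketch below. It assumes the inputs to the leaky ReLU are standard normal and the weights are independent with zero mean, so that the variance of one output unit is `fan_in * Var(w) * E[leaky_relu(x) ** 2]`. The helper names (`second_moment_leaky_relu`, `output_variance`) are hypothetical, and only the standard library is used.

```python
import random

def leaky_relu(x, a):
    return x if x >= 0 else a * x

def second_moment_leaky_relu(a, n=200_000, seed=0):
    # Monte Carlo estimate of E[leaky_relu(x) ** 2] for x ~ N(0, 1).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        y = leaky_relu(rng.gauss(0.0, 1.0), a)
        total += y * y
    return total / n

def output_variance(a, fan_in, use_abs=False, n=200_000):
    # Variance of one output unit under the stated assumptions:
    # fan_in * Var(w) * E[leaky_relu(x) ** 2].
    denom = (1 + abs(a)) if use_abs else (1 + a ** 2)
    weight_var = 2.0 / (denom * fan_in)
    return fan_in * weight_var * second_moment_leaky_relu(a, n)

a, fan_in = 0.5, 512
print("a ** 2 :", output_variance(a, fan_in))
print("abs(a) :", output_variance(a, fan_in, use_abs=True))
```

Whichever variant prints a value closer to 1 is the one that keeps the output standard deviation near 1 under these assumptions.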

Am I missing something?