Hi, everyone! I ran into a problem while trying to understand Kaiming initialization. Its derivation shows that the initialization wants the variance of y_l or Δx to be neither too large nor too small. But why? I mean, if the variance is too large, y_l or Δx may become too large, but if the variance is small, why would y_l or Δx be small?

Any help will be appreciated!

Hi TM

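I think the idea is that, layer after layer, small variances shrink towards zero and large ones grow towards infinity, so Kaiming initialization keeps them under control to avoid rounding errors near zero and large numbers overflowing to inf or NaN. Since y_l has zero mean in the derivation, a small variance means its values all sit close to zero. Please remember that floating-point numbers are only an approximation of the real numbers.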
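
To make this concrete, here is a minimal sketch in plain Python, not taken from the paper. It assumes a stack of fully connected ReLU layers with no bias, fan_in equal to the layer width, and zero-mean Gaussian weights. The function name `final_activation_std` and the width, depth and seed values are only illustrative choices. It compares the Kaiming standard deviation sqrt(2 / fan_in) with half and double that value.

```python
import math
import random


def final_activation_std(weight_std, width=64, depth=20, seed=0):
    """Push one random input through `depth` ReLU layers (hypothetical helper)
    and return the standard deviation of the last layer's pre-activations y."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    y = x
    for _ in range(depth):
        # Each output unit gets its own row of zero-mean Gaussian weights.
        y = [sum(rng.gauss(0.0, weight_std) * xi for xi in x) for _ in range(width)]
        x = [max(0.0, v) for v in y]  # ReLU feeds the next layer
    mean = sum(y) / width
    return math.sqrt(sum((v - mean) ** 2 for v in y) / width)


width = 64  # assumed fan_in for every layer
kaiming_std = math.sqrt(2.0 / width)  # std = sqrt(2 / fan_in)

results = {
    "half": final_activation_std(0.5 * kaiming_std, width),
    "kaiming": final_activation_std(kaiming_std, width),
    "double": final_activation_std(2.0 * kaiming_std, width),
}
for name, value in results.items():
    print(f"{name:8s} std of y after 20 layers: {value:.3e}")
```

Because ReLU layers without bias scale linearly with the weights, halving the weight standard deviation halves y at every layer, so after 20 layers it is about a million times smaller than with the Kaiming value, and doubling it makes y about a million times larger. With deeper networks or lower-precision floats, the same trend is what ends in underflow to zero or overflow to inf.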

Regards Conwyn