Lesson 8 readings: Xavier and Kaiming initialization

bny6613 · March 27, 2019, 8:49pm

I was looking into pytorch implementation of these initializations and found that in order to calculate the bounds of uniform distribution, they multiply the standard deviation by square root of 3.

fan_in, fan_out = _calculate_fan_in_and_fan_out(tensor)
std = gain * math.sqrt(2.0 / (fan_in + fan_out))
a = math.sqrt(3.0) * std  # Calculate uniform bounds from standard deviation
with torch.no_grad():
    return tensor.uniform_(-a, a)

I wonder where that sqrt(3) came from. I can’t find anything about this relationship between normal and uniform distribution. I will appreciate if someone explain that or point in right direction.