Why do we need to multiply by 1 over the square root of the variance in order to bring down the variance?

Do you mean the first part?

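import torch

X = torch.randn(32, 64)
a = torch.tensor(4.)

# Scaling X by a scales the variance by a**2; scaling by sqrt(a) scales it by a.
assert torch.isclose((a * X).var(), a**2 * X.var())
assert torch.isclose((a.sqrt() * X).var(), a * X.var())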
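
Applying the same identity to the original question: since Var(c * X) = c**2 * Var(X), choosing c = 1 / sqrt(Var(X)) should bring the variance to 1, whereas dividing by the variance itself would overshoot and leave 1 / Var(X). Here is a rough sketch of that; the factor 3.0, the seed, and the names `c`, `X_scaled`, and `X_over_var` are just picked for illustration.

```python
import torch

torch.manual_seed(0)
X = 3.0 * torch.randn(32, 64)  # illustrative input with variance of roughly 9

# Var(c * X) = c**2 * Var(X), so c = 1 / sqrt(Var(X)) targets unit variance.
c = 1.0 / X.var().sqrt()
X_scaled = c * X

# Dividing by the variance itself leaves a variance of 1 / Var(X) instead.
X_over_var = X / X.var()

assert torch.isclose(X_scaled.var(), torch.tensor(1.0), atol=1e-4)
assert torch.isclose(X_over_var.var(), 1.0 / X.var(), atol=1e-4)
```

So the square root is there because the scale factor gets squared when it passes through the variance.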