Lecture 5 - Question about init_coeffs (weights): why divide the coeff values by the layer size?

Is it necessary to divide by `sizes[i+1]` in the line below?

`layers = [(torch.rand(sizes[i], sizes[i+1])-0.3)/sizes[i+1]*4 for i in range(n-1)]`

If I'm not wrong, the numbers are going to be centered around 0 with the same relative magnitudes even if you don't divide. Also, how did we arrive at multiplying by 4 (the `*4` in `/sizes[i+1]*4`)?
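One quick check on the centering claim: `torch.rand(...) - 0.3` is uniform on [-0.3, 0.7), so its mean is about 0.2, not 0. Also note the operator precedence: `/sizes[i+1]*4` parses as `* (4 / sizes[i+1])`, not `/ (sizes[i+1] * 4)`. A small sketch (the width 10 here is just a hypothetical next-layer size, matching the `hiddens = [10, 10]` in the code below):

```python
import torch

torch.manual_seed(0)
n_out = 10  # hypothetical width of the next layer (assumption, matches hiddens = [10, 10])

w = torch.rand(100_000) - 0.3   # uniform on [-0.3, 0.7): mean ~0.2, so not exactly centered at 0
scaled = w / n_out * 4          # precedence: (w / n_out) * 4, i.e. multiply by 4/n_out

print(f"raw mean:     {w.mean():.3f}")       # close to 0.2
print(f"scaled mean:  {scaled.mean():.3f}")  # close to 0.2 * 4/10 = 0.08
print(f"scaled range: [{scaled.min():.3f}, {scaled.max():.3f}]")  # within [-0.12, 0.28)
```

So the division changes the magnitude (by `4/sizes[i+1]`) but not the relative shape of the distribution, and the `-0.3` shift leaves a small positive mean either way.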

```python
def init_coeffs():
    hiddens = [10, 10]  # <-- set this to the size of each hidden layer you want
    sizes = [n_coeff] + hiddens + [1]
    n = len(sizes)
    layers = [(torch.rand(sizes[i], sizes[i+1])-0.3)/sizes[i+1]*4 for i in range(n-1)]
    consts = [(torch.rand(1)[0]-0.5)*0.1 for i in range(n-1)]
    for l in layers+consts: l.requires_grad_()
    return layers, consts
```

I guess it's for "normalization" purposes (the same idea as subtracting 0.3 to roughly center the random tensor), but I'm not entirely sure and would also be interested in the answer.
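To test that "normalization" guess, here's a sketch (not from the lecture; the width, depth, and batch size are made up) that pushes random activations through a stack of linear layers initialized the same way, with and without the scaling. Without dividing by the layer size, the activation magnitudes grow multiplicatively at every layer:

```python
import torch

torch.manual_seed(0)

def act_std(scale_by_fan: bool, width: int = 100, depth: int = 10) -> float:
    """Push random inputs through `depth` linear layers (no nonlinearity,
    for simplicity) and return the final activations' standard deviation."""
    x = torch.randn(64, width)
    for _ in range(depth):
        w = torch.rand(width, width) - 0.3  # same raw init as in init_coeffs
        if scale_by_fan:
            w = w / width * 4               # the course's 4/sizes[i+1] scaling
        x = x @ w
    return x.std().item()

print(f"with scaling:    {act_std(True):.2e}")   # stays small
print(f"without scaling: {act_std(False):.2e}")  # blows up by orders of magnitude
```

So the division keeps activations from exploding as depth grows, which is the same motivation behind standard schemes like Xavier/Kaiming init (those divide by a square root of the fan size instead, and the `4` here looks like a hand-tuned fudge factor to compensate).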


interested as well