Lecture 5 - Question about init_coeffs (weights): why divide the coefficients by the next layer's size?

Is it necessary to divide by `sizes[i+1]` in the line below?

`layers = [(torch.rand(sizes[i], sizes[i+1])-0.3)/sizes[i+1]*4 for i in range(n-1)]`

If I am not wrong, the numbers are still going to be roughly centered around 0 if you don't divide, just with a larger magnitude. Also, how did we arrive at the multiplication by 4 in `/sizes[i+1]*4`?
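For what it's worth, here is a quick sanity check of what that expression actually produces (the `fan_out = 10` below just mirrors the `hiddens` value in the code; the `20` rows are arbitrary):

```python
import torch

fan_out = 10  # mirrors hiddens = [10, 10] in init_coeffs

# Precedence note: / and * bind equally and apply left to right, so the
# expression is ((torch.rand(a, b) - 0.3) / fan_out) * 4 -- the 4
# multiplies the whole tensor, not sizes[i+1].
w = (torch.rand(20, fan_out) - 0.3) / fan_out * 4

# torch.rand samples uniformly from [0, 1); subtracting 0.3 gives
# [-0.3, 0.7), and / fan_out * 4 maps that to [-1.2/fan_out, 2.8/fan_out).
# The midpoint is 0.8/fan_out, so the weights are not exactly centered on 0.
print(w.min().item(), w.max().item())
```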

```python
def init_coeffs():
    hiddens = [10, 10]  # <-- set this to the size of each hidden layer you want
    sizes = [n_coeff] + hiddens + [1]  # n_coeff is defined earlier in the notebook
    n = len(sizes)
    layers = [(torch.rand(sizes[i], sizes[i+1])-0.3)/sizes[i+1]*4 for i in range(n-1)]
    consts = [(torch.rand(1)[0]-0.5)*0.1 for i in range(n-1)]
    for l in layers+consts: l.requires_grad_()
    return layers, consts
```

I guess it's for "normalization" purposes (the same idea as subtracting 0.3 from the random tensor), but I'm not entirely sure and would also be interested in the answer.
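One way to check that guess empirically (the width and batch size below are made up, and the weight matrix is square so fan-in equals fan-out; note the lecture code divides by `sizes[i+1]`, the fan-out, whereas Xavier/He-style inits scale by a function of fan-in):

```python
import torch

torch.manual_seed(0)
fan = 100                    # hypothetical layer width (fan-in == fan-out here)
x = torch.rand(1000, fan)    # a made-up batch of inputs in [0, 1)

base = torch.rand(fan, fan) - 0.3
w_with    = base / fan * 4   # the lecture's init: / sizes[i+1] * 4
w_without = base * 4         # the same init with the division removed

# Each output unit of x @ w sums `fan` products, so without the division
# the activation scale grows with the layer width -- and would keep
# compounding layer after layer.
print((x @ w_with).std().item(), (x @ w_without).std().item())
```

With the division, the spread of the activations stays small no matter how wide the layer is; without it, the spread grows with `fan`. My (unconfirmed) reading of the `*4` is that it's a hand-picked fudge factor that pushes the scale back up, since dividing by the full layer size shrinks the weights more aggressively than the `1/sqrt(fan_in)` used by standard inits.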


interested as well