Initialization method of CNN

Shirui · October 18, 2019, 1:09pm

Generally, a CNN (convolutional neural network) is composed of two parts:

convolution layer
fully connected layer

My question is:
Should the initialization method be the same for both parts of a CNN (convolution layers + fully connected layers)?

In my opinion,
there may be one good initialization method for the convolution layers,
and another good initialization method for the fully connected layers.
(I don’t know if this is the case.)

Should I use the same method for both, or different ones?
And which initialization method works well for a CNN?

KarlH · October 18, 2019, 7:51pm

Kaiming initialization should be fine for both. It scales the distribution your initial weights are drawn from according to each layer's fan-in (the number of inputs feeding each output unit, which follows from the shape of the weight tensor), so the convolutional kernels and the linear layers will automatically draw from different distributions.
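To make the fan-in point concrete, here is a minimal sketch in plain Python of the standard deviation that Kaiming (He) normal initialization would use for a conv weight and a linear weight. The helper names (`fan_in`, `kaiming_std`) and the layer shapes are made up for illustration; the formula std = gain / sqrt(fan_in), with gain = sqrt(2 / (1 + a^2)), is the one documented for `torch.nn.init.kaiming_normal_` in its default fan-in mode.

```python
import math

def fan_in(weight_shape):
    """Number of inputs feeding one output unit.

    Linear weight (out_features, in_features)          -> in_features
    Conv2d weight (out_channels, in_channels, kH, kW)  -> in_channels * kH * kW
    """
    receptive_field = math.prod(weight_shape[2:])  # empty product is 1 for linear
    return weight_shape[1] * receptive_field

def kaiming_std(weight_shape, a=0.0):
    """Std of Kaiming normal init; a is the negative slope (0 for plain ReLU)."""
    gain = math.sqrt(2.0 / (1.0 + a * a))
    return gain / math.sqrt(fan_in(weight_shape))

# Hypothetical layer shapes, chosen only for illustration.
conv_shape = (64, 32, 3, 3)   # fan_in = 32 * 3 * 3 = 288
linear_shape = (10, 512)      # fan_in = 512

conv_std = kaiming_std(conv_shape)      # sqrt(2 / 288) = 1/12
linear_std = kaiming_std(linear_shape)  # sqrt(2 / 512) = 1/16
print(conv_std, linear_std)
```

The same rule yields a different spread for each layer, which is why a single method can serve both the convolutional and the fully connected part.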

Shirui · October 19, 2019, 5:43am

Thanks for your advice.
I will try that method.

wyquek · October 19, 2019, 8:57am

Hmm… despite everything said in Lesson 9, PyTorch still uses a=math.sqrt(5) in pytorch/conv.py, as follows:

    def reset_parameters(self):
        n = self.in_channels
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

which would give activation outputs a variance much lower than 1 if we are using a ReLU after the conv layer. I wonder whether we need to replace a=math.sqrt(5) with a=0 manually?
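The size of that gap can be worked out by hand. The sketch below, in plain Python, reproduces the bound formula documented for `kaiming_uniform_` (bound = sqrt(3) * gain / sqrt(fan_in), gain = sqrt(2 / (1 + a^2))) and compares a=sqrt(5) with a=0. The helper names and the fan-in value are illustrative, and the per-layer factor relies on the usual He-initialization assumptions (zero-mean weights, zero bias, pre-activations symmetric around zero), so treat it as a rough estimate rather than a measurement.

```python
import math

def kaiming_uniform_bound(fan_in, a):
    """Weights are drawn from U(-bound, bound)."""
    gain = math.sqrt(2.0 / (1.0 + a * a))
    std = gain / math.sqrt(fan_in)
    return math.sqrt(3.0) * std

def weight_variance(fan_in, a):
    # Variance of U(-b, b) is b^2 / 3.
    return kaiming_uniform_bound(fan_in, a) ** 2 / 3.0

def relu_layer_factor(fan_in, a):
    # Under the stated assumptions, one conv/linear layer followed by ReLU
    # multiplies the second moment of the signal by fan_in * Var(w) / 2.
    return fan_in * weight_variance(fan_in, a) / 2.0

fan_in = 288  # e.g. 32 input channels with a 3x3 kernel (illustrative)

default_factor = relu_layer_factor(fan_in, math.sqrt(5))  # about 1/6
relu_factor = relu_layer_factor(fan_in, 0.0)              # about 1
print(default_factor, relu_factor)
```

With a=sqrt(5) the bound reduces to 1/sqrt(fan_in), so under these assumptions each ReLU layer shrinks the signal to roughly one sixth, whereas a=0 keeps it roughly constant.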

jhunt · October 22, 2019, 4:03am

I find that for CNNs with residual connections, the default initialization does not work so well. The loss will start at 10+, sometimes even in the hundreds, instead of around 3. Any advice on how to initialize when you are adding layers together like that?
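One possible explanation, sketched as a toy model in plain Python: if every residual branch preserves the variance of its input and is uncorrelated with the skip path (both are simplifying assumptions, not measurements of any real network), then each addition x + f(x) doubles the variance. The function name and the block count are hypothetical.

```python
def variance_after_blocks(num_blocks, branch_scale, input_var=1.0):
    """Toy model of y = x + branch_scale * f(x), stacked num_blocks times.

    Assumes Var(f(x)) == Var(x) and that f(x) is uncorrelated with x.
    """
    var = input_var
    for _ in range(num_blocks):
        var = var + (branch_scale ** 2) * var
    return var

num_blocks = 16  # illustrative depth

plain = variance_after_blocks(num_blocks, 1.0)                  # 2 ** 16
zero_branch = variance_after_blocks(num_blocks, 0.0)            # stays at 1.0
scaled = variance_after_blocks(num_blocks, num_blocks ** -0.5)  # stays below e
print(plain, zero_branch, scaled)
```

In this model the activations reaching the classifier grow exponentially with depth, which would be consistent with a very large initial loss. Two commonly used mitigations are to initialize the last layer of each residual branch to zero (torchvision's ResNet exposes this as `zero_init_residual`, which zeroes the last batch-norm weight in each block) or to scale each branch down; whether either fixes a particular architecture would need to be checked empirically.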