I noticed that for a pretrained ResNet50 model, the means of the layers are not zero and the standard deviations are not close to 1. Would it help to reinitialize the network with LSUV, or would that destroy all the information from using the pretrained model?
I can’t help answer, but I wanted to clarify for other readers: LSUV = layer-sequential unit variance, the initialization proposed in the “All you need is a good init” paper (I didn’t know and was curious).
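For anyone who wants to see the idea in code, here is a rough, simplified sketch of LSUV in PyTorch. It is not the lesson 9 implementation or the paper's reference code; the function name `lsuv_init`, the tolerance, and the toy network are assumptions made for illustration. It initializes each layer orthogonally, then rescales its weights until the layer's output has roughly unit standard deviation on one batch.

```python
import torch
import torch.nn as nn

def lsuv_init(model: nn.Module, batch: torch.Tensor,
              tol: float = 1e-3, max_iters: int = 10) -> nn.Module:
    """Simplified LSUV sketch (hypothetical helper, not a library function)."""
    layers = [m for m in model.modules() if isinstance(m, (nn.Linear, nn.Conv2d))]
    for layer in layers:  # visit layers in the order they were registered
        # Step 1: orthogonal init. This overwrites whatever weights were there.
        nn.init.orthogonal_(layer.weight)
        if layer.bias is not None:
            nn.init.zeros_(layer.bias)

        # Step 2: capture this layer's output with a forward hook.
        captured = {}
        handle = layer.register_forward_hook(
            lambda mod, inp, out: captured.update(out=out)
        )
        # Step 3: rescale the weights until the output std is close to 1.
        for _ in range(max_iters):
            with torch.no_grad():
                model(batch)
                std = captured["out"].std()
                if abs(std.item() - 1.0) < tol:
                    break
                layer.weight.div_(std)
        handle.remove()
    return model

# Toy network and random batch, just so the sketch runs end to end.
torch.manual_seed(0)
net = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 10),
)
x = torch.randn(256, 32)
lsuv_init(net, x)
```

Note that the very first step replaces the existing weights of each layer, so running something like this on a pretrained model would not keep the pretrained values.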
Yes, and to further clarify, this is discussed in lesson 9 of Part 2. Implementation over here
Good question! I think you should just try it. If I get around to trying it myself, I will let you know the results.
Reinitializing your network (with LSUV or with any other initialization technique) means resetting the value of all the weights in the network. This would defeat the object of loading pre-trained weights in the first place.
Careful initialization (aiming for activations with roughly zero mean and unit variance) is intended to help with the training process - to avoid vanishing and exploding gradients. When you start from a pre-trained network you are unlikely to be radically changing the weights (if you are radically changing them, you are losing the information from the pre-training anyway), and therefore vanishing and exploding gradients are unlikely to become a problem.
The only layer that will need to be initialised is the final layer that gets added at the end of the network, to ensure that the pre-trained network outputs an appropriate number of activations for your particular problem. For example, if you were using a network pre-trained on ImageNet to accomplish a binary classification task, the final layer of the pre-trained net, which has 1000 outputs, would be removed and replaced with a layer that has only two outputs. This new layer would be initialized randomly.
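As a minimal sketch of that last step in plain PyTorch: the helper `replace_head` and the stand-in `TinyNet` below are hypothetical names used only for illustration, and the tiny network takes the place of a real pretrained model so the example runs without downloading weights. Only the final linear layer gets new random weights; everything else is left untouched.

```python
import torch
import torch.nn as nn

def replace_head(model: nn.Module, num_classes: int, head_attr: str = "fc") -> nn.Module:
    """Swap the final linear layer for a new one; all other weights stay as they are."""
    old_head = getattr(model, head_attr)
    # nn.Linear uses PyTorch's default random initialization.
    new_head = nn.Linear(old_head.in_features, num_classes)
    setattr(model, head_attr, new_head)
    return model

class TinyNet(nn.Module):
    """Stand-in for a pretrained network with a 1000-way classifier head."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(8, 1000)

    def forward(self, x):
        return self.fc(self.body(x))

model = replace_head(TinyNet(), num_classes=2)
out = model(torch.randn(4, 3, 32, 32))  # 4 images in, 2 scores per image out
```

With torchvision's ResNet50 the classifier is also stored in the `fc` attribute, so the same helper should apply to a model loaded with pretrained weights.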