Lesson 8 readings: Xavier and Kaiming initialization

simonjhb · March 29, 2019, 4:23pm

Another nice way to get some intuition about what @maxim.pechyonkin is saying is to look at the ‘All you need is a good init’ paper (https://arxiv.org/abs/1511.06422), which takes an empirical approach to initialization. Instead of deriving formulas for the weight initialization in terms of the parameters of the network architecture, the authors determine the appropriate scale for the weights by experiment. They feed a batch of inputs through the network layer by layer and rescale the weights of each layer so that its output has variance close to 1. The appeal of this approach is that you don’t have to keep thinking up new initialization rules as you develop new architectures; you can just set the weight scale algorithmically.
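
To make the layer-by-layer idea concrete, here is a rough sketch in plain Python (standard library only, so it runs anywhere). It is only an illustration under my own assumptions, not the paper's exact procedure: the network is a stack of bias-free linear layers with ReLU in between, weights start as unit Gaussians, variance is measured on the pre-activation output, and the names `empirical_init`, `tol` and `max_iters` are made up for this example.

```python
import math
import random


def linear(batch, weights):
    """Apply a bias-free linear layer; weights has shape n_in x n_out."""
    cols = list(zip(*weights))  # columns of the weight matrix
    return [[sum(x * w for x, w in zip(row, col)) for col in cols]
            for row in batch]


def relu(batch):
    return [[max(0.0, v) for v in row] for row in batch]


def output_variance(batch):
    """Variance over every value in the batch of outputs."""
    values = [v for row in batch for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def empirical_init(layer_sizes, batch, tol=0.01, max_iters=10, seed=0):
    """Rescale each layer in turn until its output variance is close to 1."""
    rng = random.Random(seed)
    layers = []
    acts = batch  # activations feeding the layer currently being initialized
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        # Assumption: start from unit Gaussian weights.
        weights = [[rng.gauss(0.0, 1.0) for _ in range(n_out)]
                   for _ in range(n_in)]
        for _ in range(max_iters):
            var = output_variance(linear(acts, weights))
            if abs(var - 1.0) < tol:
                break
            # Output std scales linearly with the weights, so divide by it.
            scale = 1.0 / math.sqrt(var)
            weights = [[w * scale for w in row] for row in weights]
        layers.append(weights)
        # Pass the batch through the freshly scaled layer before moving on.
        acts = relu(linear(acts, weights))
    return layers


# Example: a 16 -> 32 -> 32 -> 8 network and a batch of 64 random inputs.
data_rng = random.Random(1)
batch = [[data_rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(64)]
layers = empirical_init([16, 32, 32, 8], batch)
```

Because these layers are linear, one rescale is already enough to hit unit variance; the loop is there for the case where you measure variance after a nonlinearity or normalization, where the relationship is no longer exactly linear.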

I’ve also attempted an implementation of it here: https://forums.fast.ai/t/implementing-the-empirical-initialization-from-all-you-need-is-a-good-init/42284