Weight initialization: best practice

What is folks’ experience with choosing model weight initializations? There is some research indicating that good initialization is key for a neural network to converge. Is this discussed in later lessons (I just finished lesson 3)? If so, can anyone point me to the right lesson?

@xinxin.li.seattle No, I don’t think the different weight initializations are discussed in detail. Most of the course is focused on transfer learning, so weight initialization is not a big issue. There is a short discussion in one of the lessons (don’t remember which now, sorry) on how batch normalization makes initialization less important.
You can read a little more here:


Thank you! I just remembered and revisited the weight initialization discussion in lesson 2 (1:07:54 - Initialization).

Also, I had some “insight” during my research. It took me a couple of hours to realize, but it’s so dumb I just want to put it out there for those who are curious.

Xavier weight initialization draws the weights with variance 2/(n_in + n_out). In Keras, when you create a new Convolution2D, the default weight initialization is “glorot_uniform”, which is the same method as “xavier” in Caffe. Why? Because the method comes from the 2010 paper by Xavier Glorot & Yoshua Bengio! So “xavier” in Caffe and “glorot_uniform” in Keras are the same thing. Uhhhhhh
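
In case it helps anyone, here is a rough NumPy sketch of what the uniform variant of that formula looks like. It is only an illustration, not the actual Keras or Caffe code: the function name `glorot_uniform` and the layer sizes are my own choices, and I am assuming a plain dense layer where n_in and n_out are just the input and output widths (for a conv layer, n_in and n_out would also include the kernel size). A uniform distribution on [-a, a] has variance a²/3, so picking a = sqrt(6/(n_in + n_out)) gives the 2/(n_in + n_out) variance.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=None):
    """Sample a (fan_in, fan_out) weight matrix, Glorot/Xavier uniform style.

    Hypothetical helper for illustration; not a library function.
    """
    # Fixed seed by default so the sketch is reproducible.
    rng = np.random.default_rng(0) if rng is None else rng
    # Uniform on [-limit, limit] has variance limit**2 / 3,
    # which equals 2 / (fan_in + fan_out) for this limit.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Example layer sizes (arbitrary, chosen only for the demo).
fan_in, fan_out = 256, 128
W = glorot_uniform(fan_in, fan_out)

target_var = 2.0 / (fan_in + fan_out)
print("empirical variance:", W.var())
print("target variance:   ", target_var)
```

The two printed numbers should come out close to each other, which is all the “2/(n_in + n_out)” in the formula is saying.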
