Hey everyone,
There’s this not-so-new paper by Salimans and Kingma, “Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks”.
It seems like something there’s no reason not to do, but I haven’t found much discussion of it (the same name appears to be used sometimes for weight decay, which does get discussed a lot).
Does anybody here have experience with it, or has it fallen out of fashion? Would love to hear your thoughts.