I know this topic is discussed piecemeal in various places, but I'm wondering whether there is a single resource that clearly shows, for each kind of layer (bias, batch norm, layer norm, etc.), what the best practices are for things like:
- Should weight decay be applied (or any other hyperparameter chosen based on layer type)?
- Should the layer always be marked trainable, even during gradual unfreezing?
- Should the layer be enabled/disabled in different phases (training vs. eval)?
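For the first bullet, the convention I've seen most often (e.g. in minGPT-style training loops) is to apply weight decay only to rank-2+ tensors (matmul/conv weights) and exclude biases and norm scales/shifts, which are 0-D/1-D. A framework-free sketch of that grouping heuristic (the function name and signature are my own, not any library's API):

```python
def split_decay_groups(named_params):
    """named_params: iterable of (name, ndim) pairs for each parameter tensor.

    Returns (decay, no_decay) lists of parameter names: weights of linear/conv
    layers get weight decay; biases and norm parameters (ndim < 2) do not.
    """
    decay, no_decay = [], []
    for name, ndim in named_params:
        # biases and batch/layer-norm scales and shifts are scalar or 1-D
        (no_decay if ndim < 2 else decay).append(name)
    return decay, no_decay


decay, no_decay = split_decay_groups(
    [("fc.weight", 2), ("fc.bias", 1), ("ln.weight", 1), ("ln.bias", 1)]
)
# decay     -> ["fc.weight"]
# no_decay  -> ["fc.bias", "ln.weight", "ln.bias"]
```

In an actual framework you would then pass two optimizer parameter groups, one with the chosen `weight_decay` and one with `weight_decay=0.0`, rather than a single flat parameter list.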
Maybe this could even be made a wiki, with resources linked for each question above (and with folks able to add other questions)?
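To make the third bullet concrete, here is a toy illustration (pure Python, not a framework API) of why batch norm behaves differently across phases: in training it normalizes with batch statistics while updating a running average; in eval it uses the frozen running statistic instead.

```python
class ToyBatchNorm:
    """Minimal sketch of batch-norm's train-vs-eval behavior (mean only)."""

    def __init__(self, momentum=0.1):
        self.training = True
        self.momentum = momentum
        self.running_mean = 0.0

    def __call__(self, batch):
        if self.training:
            # training phase: use the batch mean, and update the running mean
            mean = sum(batch) / len(batch)
            self.running_mean += self.momentum * (mean - self.running_mean)
        else:
            # eval phase: use the stored running statistic, no updates
            mean = self.running_mean
        return [x - mean for x in batch]
```

This is also why "frozen" norm layers during fine-tuning are a subtle case: even with gradients disabled, a norm layer left in training mode keeps updating its running statistics.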