How should various kinds of layers be treated, from gradual unfreezing to weight decay and beyond?

I know this topic is discussed piecemeal in various places, but I'm wondering if there is a single resource that clearly shows, for each kind of layer (bias, batch norm, layer norm, etc.), what the best practices are for things like:

  1. Should weight decay be applied (or any other hyperparameter be adjusted based on layer type)?

  2. Should the layer always be left trainable, even during gradual unfreezing?

  3. Should the layer be enabled/disabled in different phases (training vs. eval)?
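For question 1, a commonly cited heuristic (a sketch of my own, not an official recipe — the model and hyperparameter values here are made up for illustration) is to exempt all 1-D parameters — biases and norm-layer affine weights — from weight decay by putting them in a separate optimizer param group:

```python
# Sketch: split parameters into "decay" and "no_decay" groups so that
# weight decay skips biases and norm layers (all 1-D tensors).
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(8, 16),
    nn.BatchNorm1d(16),
    nn.ReLU(),
    nn.Linear(16, 2),
)

decay, no_decay = [], []
for name, param in model.named_parameters():
    # Biases and norm weights are 1-D; matrix weights are 2-D or higher.
    (no_decay if param.ndim == 1 else decay).append(param)

optimizer = torch.optim.AdamW([
    {"params": decay, "weight_decay": 1e-2},   # decayed: Linear weights
    {"params": no_decay, "weight_decay": 0.0}, # exempt: biases, BN affine
])
```

Whether this is always the right call (especially for layer norm in transformers) is exactly the kind of thing I'd hope a canonical thread would settle.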
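For questions 2 and 3, here's a minimal sketch of what I mean (assumptions mine — the toy model and the choice to keep norm layers trainable are just for illustration): freeze the body, unfreeze the head, optionally keep norm-layer affine params trainable throughout, and note that `train()`/`eval()` changes norm/dropout behavior independently of `requires_grad`:

```python
# Sketch of one gradual-unfreezing phase, plus train/eval mode switching.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(8, 16),
    nn.BatchNorm1d(16),
    nn.ReLU(),
    nn.Linear(16, 2),
)

# Phase 1: freeze everything, then unfreeze only the head.
for param in model.parameters():
    param.requires_grad = False
for param in model[-1].parameters():
    param.requires_grad = True

# Optionally keep norm layers trainable in every phase — a common,
# though debated, fine-tuning choice (this is question 2).
for module in model.modules():
    if isinstance(module, (nn.BatchNorm1d, nn.LayerNorm)):
        for param in module.parameters():
            param.requires_grad = True

# Question 3 is orthogonal to freezing: train() makes BatchNorm use
# batch statistics and enables Dropout; eval() uses running stats
# and disables Dropout, regardless of which params are trainable.
model.train()
model.eval()
```

Note that a frozen BatchNorm layer left in `train()` mode still updates its running statistics, which trips people up — another reason a single reference thread would help.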

Maybe this can even be wikified, with resources linked for each question above (and folks able to add other questions)?


I always love the idea of singular threads I can bookmark. Wikified, have at it folks! :slight_smile: