Weight decay vs LR

ali_baba · November 15, 2020, 1:21pm

The goal of weight-decay (or most regularization) is not directly about reducing loss* but more specifically about attempting to avoid overfitting and therefore trying to develop a model that generalizes better to unseen data.

Are you asking about learning rate or L2 regularization for “LR”? As mentioned in this forum post it’s the same as L2 regularization. I won’t rehash @radek’s lovely explanation from that post though.

*although it’s a by product of well-trained model