The goal of weight decay (and most regularization) is not directly to reduce the training loss*, but rather to avoid overfitting and so produce a model that generalizes better to unseen data.
Are you asking about learning rate or L2 regularization when you say “LR”? As mentioned in this forum post, weight decay is the same thing as L2 regularization. I won’t rehash @radek’s lovely explanation from that post here, though.
*although lower loss is often a by-product of a well-trained model
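
To make the equivalence concrete, here is a minimal PyTorch sketch (the model, data, and `wd` value are made up for illustration): with plain SGD, adding an explicit L2 penalty to the loss and passing `weight_decay` to the optimizer give the same parameter update.

```python
import torch
import torch.nn as nn

# Hypothetical tiny model and batch, purely for illustration
model = nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
wd = 1e-2  # weight-decay / L2 coefficient

# Option 1: add an explicit L2 penalty (0.5 * wd * ||w||^2) to the loss.
# Its gradient is wd * w, which gets folded into the usual SGD step.
loss = nn.functional.mse_loss(model(x), y)
l2_penalty = sum((p ** 2).sum() for p in model.parameters())
loss = loss + 0.5 * wd * l2_penalty

# Option 2: let the optimizer apply weight decay directly,
# which adds wd * w to each parameter's gradient before the step.
opt = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=wd)
```

For plain SGD the two options coincide; with adaptive optimizers like Adam they diverge, which is why decoupled weight decay (AdamW) exists, but that's beyond the scope of this post.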