Hi, I’m quite lost with weight decay. I would like to know how weight decay (WD) and the learning rate (LR) work together, and how weight decay affects the loss. Isn’t the loss, simply put, just the difference between our actual and predicted data? How does reducing our weights affect the loss? Sometimes we would need our weights to increase for the loss to get better, right?
Also, how does WD compare to LR?
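For concreteness, here is a minimal sketch of the setup in question. It assumes plain gradient descent on a hypothetical one-weight linear model, y = w * x. It also assumes weight decay is implemented as an L2 penalty 0.5 * wd * w**2 added to a mean-squared-error loss, so the penalty's gradient wd * w is added to the data gradient. The function and variable names are illustrative only.

```python
# Hypothetical toy setup: fit y = w * x with plain gradient descent.
# Assumption: weight decay is an L2 penalty 0.5 * wd * w**2 whose gradient,
# wd * w, is added to the gradient of the data loss at every step.

def data_loss(w, xs, ys):
    # Mean squared error between predictions w*x and targets y
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def total_loss(w, xs, ys, wd):
    # Objective actually minimized: data loss plus the weight-decay penalty
    return data_loss(w, xs, ys) + 0.5 * wd * w ** 2

def grad_data_loss(w, xs, ys):
    # Derivative of the MSE w.r.t. w: mean of 2 * (w*x - y) * x
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, lr, wd, steps=200, w=0.0):
    for _ in range(steps):
        g = grad_data_loss(w, xs, ys) + wd * w  # data gradient + decay term
        w -= lr * g  # lr scales both the data step and the decay step
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # underlying relationship: y = 2x

w_plain = train(xs, ys, lr=0.05, wd=0.0)
w_decay = train(xs, ys, lr=0.05, wd=0.5)
print(w_plain, data_loss(w_plain, xs, ys))
print(w_decay, data_loss(w_decay, xs, ys))
```

With wd=0.0 the weight settles near 2.0. With wd=0.5 it settles somewhat lower, so the squared-error part of the loss ends up slightly higher. This happens because gradient descent is now minimizing the combined objective rather than the data loss alone. In each step, lr multiplies both the data gradient and the decay term.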