Hello, I’m trying to add regularization to my loss function and am confused about a few points.
- First, I noticed that the Adam optimizer applies weight decay by default. Just to confirm: a command like
learn_test = Learner(dls, model, opt_func=Adam, loss_func=F.mse_loss, metrics=F.mse_loss)
will use a weight decay of 0.01, correct?
Then, if I want to replace the default (decoupled) weight decay with classic L2 regularization, would the code be
learn_test = Learner(dls, model, opt_func=partial(Adam, decouple_wd=False), loss_func=F.mse_loss, metrics=F.mse_loss)?
- But what about adding L1 regularization? Is there a quick way, or do I need to write the loss function myself? Also, if I add an L1 term to the loss function, do I need to “shut off” the weight decay in the Adam optimizer? According to the documentation, an L2-style penalty is built in by default, so wouldn't I end up with two regularization terms at once?
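The difference between `decouple_wd=True` and `decouple_wd=False` can be sketched in plain Python for a single scalar parameter. This is a simplified Adam update written for illustration, not fastai's source; the defaults (mom=0.9, sqr_mom=0.99, eps=1e-5, wd=0.01) are assumed to mirror fastai's `Adam` signature, and `adam_step` and `new_state` are hypothetical helper names.

```python
import math


def new_state():
    """Fresh optimizer state for one scalar parameter (hypothetical helper)."""
    return {"step": 0, "m": 0.0, "v": 0.0}


def adam_step(p, grad, state, lr=1e-3, mom=0.9, sqr_mom=0.99, eps=1e-5,
              wd=0.01, decouple_wd=True):
    """One simplified Adam update for a scalar parameter p.

    decouple_wd=True  -> decoupled weight decay (AdamW style): shrink p directly.
    decouple_wd=False -> classic L2 regularization: add wd * p to the gradient,
                         so the penalty also goes through Adam's rescaling.
    """
    if decouple_wd:
        p = p * (1 - lr * wd)          # decay applied outside the adaptive step
    else:
        grad = grad + wd * p           # gradient of (wd / 2) * p**2
    state["step"] += 1
    # Exponential moving averages of the gradient and squared gradient.
    state["m"] = mom * state["m"] + (1 - mom) * grad
    state["v"] = sqr_mom * state["v"] + (1 - sqr_mom) * grad ** 2
    # Bias correction, then the usual Adam step.
    m_hat = state["m"] / (1 - mom ** state["step"])
    v_hat = state["v"] / (1 - sqr_mom ** state["step"])
    return p - lr * m_hat / (math.sqrt(v_hat) + eps)


if __name__ == "__main__":
    # Zero data gradient, so only the regularization moves the weight.
    for decouple in (True, False):
        p = adam_step(1.0, 0.0, new_state(), lr=0.1, wd=0.01, decouple_wd=decouple)
        print(f"decouple_wd={decouple}: p after one step = {p:.6f}")
```

With a zero data gradient, decoupled decay shrinks the weight by exactly lr·wd of its value, while the coupled L2 term is rescaled by Adam's denominator, so on the first step it moves the weight by roughly lr almost regardless of wd. That is why the two settings are not interchangeable under Adam.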
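Adding an L1 term yourself amounts to a loss of the form MSE + λ·Σ|w|, whose (sub)gradient is λ·sign(w); any optimizer-side L2 term adds wd·w on top. The plain-Python sketch below assumes a flat list of weights; `mse_l1_loss`, `l1_grad`, `l2_grad` and `penalty_grad` are hypothetical names chosen for illustration, not fastai functions.

```python
def sign(x):
    """Sign of x as -1, 0 or 1 (subgradient choice 0 at x == 0)."""
    return (x > 0) - (x < 0)


def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def mse_l1_loss(preds, targets, weights, l1_lambda):
    """MSE plus an explicit L1 penalty on the weights (hypothetical helper)."""
    return mse(preds, targets) + l1_lambda * sum(abs(w) for w in weights)


def l1_grad(weights, l1_lambda):
    """(Sub)gradient of the L1 term: l1_lambda * sign(w)."""
    return [l1_lambda * sign(w) for w in weights]


def l2_grad(weights, wd):
    """Gradient of (wd / 2) * sum(w**2), i.e. what coupled L2 adds: wd * w."""
    return [wd * w for w in weights]


def penalty_grad(weights, l1_lambda, wd):
    """Total regularization gradient when L1 is in the loss and L2 in the optimizer.

    With wd=0 only the L1 term remains; with wd > 0 both act at once
    (an elastic-net-like combination).
    """
    return [a + b for a, b in zip(l1_grad(weights, l1_lambda), l2_grad(weights, wd))]


if __name__ == "__main__":
    w = [0.5, -1.5, 0.0]
    print("loss:", mse_l1_loss([1.0, 2.0], [1.0, 0.0], w[:2], l1_lambda=0.1))
    print("L1 only:", penalty_grad(w, l1_lambda=0.1, wd=0.0))
    print("L1 + L2:", penalty_grad(w, l1_lambda=0.1, wd=0.01))
```

In fastai the loss function is called as `loss_func(pred, targ)`, so one assumed approach is a function that closes over `model` and adds λ times the sum of `p.abs().sum()` over `model.parameters()`; passing `wd=0` to `Learner` (or to `Adam`) would then leave L1 as the only penalty, unless you deliberately want both.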