Hello, this question has probably been asked before, but I have been searching the forum for a while and couldn't find it.
I am on lesson 4, and when I try to apply what I've learned to a Kaggle competition (I can't get more data), my model overfits very quickly. I tried increasing weight decay, but I don't think it's helping. What I want now is to apply L2 regularization, which I learned is very effective in Andrew Ng's deep learning course. My understanding is that weight decay is equal to L2 regularization only in the case of SGD, and I am working with Adam.
Thanks in advance
My understanding is that the original optimizer implementations of weight decay actually implemented L2 regularization. Then this paper came along and pointed out the error. So an optimizer like AdamW has weight decay and not L2 regularization. If you use regular PyTorch Adam, it implements L2 regularization (even though it incorrectly calls it weight_decay), while PyTorch AdamW is Adam with decoupled weight decay. fastai has its own implementation of Adam, where decouple_wd=True (the default) gives you weight decay, while setting it to False gives you L2 regularization. Here are the relevant docs:
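For a concrete picture, here is a minimal sketch of how you would pick each behaviour in plain PyTorch (the toy model and the hyperparameter values are just placeholders for illustration):

import torch, torch.nn as nn

model = nn.Linear(10, 2)  # toy model, only here so we have parameters to pass

# torch.optim.Adam: the weight_decay argument is really L2 regularization;
# the penalty wd * p is folded into the gradient before Adam's adaptive update.
opt_l2 = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.1)

# torch.optim.AdamW: decoupled weight decay; the parameters are shrunk
# directly by lr * wd, separately from the gradient-based step.
opt_wd = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)

# fastai's own Adam exposes the same choice through decouple_wd
# (True, the default, gives weight decay; False gives L2 regularization).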
Hope that clears things up.
So if I use it directly with cnn_learner, like in the image:
learn.fine_tune(10, wd=0.1)
then I am using 0.1 as the L2 regularization lambda, right?
Also, one final note: can you recommend a range of lambda values to start with?
fastai's Adam uses weight decay by default. You need to pass a version of Adam with decouple_wd=False:
from functools import partial
learn = cnn_learner(dls, resnet18, metrics=[accuracy], opt_func=partial(Adam, decouple_wd=False))
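With that learner, the wd you pass to fit or fine_tune is then applied as the L2 lambda; reusing your call from above as a usage sketch:

learn.fine_tune(10, wd=0.1)  # 0.1 now acts as the L2 regularization strength (lambda)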
Keep in mind this -
PyTorch:
- Adam has L2 reg
- AdamW has weight decay
fastai:
- Adam has weight decay
- Adam with decouple_wd=False has L2 reg
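To make the distinction concrete, here is a minimal sketch of the two update rules with made-up numbers (lr, wd, the parameter p, and the gradient g are all hypothetical):

import torch

lr, wd = 0.01, 0.1
p = torch.tensor([1.0, -2.0])   # a toy parameter
g = torch.tensor([0.3, 0.5])    # its raw gradient

# L2 regularization: the penalty wd * p is added to the gradient, so it also
# passes through Adam's moment estimates and adaptive scaling.
g_l2 = g + wd * p

# Decoupled weight decay: the parameter is shrunk directly by lr * wd * p,
# independently of whatever step the optimizer computes from the raw gradient g.
p_decayed = p - lr * wd * p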
Is that clearer?
Read the documentation as well for more info.
Thank you, that was very helpful