How to apply L2 regularization in cnn_learner

Hello, this question has probably been asked before, but I have been searching the forum for a while and I couldn't find it.
I am on lesson 4, and when I try to apply what I've learned to a Kaggle competition (I can't get more data), my model overfits very quickly. I tried increasing the weight decay, but I don't think it's helping. What I want now is to apply L2 regularization, which I understand to be very effective (as taught in Andrew Ng's deep learning course). My understanding is that weight decay is equivalent to L2 regularization only in the case of SGD, and I am working with Adam.
(screenshot attached)
thanks in advance


My understanding is that the original optimizer implementations of weight decay actually implemented L2 regularization. Then the Decoupled Weight Decay Regularization paper (Loshchilov & Hutter) came along and pointed out the error. So an optimizer like AdamW has true weight decay, not L2 regularization. If you use the regular PyTorch Adam, it implements L2 regularization (even though it incorrectly calls the argument weight_decay), whereas PyTorch's AdamW is Adam with decoupled weight decay. fastai has its own implementation of Adam, where decouple_wd=True (the default) gives you weight decay, while setting it to False gives you L2 regularization. Here are the relevant docs:
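For intuition, here is a minimal sketch of where the wd coefficient enters in each case. Adam's adaptive scaling is left out, and the names lr, wd, w, g are purely illustrative:

import torch

lr, wd = 1e-2, 0.1
w = torch.ones(3)            # a parameter tensor
g = torch.full((3,), 0.5)    # pretend gradient of the loss w.r.t. w

# L2 regularization (PyTorch Adam's weight_decay, fastai with decouple_wd=False):
# the penalty is folded into the gradient, so it also flows through the
# optimizer's moment estimates.
g_l2 = g + wd * w
w_after_l2 = w - lr * g_l2   # adaptive scaling omitted for brevity

# Decoupled weight decay (PyTorch AdamW, fastai's default Adam):
# the parameters are shrunk directly, separately from the gradient step.
w_after_wd = w * (1 - lr * wd) - lr * g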

Hope that clears things up.


So if I am using cnn_learner directly, as in the image, and call:
learn.fine_tune(10, wd=0.1)
then I am using 0.1 as the L2 regularization lambda, right?

Also, one final note: can you recommend a range of lambda values to start with?

fastai's Adam uses weight decay by default. You need to pass a different version of Adam with decouple_wd=False:

from functools import partial
learn = cnn_learner(dls, resnet18, metrics=[accuracy], opt_func=partial(Adam, decouple_wd=False))

Keep this in mind (see the sketch after the list):

PyTorch:

  • Adam has L2 regularization.
  • AdamW has weight decay.

fastai:

  • Adam has weight decay.
  • Adam with decouple_wd=False has L2 regularization.
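In code, the same mapping looks roughly like this (a sketch; the model on the PyTorch lines is just a stand-in nn.Module):

import torch
from torch import nn

model = nn.Linear(10, 2)  # stand-in model, only for illustration

# PyTorch: Adam's weight_decay argument is really L2 regularization,
# while AdamW applies true (decoupled) weight decay.
opt_l2 = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.1)   # L2 reg
opt_wd = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)  # weight decay

# fastai: the default Adam gives weight decay; passing
# opt_func=partial(Adam, decouple_wd=False) to cnn_learner (as above)
# turns the wd you pass to fit/fine_tune into an L2 penalty instead.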

Is that clearer?

Read the documentation as well for more info.


Thank you, that was very helpful
