True weight decay (multiplying weights by 1 - wd*lr in OptimWrapper) must be applied only to the trainable parameters, i.e. when we are training the head (body frozen), weight decay is not applied to the body parameters. Is this correct?
If only trainable parameters get true weight decay, where is this enforced in fastai? (Sorry, I got lost when digging through the source code.)
My understanding is that the optimizer wrapper handles this, so that the decay is applied (or skipped) before the regular PyTorch Adam optimizer is called. wd is for the optim wrapper, and weight_decay is what gets passed to PyTorch (only one can be set of course, not both).
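To make that ordering concrete, here is a small framework-free sketch of the idea, not fastai's actual code. It assumes frozen parameters are marked with requires_grad=False, and it uses plain SGD as a stand-in for Adam. Param, sgd_step and step_with_true_wd are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Toy stand-in for a tensor parameter (hypothetical, not a fastai/PyTorch class)."""
    value: float
    grad: float = 0.0
    requires_grad: bool = True


def sgd_step(params, lr):
    """Stand-in for the wrapped optimizer's step(); plain SGD instead of Adam."""
    for p in params:
        if p.requires_grad:
            p.value -= lr * p.grad


def step_with_true_wd(params, lr, wd):
    """Shrink trainable weights by (1 - wd*lr), then call the inner optimizer."""
    for p in params:
        if not p.requires_grad:
            continue  # assumption: frozen (body) parameters are skipped
        p.value *= 1 - wd * lr
    sgd_step(params, lr)


# One frozen "body" weight and one trainable "head" weight.
body = Param(value=1.0, requires_grad=False)
head = Param(value=1.0, grad=0.5)
step_with_true_wd([body, head], lr=0.1, wd=0.01)
print(body.value, head.value)
```

In this sketch the body weight stays exactly where it was, while the head weight is first decayed and then updated by the gradient step. Whether the real wrapper skips frozen parameters in the same way is something to confirm in the source.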
Thanks for the pointers and the info, Less. I will dig into this more. I had also thought what you mentioned. My first dive into this, before posting the query on the forums, seemed to suggest (most likely through my own misunderstanding) that true_wd was being applied to the body as well. I was using the oxford-iiit lesson-01 ipynb in part1-v3 (where the head is being trained with ResNet34). Now, I am sure that this is a misunderstanding on my part. I will dive in again.
Thanks again for your reply.