Hello,
Going deeper into the Fast.ai library, I am looking at the fit methods, and especially the optimizers, callbacks and schedules that are behind a lot of fast.ai performance and ease of use.
I have a question regarding weight decay. I see that, in the stepper, weight decay is applied manually to the weights after the gradient computation:
```python
if 'wd' in self.opt.param_groups[0] and self.opt.param_groups[0]['wd'] != 0:
    # Weight decay out of the loss. After the gradient computation but before the step.
    for group in self.opt.param_groups:
        lr, wd = group['lr'], group['wd']
        for p in group['params']:
            if p.grad is not None: p.data = p.data.add(-wd * lr, p.data)
```
Doesn't this cause weight decay to be applied twice? When a standard PyTorch optimizer (e.g. Adam) is used with a nonzero `weight_decay`, its step() already performs weight decay, albeit with a different calculation (it adds `wd * w` to the gradient rather than shrinking the weight directly).
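To make the "different calculation" concrete, here is a toy single-parameter sketch (my own illustration, not the fastai or PyTorch source) contrasting the two forms, assuming the standard Adam update rule: "coupled" L2 decay folds `wd * w` into the gradient before the moment estimates, while "decoupled" decay shrinks the weight directly, outside the adaptive statistics:

```python
import math

def adam_l2(w, grad, lr=0.1, wd=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1):
    """Adam with coupled L2 decay: wd*w is added to the gradient,
    so the decay term flows through the moment estimates."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad + wd * w                      # decay folded into the gradient
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        mhat = m / (1 - beta1 ** t)            # bias-corrected first moment
        vhat = v / (1 - beta2 ** t)            # bias-corrected second moment
        w = w - lr * mhat / (math.sqrt(vhat) + eps)
    return w

def adam_decoupled(w, grad, lr=0.1, wd=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1):
    """Adam with decoupled decay: the weight is shrunk directly,
    and the moment estimates never see the decay term."""
    m = v = 0.0
    for t in range(1, steps + 1):
        w = w - lr * wd * w                    # decay applied to the weight itself
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad * grad
        mhat = m / (1 - beta1 ** t)
        vhat = v / (1 - beta2 ** t)
        w = w - lr * mhat / (math.sqrt(vhat) + eps)
    return w
```

With `wd=0` the two functions agree exactly; with a nonzero `wd` they diverge, because in the coupled form the decay term is rescaled by Adam's adaptive denominator. That difference is why applying both at once would not simply double the decay, but it would still decay twice.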
Probably a stupid mistake on my side; clarifications are very welcome!
Thanks