Going deeper into the fast.ai library, I am looking at the fit methods, and especially at the optimizers, callbacks and schedules that are behind a lot of fast.ai's performance and ease of use.
I have a question regarding weight decay. I see that, as part of the stepper, weight decay is applied manually to the weights after the gradient computation:
```python
if 'wd' in self.opt.param_groups[0] and self.opt.param_groups[0]['wd'] != 0:
    # Weight decay out of the loss. After the gradient computation but before the step.
    for group in self.opt.param_groups:
        lr, wd = group['lr'], group['wd']
        for p in group['params']:
            if p.grad is not None:
                p.data = p.data.add(-wd * lr, p.data)
```
Isn't this causing weight decay to be applied twice? When a standard PyTorch optimizer is used (e.g. Adam), the step() method already performs weight decay (albeit with a different calculation).
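To make sure I'm asking the right thing, here is a minimal sketch of how I understand the two mechanisms (not fast.ai code; the lr, wd and tensor values are just placeholders):

```python
import torch

p = torch.nn.Parameter(torch.ones(3))
lr, wd = 0.1, 0.01

# 1) PyTorch-style weight decay (e.g. torch.optim.Adam(..., weight_decay=wd)):
#    wd * p is folded into the gradient before the update, i.e. L2 regularisation.
p.grad = torch.zeros(3)                 # pretend the loss gradient is zero
grad_with_l2 = p.grad + wd * p.data

# 2) The stepper snippet above: the weights themselves are shrunk before
#    opt.step(), i.e. p <- p - lr * wd * p (decoupled weight decay).
p.data = p.data.add(p.data, alpha=-wd * lr)

print(grad_with_l2)   # non-zero even though the loss gradient was zero
print(p.data)         # weights already shrunk by a factor (1 - lr * wd)
```

If both mechanisms were active for the same parameter group, the weights would effectively be decayed in both places on every step, which is what prompted my question.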
Probably a stupid mistake on my side; clarifications are very welcome!