I am reading chapter 16, and I have a question about momentum. The book explains the step as follows:
weight.avg = beta * weight.avg + (1-beta) * weight.grad
However, the code given for average_grad is:
def average_grad(p, mom, grad_avg=None, **kwargs):
if grad_avg is None: grad_avg = torch.zeros_like(p.grad.data)
return {'grad_avg': grad_avg*mom + p.grad.data}
which does not include the (1-beta) factor. Is this intentional? I tried training for 10 epochs with and without (1-beta), and I found that the version with (1-beta) worked better. I also looked at the fastai source code, and there (1-beta) is only applied when dampening=True, which defaults to False:
def average_grad(p, mom, dampening=False, grad_avg=None, **kwargs):
"Keeps track of the avg grads of `p` in `state` with `mom`."
if grad_avg is None: grad_avg = torch.zeros_like(p.grad.data)
damp = 1-mom if dampening else 1.
grad_avg.mul_(mom).add_(p.grad.data, alpha=damp)
return {'grad_avg': grad_avg}
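For context on what the missing factor changes, here is a standalone toy re-implementation (not the fastai API, just the same recurrence written out) run with a constant gradient. Without (1-beta), the running quantity converges to grad/(1-mom) rather than to the average gradient, i.e. the result is the same up to a constant rescaling by 1/(1-mom):

```python
import torch

def ema_step(grad, mom, grad_avg, dampening=False):
    # One step of the running average; `dampening` toggles the (1-beta) factor.
    damp = 1 - mom if dampening else 1.
    return grad_avg * mom + grad * damp

g = torch.tensor(1.0)  # constant gradient
avg_damp = torch.tensor(0.0)
avg_nodamp = torch.tensor(0.0)
for _ in range(200):
    avg_damp = ema_step(g, 0.9, avg_damp, dampening=True)
    avg_nodamp = ema_step(g, 0.9, avg_nodamp, dampening=False)

print(avg_damp.item())    # ≈ 1.0, the true average of the gradients
print(avg_nodamp.item())  # ≈ 10.0, i.e. grad / (1 - mom)
```

So with a roughly constant gradient the two variants differ only by a factor of 1/(1-mom), which the learning rate could absorb, but I am not sure if that is the reasoning behind the default.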
Does anybody know whether this is intentional or a mistake? And if it is intentional, why?