I am reading chapter 16, and I have a question about momentum. The book explains the update step as follows:

```
weight.avg = beta * weight.avg + (1-beta) * weight.grad
```

However, the code for `average_grad` is:

```
def average_grad(p, mom, grad_avg=None, **kwargs):
    if grad_avg is None: grad_avg = torch.zeros_like(p.grad.data)
    return {'grad_avg': grad_avg*mom + p.grad.data}
```

which does not have the `(1-beta)` factor. Is this intentional? I tried training for 10 epochs with and without `(1-beta)`, and found that `(1-beta)` worked better. I also looked at the fastai source code, where the `(1-beta)` factor (`damp`) is only applied when `dampening=True`, which is not the default:

```
def average_grad(p, mom, dampening=False, grad_avg=None, **kwargs):
"Keeps track of the avg grads of `p` in `state` with `mom`."
    if grad_avg is None: grad_avg = torch.zeros_like(p.grad.data)
    damp = 1-mom if dampening else 1.
    grad_avg.mul_(mom).add_(p.grad.data, alpha=damp)
    return {'grad_avg': grad_avg}
```
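One observation that might be relevant: when both averages start from zero, the undampened average is exactly the dampened one scaled by `1/(1-mom)`, so the two variants should differ only by an effective rescaling of the step size. Here is a quick plain-Python check of that (the gradient values are made up for illustration, no torch needed):

```python
# Compare the two running-average formulas on a toy gradient sequence.
mom = 0.9
grads = [0.5, -0.2, 0.3, 0.1]  # arbitrary example gradients

undamped = 0.0  # fastai default: grad_avg*mom + grad
damped = 0.0    # the book's formula: grad_avg*mom + (1-mom)*grad
for g in grads:
    undamped = undamped * mom + g
    damped = damped * mom + (1 - mom) * g

# Scaling the undampened average by (1-mom) recovers the dampened one.
print(undamped * (1 - mom), damped)
```

So if the learning rate is tuned for one variant, the same behaviour should be reachable with the other by scaling `lr` by `(1-mom)`, which may be why the two runs are not directly comparable at a fixed learning rate.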

Does anybody know whether this is intentional or a mistake?

If it is intentional, does anybody know why?