Why torch.no_grad() [lesson 9]

While writing our update block:

for epoch in range(epochs):
    for i in range((n-1)//bs + 1):
        start_i = i*bs
        end_i = start_i+bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        loss = loss_func(model(xb), yb)

        loss.backward()
        with torch.no_grad():
            for l in model.layers:
                if hasattr(l, 'weight'):
                    l.weight -= l.weight.grad * lr
                    l.bias   -= l.bias.grad   * lr
                    l.weight.grad.zero_()
                    l.bias  .grad.zero_()

Why do we write torch.no_grad() before updating our weights? And if we do write torch.no_grad(), why does l.weight.grad return a value?
TIA

We use torch.no_grad() so that autograd does not track the weight-update operations that follow and build a computation graph for them; the update step itself is never something we want to backpropagate through. It also reduces memory usage and speeds up computation.
We can still read l.weight.grad because the gradients were already computed when we called loss.backward(). Inside the no_grad() block we are only using those stored values, not computing new ones.
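As a quick illustration (a minimal sketch of my own, not the lesson notebook), you can see both points on a single tensor: the update inside torch.no_grad() is not tracked by autograd, yet the .grad attribute filled in by the earlier backward() call is still readable:

import torch

# one "weight" tensor with autograd enabled, plus a fixed input
w = torch.randn(3, requires_grad=True)
x = torch.randn(3)
lr = 0.1

loss = (w * x).sum()
loss.backward()           # gradients are computed and stored here

print(w.grad)             # already populated by backward()

with torch.no_grad():
    w -= w.grad * lr      # the update itself is not recorded by autograd
    print(w.grad)         # still readable: no_grad() only stops new tracking
    w.grad.zero_()        # clear the gradient before the next backward()

Trying the same in-place update outside the no_grad() block raises a RuntimeError about "a leaf Variable that requires grad is being used in an in-place operation", which is another reason the update has to go inside it.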

@dipam7 Oh, I see. So does that mean that PyTorch calculates gradients simultaneously with every change in the parameters once loss.backward() is called, instead of us having to call loss.backward() every time?

Oh no, it does not. It does keep track of the operations, though. Check the autograd tutorial in the official PyTorch documentation; all your doubts will be cleared :slight_smile:
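For instance (again a small sketch of my own, not from the docs), gradients only appear when backward() is called, and a second backward() on a new forward pass adds to the existing .grad rather than replacing it, which is exactly why the training loop zeroes the gradients after each update:

import torch

w = torch.tensor([2.0], requires_grad=True)

loss = (w * 3).sum()
print(w.grad)        # None: nothing is computed until backward() is called

loss.backward()
print(w.grad)        # tensor([3.])

loss2 = (w * 3).sum()
loss2.backward()
print(w.grad)        # tensor([6.]): gradients accumulate until you zero them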
