Why torch.no_grad() [lesson 9]

While writing our update block:

for epoch in range(epochs):
    for i in range((n-1)//bs + 1):
        start_i = i*bs
        end_i = start_i+bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        loss = loss_func(model(xb), yb)

        loss.backward()
        with torch.no_grad():
            for l in model.layers:
                if hasattr(l, 'weight'):
                    l.weight -= l.weight.grad * lr
                    l.bias   -= l.bias.grad   * lr
                    l.weight.grad.zero_()
                    l.bias  .grad.zero_()

Why do we write torch.no_grad() before updating our weights? And if we do write torch.no_grad(), why does l.weight.grad return a value?
TIA

We use torch.no_grad() so that autograd does not track the weight-update operations that follow and build a computation graph for them; the update step itself is never something we want to backpropagate through. It also reduces memory usage and speeds up computation.
We can still read l.weight.grad because the gradients were already computed when we called loss.backward(). Inside the no_grad() block we are only using those stored values, not computing new ones.
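As a quick illustration (a minimal sketch of my own, not the lesson notebook), you can see both points on a single tensor: the update inside torch.no_grad() is not tracked by autograd, yet the .grad attribute filled in by the earlier backward() call is still readable:

import torch

# one "weight" tensor with autograd enabled, plus a fixed input
w = torch.randn(3, requires_grad=True)
x = torch.randn(3)
lr = 0.1

loss = (w * x).sum()
loss.backward()           # gradients are computed and stored here

print(w.grad)             # already populated by backward()

with torch.no_grad():
    w -= w.grad * lr      # the update itself is not recorded by autograd
    print(w.grad)         # still readable: no_grad() only stops new tracking
    w.grad.zero_()        # clear the gradient before the next backward()

Trying the same in-place update outside the no_grad() block raises a RuntimeError about "a leaf Variable that requires grad is being used in an in-place operation", which is another reason the update has to go inside it.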

@dipam7 Oh, I see. So does that mean that PyTorch calculates gradients simultaneously with every change in the parameters once loss.backward() is called, instead of us having to call loss.backward() every time?

Oh no, it does not. It does keep track of the operations, though. Check the autograd tutorial in the official PyTorch documentation; all your doubts will be cleared :slight_smile:
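For instance (again a small sketch of my own, not from the docs), gradients only appear when backward() is called, and a second backward() on a new forward pass adds to the existing .grad rather than replacing it, which is exactly why the training loop zeroes the gradients after each update:

import torch

w = torch.tensor([2.0], requires_grad=True)

loss = (w * 3).sum()
print(w.grad)        # None: nothing is computed until backward() is called

loss.backward()
print(w.grad)        # tensor([3.])

loss2 = (w * 3).sum()
loss2.backward()
print(w.grad)        # tensor([6.]): gradients accumulate until you zero them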
