No with torch.no_grad() in optim.SGD.step

zkylord · August 5, 2020, 4:28pm

I was digging into optim.SGD.step and expected to find a with torch.no_grad() block, but there isn't one!
Instead, I found:

    def step(self, closure=None):
        """Performs a single optimization step.

        Arguments:
            closure (callable, optional): A closure that reevaluates the model
                and returns the loss.
        """
        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:
            weight_decay = group['weight_decay']
            momentum = group['momentum']
            dampening = group['dampening']
            nesterov = group['nesterov']

            for p in group['params']:
                if p.grad is None:
                    continue
                d_p = p.grad.data
                if weight_decay != 0:
                    d_p.add_(weight_decay, p.data)
                if momentum != 0:
                    param_state = self.state[p]
                    if 'momentum_buffer' not in param_state:
                        buf = param_state['momentum_buffer'] = torch.zeros_like(p.data)
                        buf.mul_(momentum).add_(d_p)
                    else:
                        buf = param_state['momentum_buffer']
                        buf.mul_(momentum).add_(1 - dampening, d_p)
                    if nesterov:
                        d_p = d_p.add(momentum, buf)
                    else:
                        d_p = buf

                p.data.add_(-group['lr'], d_p)

        return loss

Does this mean that .data simply gives access to the raw values of the gradients/parameters, so it can be used instead of with torch.no_grad()? If so, why is this approach better?
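What .data returns can be checked directly. The sketch below is illustrative (w and d are arbitrary names). It shows that w.data is a separate tensor object that shares memory with w, has requires_grad=False and no grad_fn, and that in-place changes made through it show up in w without autograd recording anything.

```python
import torch

w = torch.ones(2, requires_grad=True)
d = w.data  # new tensor object backed by the same memory as w

print(d.data_ptr() == w.data_ptr())  # same underlying storage
print(d.requires_grad, d.grad_fn)    # cut off from autograd

d.add_(1.0)  # in-place change made through .data ...
print(w)     # ... is visible in w, but no graph node was created for it
```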

kushaj · August 5, 2020, 5:33pm

.data is kept in PyTorch for backward compatibility. It was the usual idiom in earlier versions (around the time optim.SGD was written). Using .data is no longer recommended; use torch.no_grad() instead.
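For comparison, here is a minimal sketch of the same update written with torch.no_grad() instead of .data. The helper name sgd_step_no_grad and its plain-dict state are hypothetical. The math follows the step() quoted above, but the deprecated add_(scalar, tensor) calls are rewritten in the add_(tensor, alpha=scalar) form. One deliberate difference: weight decay is applied out of place here, so p.grad is left unmodified (the quoted code modifies it in place).

```python
import torch


def sgd_step_no_grad(params, state, lr, momentum=0.0, dampening=0.0,
                     weight_decay=0.0, nesterov=False):
    """Hypothetical helper: one SGD update using torch.no_grad() instead of .data.

    `state` is a plain dict holding one momentum buffer per parameter,
    playing the role of self.state in the quoted step().
    """
    with torch.no_grad():  # nothing inside this block is recorded by autograd
        for p in params:
            if p.grad is None:
                continue
            d_p = p.grad
            if weight_decay != 0:
                # Out-of-place, so p.grad itself is not modified.
                d_p = d_p.add(p, alpha=weight_decay)
            if momentum != 0:
                if p not in state:
                    # First step: the buffer starts as a copy of the gradient.
                    buf = state[p] = d_p.clone()
                else:
                    buf = state[p]
                    buf.mul_(momentum).add_(d_p, alpha=1 - dampening)
                d_p = d_p.add(buf, alpha=momentum) if nesterov else buf
            # In-place update of a leaf that requires grad; allowed under no_grad.
            p.add_(d_p, alpha=-lr)


# Usage: compare against torch.optim.SGD on a toy quadratic loss.
torch.manual_seed(0)
w_manual = torch.randn(3, requires_grad=True)
w_builtin = w_manual.detach().clone().requires_grad_(True)
opt = torch.optim.SGD([w_builtin], lr=0.1, momentum=0.9,
                      weight_decay=0.01, nesterov=True)
state = {}
for _ in range(5):
    for w in (w_manual, w_builtin):
        w.grad = None
        (w ** 2).sum().backward()
    sgd_step_no_grad([w_manual], state, lr=0.1, momentum=0.9,
                     weight_decay=0.01, nesterov=True)
    opt.step()

print(torch.allclose(w_manual, w_builtin))
```

torch.optim.SGD only accepts nesterov=True with positive momentum and zero dampening, which is why dampening keeps its default of 0 in the comparison.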

PalaashAgrawal · August 5, 2020, 5:35pm

@zkylord
.data gives you a tensor that shares the same underlying values (storage) as the original variable/tensor in PyTorch, and any operation done on it is NOT tracked by PyTorch’s autograd. I don’t actually know whether it has any advantage over with torch.no_grad(). Maybe someone else can answer that.
Regards
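One concrete difference, sketched below and assuming a reasonably recent PyTorch version: changes made through .data bypass autograd's version counter. Autograd therefore cannot tell that a tensor it saved for the backward pass was modified, and it may silently return wrong gradients. In-place changes made through .detach() or inside torch.no_grad() are still counted, so the same mistake raises an error instead. grad_after_zeroing is a hypothetical helper; sigmoid is used because its backward pass reuses its own output.

```python
import torch


def grad_after_zeroing(method):
    """Hypothetical helper: zero sigmoid's output in place, then backprop."""
    a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
    out = a.sigmoid()  # sigmoid's backward saves `out` to compute the gradient
    if method == "data":
        out.data.zero_()  # .data gets its own version counter: change goes unnoticed
    elif method == "detach":
        out.detach().zero_()  # shares out's version counter, so the change is counted
    elif method == "no_grad":
        with torch.no_grad():
            out.zero_()  # not recorded in the graph, but the version is still bumped
    try:
        out.sum().backward()
    except RuntimeError:
        return "RuntimeError"  # autograd detected the in-place modification
    return a.grad


for method in ("data", "detach", "no_grad"):
    print(method, grad_after_zeroing(method))
```

With "data", a.grad comes back as all zeros, which is wrong for sigmoid; the other two variants raise a RuntimeError when backward() is called.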

PalaashAgrawal · August 5, 2020, 5:37pm

Other than the risk that .data gets removed from PyTorch in the future, as @kushaj pointed out.

zkylord · August 5, 2020, 6:19pm

Thanks @PalaashAgrawal & @kushaj