Whether I use grad or grad.data, the final value comes out the same. So why did the authors use grad.data?
INPUT
print("lrparams = {}".format(lrparams))
print("lrparams.data = {}".format(lrparams.data))
print("lrparams.grad = {}".format(lrparams.grad))
print("lrparams.grad.data = {}".format(lrparams.grad.data))
…
OUTPUT
lrparams = tensor([ 1.8587e-05, -3.9449e-06, 7.6135e-06], grad_fn=)
lrparams.data = tensor([ 1.8587e-05, -3.9449e-06, 7.6135e-06])
lrparams.grad = tensor([0.9545, 0.0612, 0.0040])
lrparams.grad.data = tensor([0.9545, 0.0612, 0.0040])
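For what it's worth, here is a minimal sketch of where the two can differ. It assumes a recent PyTorch version, and `w` is a hypothetical stand-in for `lrparams`, not the authors' code. As far as I know, `.data` is a holdover from the pre-0.4 Variable API: it returns a tensor with the same values that autograd never tracks. After a plain `backward()` the gradient is not tracked either, so the two print identically, as in the output above. They only diverge when the gradient is itself part of a graph (for example with `create_graph=True`).

```python
import torch

# Hypothetical stand-in for the post's lrparams: a small leaf tensor.
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = (w ** 2).sum()
loss.backward()

# After a plain backward(), grad and grad.data hold the same values,
# and neither one is tracked by autograd.
same_values = torch.equal(w.grad, w.grad.data)
grad_tracked = w.grad.requires_grad
data_tracked = w.grad.data.requires_grad

# With create_graph=True the gradient is itself part of a graph;
# .data strips that tracking while keeping the values.
loss2 = (w ** 2).sum()
(g,) = torch.autograd.grad(loss2, w, create_graph=True)
g_tracked = g.requires_grad
g_data_tracked = g.data.requires_grad

# A manual update step written the current way: no_grad() instead of .data.
lr = 0.1
with torch.no_grad():
    w -= lr * w.grad

print(same_values, grad_tracked, data_tracked, g_tracked, g_data_tracked)
```

So in code like the post's, `grad.data` was most likely used so that the update itself is not recorded by autograd. In current PyTorch, `torch.no_grad()` or `.detach()` are the usual ways to get the same effect.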