Hello everyone, I am working through lesson2-sgd.ipynb and I cannot make sense of the definition of `nn.Parameter`, even though I have spent time searching for it online. Can anyone explain it to me?
I really appreciate your help with my problem!
Hey @Cuong,
I actually had to dive into this myself a few months ago, when I was trying to pass a `torch.Tensor` to a function expecting a `torch.nn.Parameter`. Data structures in deep learning exist to express and manipulate the theoretical concepts behind them. As for parameters: during optimization, we use both their values and their gradients. You need a parameter's value for the forward pass and its gradient for the update after backprop; in PyTorch these are accessible for a parameter `p` as `p.data` and `p.grad` respectively.
Here’s a short illustration:
```python
import torch

# Let's define a simple data structure (a 1-dimensional float tensor;
# only floating-point tensors can track gradients)
t = torch.tensor([1.0, 2.0, 3.0])
# This isn't enough for backprop, as it only stores the three values
# with no gradient tracking
# Fortunately PyTorch has what we need
p = torch.nn.Parameter(t)
# Now notice the difference between
print(t)
# and
print(p)
# The values of the parameter are still accessible using:
print(p.data)
```
At this stage, without any forward or backward pass, `p.grad` will be `None`: for efficiency, the gradient is only allocated once it is actually needed.
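To make that concrete, here is a minimal sketch: before any backward pass the gradient is `None`, and after one it holds the actual derivative. The scalar loss below is an arbitrary function chosen only for the example.

```python
import torch

# A float tensor is required: only floating-point tensors can track gradients
p = torch.nn.Parameter(torch.tensor([1.0, 2.0, 3.0]))
print(p.grad)  # None: no backward pass has run yet

loss = (p ** 2).sum()  # a simple scalar function of p
loss.backward()
print(p.grad)  # tensor([2., 4., 6.]), i.e. d(loss)/dp = 2 * p
```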
Consider `torch.nn.Parameter` an augmented version of `torch.Tensor`, able to store both the value and the gradient of a parameter. For more details, I usually check the actual implementation, since it's open source: https://pytorch.org/docs/stable/_modules/torch/nn/parameter.html
You can also read the code on GitHub: https://github.com/pytorch/pytorch/blob/master/torch/nn/parameter.py but be aware that it might include changes that occurred after the latest release (and thus since the version of PyTorch you are currently using).
Let me know if that helped, cheers!
Thank you very much!
@fgfm I ran into the same problem you went through. Specifically, it was when I was trying to define my own linear function (myLinear) from Lesson 5 of Part 1. Initially, I was trying to initialise the weight and bias tensors as plain torch.Tensors, but that gave me an error along the lines of:
`RuntimeError: Expected object of device type cuda but got device type cpu for argument #2 'mat2' in call to _th_mm`
It worked fine as soon as I initialized the parameters as nn.Parameters.
So from your understanding, is it that the GPU just doesn't accept anything other than `nn.Parameter`?
What is `nn.Parameter` in essence?
Thanks
Hi @PalaashAgrawal,
My apologies, I haven’t been around for a while!
I would have to check your code more thoroughly, but from the error you got:
- the call to `_th_mm` points out that you're trying to do a matrix multiplication
- the function was expecting a CUDA tensor, but was given a CPU tensor

My best guess is that somewhere in your function you're multiplying a CUDA tensor by a tensor that has not been moved to CUDA. I hope this helps!
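For what it's worth, this kind of error is easy to reproduce: PyTorch refuses to multiply tensors living on different devices. A minimal sketch (the mixed-device part only runs if a GPU is actually available):

```python
import torch

# Two small CPU tensors: matrix multiplication works when devices match
a = torch.ones(2, 3)
b = torch.ones(3, 2)
out = torch.mm(a, b)
print(out.shape)  # torch.Size([2, 2])

# Mixing devices reproduces the failure (guarded so it only runs on a GPU machine)
if torch.cuda.is_available():
    try:
        torch.mm(a.cuda(), b)  # CUDA tensor x CPU tensor
    except RuntimeError as e:
        print(e)  # complains about tensors being on different devices
```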
Regarding `nn.Parameter`, you could see it as a pair of tensors:
- what you would call its value is the first tensor (the `data` attribute)
- its gradient is the second tensor (the `grad` attribute)

It is essentially a data structure that keeps a tensor's gradient right next to its value, which is very useful for autograd and optimization.
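One more detail that likely explains the CUDA error above: `nn.Module` only registers attributes that are `nn.Parameter` instances, and `model.to(...)` / `model.cuda()` only move registered parameters (and buffers); a plain tensor attribute stays behind on the CPU. A small sketch of that mechanism, using a dtype conversion instead of a GPU so it runs anywhere (`MyLinear` here is a toy module, not the one from the lesson):

```python
import torch
from torch import nn

class MyLinear(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered: nn.Module tracks nn.Parameter attributes automatically
        self.weight = nn.Parameter(torch.randn(3, 3))
        # Not registered: a plain tensor attribute is invisible to the module
        self.plain = torch.randn(3, 3)

m = MyLinear()
print(len(list(m.parameters())))  # 1: only the nn.Parameter is registered

# .to() (like .cuda()) only converts registered parameters and buffers
m = m.to(torch.float64)
print(m.weight.dtype)  # torch.float64
print(m.plain.dtype)   # torch.float32, left untouched
```

This is why initializing your weights as `nn.Parameter` fixed things: once registered, they follow the module to the GPU, so the matrix multiplication sees two CUDA tensors.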