Hello everyone, I am working through lesson2-sgd.ipynb and I cannot make sense of the definition of `nn.Parameter`, even though I have spent time searching for it online. Can anyone explain it to me?
I really appreciate your help with my problem!
Hey @Cuong,
I actually had to dive into this myself a few months ago, when I was trying to pass a `torch.Tensor` to a function expecting a `torch.nn.Parameter`. Data structures in deep learning exist to express and manipulate the theoretical concepts behind them. As for parameters: during optimization, we use both their values and their gradients. You need a parameter's value for the forward pass and its gradient for the update after backprop; in PyTorch these are accessible for a parameter `p` as `p.data` and `p.grad` respectively.
Here’s a short illustration:
```python
import torch

# Let's define a simple data structure (a 1-dimensional float tensor;
# only floating-point tensors can track gradients)
t = torch.tensor([1.0, 2.0, 3.0])
# This isn't enough for backprop, as it only stores the three values
# with no gradient tracking
# Fortunately PyTorch has what we need
p = torch.nn.Parameter(t)
# Now notice the difference between
print(t)
# and
print(p)
# The values of the parameter are still accessible using:
print(p.data)
```
At this stage, without any forward or backward pass, `p.grad` will be `None`: for efficiency, the gradient is only allocated once it is actually needed.
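To make that concrete, here is a minimal sketch: before any backward pass the gradient is `None`, and after one it holds the actual derivative. The scalar loss below is an arbitrary function chosen only for the example.

```python
import torch

# A float tensor is required: only floating-point tensors can track gradients
p = torch.nn.Parameter(torch.tensor([1.0, 2.0, 3.0]))
print(p.grad)  # None: no backward pass has run yet

loss = (p ** 2).sum()  # a simple scalar function of p
loss.backward()
print(p.grad)  # tensor([2., 4., 6.]), i.e. d(loss)/dp = 2 * p
```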
Consider `torch.nn.Parameter` an augmented version of `torch.Tensor`, able to store both the value and the gradient of a parameter. For more details, I usually check the actual implementation, since it's open source: https://pytorch.org/docs/stable/_modules/torch/nn/parameter.html
You can also read the code on GitHub: https://github.com/pytorch/pytorch/blob/master/torch/nn/parameter.py but be aware that it might include changes that occurred after the latest release (and thus since the version of PyTorch you are currently using).
Let me know if that helped, cheers!
Thank you very much!
@fgfm I ran into the same problem you went through. Specifically, it was when I was trying to define my own linear function (myLinear) from Lesson 5 of Part 1. Initially, I was trying to initialise the weight and bias tensors as plain torch.Tensors, but that gave me an error along the lines of:
`RuntimeError: Expected object of device type cuda but got device type cpu for argument #2 'mat2' in call to _th_mm`
It worked fine as soon as I initialized the parameters as nn.Parameters.
So from your understanding, is it that the GPU just doesn't accept anything other than `nn.Parameter`?
What is `nn.Parameter` in essence?
Thanks
Hi @PalaashAgrawal,
My apologies, I haven’t been around for a while!
I would have to check your code more thoroughly, but from the error you got:
- the call to `_th_mm` points out that you're trying to do a matrix multiplication
- the function was expecting a CUDA tensor, but was given a CPU tensor

My best guess is that somewhere in your function you're multiplying a CUDA tensor by a tensor that has not been moved to CUDA. I hope this helps!
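For what it's worth, this kind of error is easy to reproduce: PyTorch refuses to multiply tensors living on different devices. A minimal sketch (the mixed-device part only runs if a GPU is actually available):

```python
import torch

# Two small CPU tensors: matrix multiplication works when devices match
a = torch.ones(2, 3)
b = torch.ones(3, 2)
out = torch.mm(a, b)
print(out.shape)  # torch.Size([2, 2])

# Mixing devices reproduces the failure (guarded so it only runs on a GPU machine)
if torch.cuda.is_available():
    try:
        torch.mm(a.cuda(), b)  # CUDA tensor x CPU tensor
    except RuntimeError as e:
        print(e)  # complains about tensors being on different devices
```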
Regarding `nn.Parameter`, you could see it as a pair of tensors:
- what you would call its value is the first tensor (the `data` attribute)
- its gradient is the second tensor (the `grad` attribute)

It is essentially a data structure that keeps a tensor's gradient right next to its value, which is very useful for autograd and optimization.
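One more detail that likely explains the CUDA error above: `nn.Module` only registers attributes that are `nn.Parameter` instances, and `model.to(...)` / `model.cuda()` only move registered parameters (and buffers); a plain tensor attribute stays behind on the CPU. A small sketch of that mechanism, using a dtype conversion instead of a GPU so it runs anywhere (`MyLinear` here is a toy module, not the one from the lesson):

```python
import torch
from torch import nn

class MyLinear(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered: nn.Module tracks nn.Parameter attributes automatically
        self.weight = nn.Parameter(torch.randn(3, 3))
        # Not registered: a plain tensor attribute is invisible to the module
        self.plain = torch.randn(3, 3)

m = MyLinear()
print(len(list(m.parameters())))  # 1: only the nn.Parameter is registered

# .to() (like .cuda()) only converts registered parameters and buffers
m = m.to(torch.float64)
print(m.weight.dtype)  # torch.float64
print(m.plain.dtype)   # torch.float32, left untouched
```

This is why initializing your weights as `nn.Parameter` fixed things: once registered, they follow the module to the GPU, so the matrix multiplication sees two CUDA tensors.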