In the `__init__` of the linear decoder:

```python
def __init__(self, n_out, nhid, dropout, tie_encoder=None):
    super().__init__(n_out, nhid, dropout)
    if tie_encoder: self.decoder.weight = tie_encoder.weight
```
Does that final line take a whole additional copy of the weights, creating a new variable? Or is the variable somehow linked to the original?
I’m trying to understand when/how variables in pytorch are created and when there’s just some sort of soft linking and sharing going on.
Does anyone have a good resource for that question?
[ Edit: was wrong :), see below. ]
I think what’s happening here is that we’re not creating a new `Parameter`; we’re just taking the one that was produced by `super` (via the superclass `LinearRNNOutput`’s `__init__`) and replacing its data with `tie_encoder`’s weight. Those weights are then tied in the sense that multiple variables/parameters are now pointing at the same underlying tensor.
What you’re implying, though, is that the variable in the superclass (`self.decoder.weight`) is new, which if true means that it’s separate from the original embedding and the weights of the embedding are being stored twice.
I’m not certain this is the case though. Do you have any good resources on pytorch variable/memory allocation?
Ah, ok, I goofed above. I got confused about which things were variables, which were parameters, etc.
`self.decoder` is an `nn.Linear` layer, so its `weight` is a `Parameter`; presumably the same goes for `tie_encoder.weight`. So, setting `self.decoder.weight = tie_encoder.weight` isn’t resetting the decoder’s weight *tensor*, it’s resetting its weight *parameter*. The call to `super` initialized it with one weight parameter, and here we’re forgetting that one and replacing it with the tied one.
Does that answer your question? The linking/sharing happens just by sharing a reference.
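To make that concrete, here’s a small sketch (using plain `nn.Embedding`/`nn.Linear` layers as stand-ins for the actual classes above) showing that the assignment just rebinds the attribute to the same `Parameter` object:

```python
import torch
import torch.nn as nn

emb = nn.Embedding(10, 4)  # stand-in for the encoder
dec = nn.Linear(4, 10)     # stand-in for the decoder (weight shape matches: 10x4)

dec.weight = emb.weight    # the "tying" line: rebinds the attribute

print(dec.weight is emb.weight)  # True: same Parameter object, not a copy
```

Because both modules now hold the same object, any in-place update to one (e.g. an optimizer step) is immediately visible through the other.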
Thanks, I think that was the missing link for me too. I need to read up further on parameters and variables. It seems like parameters can be explicitly shared, which is great, and that this is just happening by reference.
What I need to figure out now is, if I want to capture a variable and store it, how best to do that.
But it seems like the tie_encoder.weight isn’t adding any more memory requirements to the model, which was my main concern. I’m trying to build a model and I’m running out of memory, and I was hoping that by optimizing this I’d be able to reduce the amount of memory required.
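You can check the memory question directly by comparing storage pointers before and after tying; a sketch, again with stand-in layers rather than the actual classes above:

```python
import torch
import torch.nn as nn

emb = nn.Embedding(10, 4)
dec = nn.Linear(4, 10)

# Before tying: two separate storages (two copies of the weights)
print(dec.weight.data_ptr() != emb.weight.data_ptr())  # True

dec.weight = emb.weight

# After tying: one shared storage, so no extra memory for the decoder weights
print(dec.weight.data_ptr() == emb.weight.data_ptr())  # True
```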
Cool! The way I think about it, tensors are just dumb blocks of data, variables wrap tensors with fancy autograd capabilities, and parameters are almost exactly the same thing as variables: they just hook a little more easily into the `nn.Module` machinery, since assigning a `Parameter` as a module attribute registers it automatically in the module’s parameter list.
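That registration difference is easy to see: assigning a plain tensor to a module attribute doesn’t register it, while a `Parameter` does (a small illustrative sketch, not from the code above):

```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(3))  # registered automatically
        self.t = torch.zeros(3)                # plain tensor: not registered

m = M()
print([name for name, _ in m.named_parameters()])  # ['w']
```

This is why `optimizer = SGD(m.parameters(), ...)` picks up `w` but not `t`.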
Yup, you guys figured it out - they refer to the same object. Here’s the paper that this is implementing: https://arxiv.org/abs/1608.05859. (Yes, that whole paper is basically one line of code.)