FastAI Library Question about LinearDecoder

Even · January 19, 2018, 5:15am

In the init of linear decoder:

def __init__(self, n_out, nhid, dropout, tie_encoder=None):
    super().__init__(n_out, nhid, dropout)
    if tie_encoder: self.decoder.weight = tie_encoder.weight

Does the final line take a whole additional copy of the weights, creating a new variable? Or is the variable somehow linked?

I’m trying to understand when/how variables in pytorch are created and when there’s just some sort of soft linking and sharing going on.

Does anyone have a good resource for that question?

cqfd · January 19, 2018, 11:13am

[ Edit: was wrong :), see below. ]
I think what’s happening here is that we’re not creating a new Variable/Parameter, we’re just taking the one that was produced by super (via the superclass LinearRNNOutput's init) and replacing its data Tensor with tie_encoder's weight. Those weights are then tied in the sense that multiple variables/parameters are now pointing at the same underlying tensor.

Even · January 19, 2018, 5:04pm

What you’re implying though is that variable in the superclass (self.decoder.weight) is new, which if true means that it’s separate from the original embedding and the weights of the embedding are being stored twice.

I’m not certain this is the case though. Do you have any good resources on pytorch variable/memory allocation?

cqfd · January 19, 2018, 5:52pm

Ah, ok, I goofed above. I got confused about which things were variables, which were parameters, etc.

self.decoder is an nn.Linear layer, so its weight is a Parameter; presumably the same goes for tie_encoder.weight. So, setting self.decoder.weight = tie_encoder.weight isn’t resetting the decoder's weight tensor, it’s resetting its weight parameter. The call to super initialized it with one weight parameter, and here we’re forgetting that one and replacing it with the tied one.

Does that answer your question? The linking/sharing happens just by sharing a reference.
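
For anyone reading along, here is a minimal sketch of the sharing-by-reference behaviour described above. It is not the fastai code: the encoder and decoder are stand-ins I built from nn.Embedding and nn.Linear, and the sizes are made up.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 10, 4  # hypothetical sizes, for illustration only

encoder = nn.Embedding(vocab_size, emb_dim)            # weight shape: (10, 4)
decoder = nn.Linear(emb_dim, vocab_size, bias=False)   # weight shape: (10, 4)

old_weight = decoder.weight      # the Parameter created by nn.Linear
decoder.weight = encoder.weight  # rebinds the attribute; nothing is copied

# Both modules now hold the very same Parameter object.
same_object = decoder.weight is encoder.weight
same_storage = decoder.weight.data_ptr() == encoder.weight.data_ptr()
old_replaced = old_weight is not decoder.weight

# An in-place change made through one module is visible through the other.
with torch.no_grad():
    encoder.weight[0].fill_(1.0)
visible = bool((decoder.weight[0] == 1.0).all())
```

The shapes happen to line up because nn.Linear stores its weight as (out_features, in_features), which matches the embedding's (num_embeddings, embedding_dim).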

Even · January 19, 2018, 6:19pm

Thanks, I think that was the missing link for me too. I need to read up further on parameters and variables. It seems like parameters can be explicitly shared, which is great, and that this is just happening by reference.

What I need to figure out now is how best to capture a variable and store it.

But it seems like the tie_encoder.weight isn’t adding any more memory requirements to the model, which was my main concern. I’m trying to build a model and I’m running out of memory, and I was hoping that by optimizing this I’d be able to reduce the amount of memory required.
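
One rough way to check the memory point is to count parameters with and without tying. This is a sketch under assumptions: TinyLM and n_params are hypothetical names for a toy model and helper, not anything from fastai.

```python
import torch.nn as nn

class TinyLM(nn.Module):
    """Hypothetical toy model: an embedding plus an output projection."""
    def __init__(self, vocab_size, emb_dim, tie=True):
        super().__init__()
        self.encoder = nn.Embedding(vocab_size, emb_dim)
        self.decoder = nn.Linear(emb_dim, vocab_size, bias=False)
        if tie:
            self.decoder.weight = self.encoder.weight  # share, do not copy

def n_params(model):
    # parameters() yields each distinct Parameter once, so a shared
    # weight is counted a single time.
    return sum(p.numel() for p in model.parameters())

tied = n_params(TinyLM(1000, 50, tie=True))
untied = n_params(TinyLM(1000, 50, tie=False))
```

In the tied case the weight that nn.Linear first allocated is no longer referenced by the module, so it should become eligible for garbage collection.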

cqfd · January 19, 2018, 6:26pm

Cool! The way I think about it, tensors are just dumb blocks of data, variables wrap tensors with fancy autograd capabilities, and parameters are almost exactly the same thing as variables: they just hook a little more easily into the nn.Module machinery.
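
A small sketch of that distinction, with one caveat: in PyTorch 0.4 and later, Variable was merged into Tensor, so the example below only contrasts a plain tensor with a Parameter. Holder is a hypothetical module name used for illustration.

```python
import torch
import torch.nn as nn

t = torch.zeros(3)                # plain tensor: no gradient tracking by default
p = nn.Parameter(torch.zeros(3))  # Parameter: a Tensor subclass, requires_grad=True

class Holder(nn.Module):
    """Hypothetical module holding one plain tensor and one Parameter."""
    def __init__(self):
        super().__init__()
        self.plain = torch.zeros(3)                # ordinary attribute, not registered
        self.param = nn.Parameter(torch.zeros(3))  # registered with the module

# Only the Parameter is picked up by the nn.Module machinery.
names = [name for name, _ in Holder().named_parameters()]
is_tensor_subclass = isinstance(p, torch.Tensor)
```

Being registered is what lets an optimizer built from model.parameters() see the weight, which is the "hooks into nn.Module" part.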

jeremy · January 20, 2018, 5:57am

Yup you guys figured it out - they refer to the same object. Here’s the paper that this is implementing: https://arxiv.org/abs/1608.05859 . (Yes, that whole paper is basically one line of code.)