So I was confused by one particular line in the code:

```
class Seq2SeqRNN(nn.Module):
    def __init__(self, vecs_enc, itos_enc, em_sz_enc, vecs_dec, itos_dec, em_sz_dec, nh, out_sl, nl=2):
        ### lots of code
        self.out.weight.data = self.emb_dec.weight.data
        ### code continues
```

I was confused about how the linear layer and the embedding layer can share weights, since they appear to have different shapes.

The linear layer is **nn.Linear(300, len(en_itos))**, whereas the embedding is **nn.Embedding(len(en_itos), 300)**.

So I went and inspected both their sizes.

It turns out

```
nn.Embedding(17573,300).weight.data and
nn.Linear(300,17573).weight.data
```

both have the same size: **[torch.cuda.FloatTensor of size 17573x300]**.
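You can check this yourself with a quick sketch (run on CPU here, so no `.cuda` in the type, but the shapes are the same):

```python
import torch.nn as nn

# Sizes from the post: 17573 tokens, 300-dim embeddings.
emb = nn.Embedding(17573, 300)
lin = nn.Linear(300, 17573)

print(emb.weight.data.shape)  # torch.Size([17573, 300])
print(lin.weight.data.shape)  # torch.Size([17573, 300])
```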

That’s when I realized that a linear layer computes Wx + b (duh), so its weight is stored as (out_features, in_features).

So W will have the shape (17573, 300). But why does the embedding layer have the same shape? Because it isn’t used as a matrix: **nn.Embedding is a lookup table.** You just query a word index (one of the 17573 in this case) and get back a 300-dim vector. That’s why Jeremy could tie the output embedding and the output linear layer weights.
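Here is a minimal sketch of both ideas outside the full `Seq2SeqRNN` class; the sizes are made up for illustration, not the real 17573 x 300:

```python
import torch
import torch.nn as nn

vocab, dim = 10, 4  # toy sizes for illustration
emb_dec = nn.Embedding(vocab, dim)
out = nn.Linear(dim, vocab)

# The lookup: the embedding of word 3 is just row 3 of the weight table.
word_vec = emb_dec(torch.tensor([3]))  # shape (1, 4)
print(torch.equal(word_vec[0], emb_dec.weight.data[3]))  # True

# The tying trick from the post: the output projection reuses the table.
out.weight.data = emb_dec.weight.data

# Both parameters now point at the same storage, so editing one edits the other.
emb_dec.weight.data[0].zero_()
print(out.weight.data[0])  # tensor([0., 0., 0., 0.])
```

Because the tensors share storage, gradient updates to either layer move the same underlying numbers, which is the whole point of tying.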

It all sounds simple in hindsight, but I was stumped by this for a while. So I hope this helps anybody else who was confused about weight sharing in this code.