PyTorch is really nice. I converted from Keras to PyTorch in early 2017 and, to be honest, haven’t really looked back since. Anyhow, I’m trying to do an NLP project in Keras and am having some issues.
In PyTorch, Linear layers very conveniently have their weights initialized like this:
self.weight = Parameter(torch.Tensor(out_features, in_features))
Note how out_features curiously comes first(!). Embedding layers in PyTorch have weights that look like this:
self.weight = Parameter(torch.Tensor(num_embeddings, embedding_dim))
This “feature” makes it really easy to share / tie weights between embedding and linear layers, which is exactly what fastai does in their LinearDecoder: essentially just setting lineardecoder.linear.weight = rnnencoder.embedding.weight and letting PyTorch do the rest.
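For example, tying in PyTorch can be as simple as the following (just a minimal sketch with placeholder names, not the actual fastai code):

```python
import torch.nn as nn

vocab_size, emb_dim = 10000, 300

embedding = nn.Embedding(vocab_size, emb_dim)  # weight shape: (vocab_size, emb_dim)
decoder = nn.Linear(emb_dim, vocab_size)       # weight shape: (vocab_size, emb_dim) as well!

# Because both weights have the same shape, the Parameter can be shared directly;
# the decoder's bias stays its own separate parameter.
decoder.weight = embedding.weight
```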
From my reading, the Keras paradigm for weight sharing is actually layer reuse with the functional API. Unfortunately, one cannot simply swap an Embedding and a Dense layer. To further complicate things, Keras Dense layers have their kernels defined as:
self.kernel = self.add_weight(shape=(input_dim, self.units), .....
So even if one could theoretically just assign weights from one layer to another (which we can’t in Keras), there would still be issues like transposing, since the Dense kernel is stored as (input_dim, units) while the embedding matrix is (num_embeddings, embedding_dim).
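To make that concrete, here’s roughly the kind of custom layer I imagine would be needed, where the kernel is taken as the transpose of the embedding matrix. This is just an untested sketch: TiedDense and the tied_embedding argument are my own placeholders, and I’m assuming the TensorFlow backend and that the Embedding has already been built (so its embeddings weight exists):

```python
from keras import backend as K
from keras.layers import Layer

class TiedDense(Layer):
    """Dense-like layer that reuses the transpose of an Embedding's weight matrix."""

    def __init__(self, tied_embedding, **kwargs):
        super(TiedDense, self).__init__(**kwargs)
        # Reference to an already-built keras.layers.Embedding instance.
        self.tied_embedding = tied_embedding

    def build(self, input_shape):
        # Only a bias is created here; the kernel is borrowed from the embedding.
        self.bias = self.add_weight(name='bias',
                                    shape=(self.tied_embedding.input_dim,),
                                    initializer='zeros')
        super(TiedDense, self).build(input_shape)

    def call(self, inputs):
        # embedding.embeddings has shape (num_embeddings, embedding_dim);
        # a Dense kernel needs (input_dim, units), so transpose before the dot.
        kernel = K.transpose(self.tied_embedding.embeddings)
        return K.dot(inputs, kernel) + self.bias

    def compute_output_shape(self, input_shape):
        return input_shape[:-1] + (self.tied_embedding.input_dim,)
```

The idea being that with units == num_embeddings and input_dim == embedding_dim, the transpose gives a kernel of the right shape. No idea whether this is the “right” way to do it in Keras, hence the question below.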
Anyhow, my question is: do any of you know how to share / tie weights between a Keras Embedding and Dense layer, similar to how fastai does it in their language models?