and in the RNN lecture it was mentioned that we should initialize the hidden layer as the identity matrix. However, in the case of my bidirectional RNN, when I run this:

model.rnn.weight_hh_l0.size()

I get a tensor of size [512, 128] (I am not sure where the 512 is coming from; I would have expected 128 x 2 = 256). How should I initialize the hidden-to-hidden weights in this case? Thank you!

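There are 4 components per LSTM cell (the input, forget, cell, and output gates), and their weights are stacked along the first dimension. So it's 4*hidden = 4*128 = 512.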

Cheers
A

I am copy-pasting bits from the nn.LSTM doc.

Attributes:
weight_ih_l[k]: the learnable input-hidden weights of the k-th layer (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size, input_size) for k = 0. Otherwise, the shape is (4*hidden_size, num_directions * hidden_size)
weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer (W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size, hidden_size)
bias_ih_l[k]: the learnable input-hidden bias of the k-th layer (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)
bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size)
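
To see these shapes on a concrete model, here is a minimal sketch. Only hidden_size=128 comes from the question; input_size=64 and the single layer are assumptions made for illustration.

```python
from torch import nn

# Assumed sizes for illustration; only hidden_size=128 comes from the thread.
rnn = nn.LSTM(input_size=64, hidden_size=128, num_layers=1, bidirectional=True)

# Collect the shape of every learnable parameter, keyed by its name.
shapes = {name: tuple(p.shape) for name, p in rnn.named_parameters()}
for name, shape in shapes.items():
    print(f"{name:24s} {shape}")
```

Note that the bidirectional part does not widen weight_hh_l0. It shows up as a second set of parameters with a _reverse suffix (for example weight_hh_l0_reverse), each with its own (4*hidden_size, hidden_size) matrix.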

If you know which of the weights you want to initialize by name, you can do something like this:

torch.nn.init.eye_(self.rnn.weight_hh_l0)

Definitely look into torch.nn.init: it has other built-in initializers such as normal_ and uniform_. (By the way, the trailing underscore in eye_ means the operation modifies the tensor in place.)

Also, please look up and experiment with module.named_parameters(), e.g. rnn.named_parameters(), to iterate over the weights programmatically.
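
A rough sketch of what that iteration could look like. The helper name init_rnn_weights, the model sizes, and the choice of initializer per parameter type (orthogonal_, xavier_uniform_, zeros_) are all assumptions for illustration, not a recommendation from the lecture.

```python
import torch
from torch import nn

def init_rnn_weights(rnn: nn.Module) -> None:
    """Hypothetical helper: choose an initializer from each parameter's name."""
    for name, param in rnn.named_parameters():
        if "weight_hh" in name:
            nn.init.orthogonal_(param)      # hidden-hidden weights
        elif "weight_ih" in name:
            nn.init.xavier_uniform_(param)  # input-hidden weights
        elif "bias" in name:
            nn.init.zeros_(param)           # both bias vectors

torch.manual_seed(0)  # make the random initializers reproducible
# Assumed sizes; only hidden_size=128 comes from the thread.
rnn = nn.LSTM(input_size=64, hidden_size=128, bidirectional=True)
init_rnn_weights(rnn)
```

Matching on substrings of the name means the _reverse parameters of a bidirectional model are covered by the same branches.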

I don’t think the eye_ would work, because as far as I understand that creates an identity matrix, which should be a square matrix, while my matrix is 512 x 128. But I will look into torch.nn.init. Thank you!
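
For what it's worth, torch.nn.init.eye_ does accept a non-square 2-D tensor: it puts ones on the main diagonal and zeros everywhere else. On a [512, 128] weight that means only the first 128 x 128 block (W_hi in the layout quoted above) becomes the identity, and the other three gate blocks become zero. If the goal is an identity in every gate block, one possible sketch is below. The sizes other than hidden_size=128 are assumed, and whether this is a good initialization for an LSTM is not something this thread settles.

```python
import torch
from torch import nn

hidden_size = 128  # from the thread; input_size below is an assumption
rnn = nn.LSTM(input_size=64, hidden_size=hidden_size, bidirectional=True)

# eye_ on the full rectangular matrix: identity in the top block, zeros below.
nn.init.eye_(rnn.weight_hh_l0)
first_block_only = rnn.weight_hh_l0.detach().clone()

# Per-gate alternative: split each hidden-hidden matrix into its 4 gate
# blocks (views into the parameter) and copy an identity into each one.
with torch.no_grad():
    for name, param in rnn.named_parameters():
        if "weight_hh" in name:
            for gate_block in param.chunk(4, dim=0):
                gate_block.copy_(torch.eye(hidden_size))
```

The loop also covers weight_hh_l0_reverse, so both directions get the same treatment.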