RNN initialization of the hidden layer

Hello! I have this LSTM:

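import torch
import torch.nn as nn
import torch.nn.functional as F
# V is the fastai helper that wraps a tensor as a Variable

n_hidden = 128
n_classes = 2
bs = 1
class TESS_LSTM(nn.Module):
    def __init__(self, nl):
        super().__init__()  # must run before submodules are assigned
        self.nl = nl
        self.rnn = nn.LSTM(1, n_hidden, nl, bidirectional=True) #dropout=0.3,bidirectional=True)
        self.l_out = nn.Linear(n_hidden*2, n_classes)  # *2: forward + backward outputs
    def forward(self, input):
        # reshape to (seq_len, batch, input_size); self.h is set by init_hidden()
        outp,h = self.rnn(input.view(len(input), bs, -1), self.h)
        #self.h = repackage_var(h)
        return F.log_softmax(self.l_out(outp),dim=2)
    def init_hidden(self, bs):
        # (h_0, c_0), each of shape (num_layers * num_directions, batch, hidden_size)
        self.h = (V(torch.zeros(self.nl*2, bs, n_hidden)),
                  V(torch.zeros(self.nl*2, bs, n_hidden)))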

and in the RNN lecture it was mentioned that we should initialize the hidden layer (the hidden-to-hidden weight matrix) as the identity matrix. However, in the case of my bidirectional RNN, when I do this:


I get a tensor of size [512, 128]. I am not sure where that 512 is coming from; I would have expected 128 x 2 = 256. How should I initialize the hidden state in this case? Thank you!

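There are 4 components (gates) per cell, and their weight matrices are stacked into one tensor. So it's 4*hidden_size = 4*128 = 512. The 512 has nothing to do with the model being bidirectional.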
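To see that arithmetic directly, here is a small shape check. It is only a sketch: it assumes a single-layer stand-in LSTM with the same sizes as the model in the question, not the poster's full `TESS_LSTM` class.

```python
import torch.nn as nn

n_hidden = 128
# Stand-in for the model above: 1 input feature, 1 layer, bidirectional.
rnn = nn.LSTM(1, n_hidden, 1, bidirectional=True)

# Hidden-to-hidden weights: the four gate matrices stacked along dim 0.
print(rnn.weight_hh_l0.shape)          # torch.Size([512, 128])
# The backward direction has its own, separately named tensor.
print(rnn.weight_hh_l0_reverse.shape)  # torch.Size([512, 128])

# Splitting along dim 0 recovers one (128, 128) block per gate,
# in the documented order: input, forget, cell (g), output.
w_hi, w_hf, w_hg, w_ho = rnn.weight_hh_l0.chunk(4, dim=0)
print(w_hi.shape)                      # torch.Size([128, 128])
```

The bidirectional part shows up as extra `_reverse` parameters, not as a larger first dimension.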


I am copy-pasting bits from the nn.LSTM doc.

weight_ih_l[k] : the learnable input-hidden weights of the k-th layer
(W_ii|W_if|W_ig|W_io), of shape (4*hidden_size, input_size) for k = 0.
Otherwise, the shape is (4*hidden_size, num_directions * hidden_size)
weight_hh_l[k] : the learnable hidden-hidden weights of the k-th layer
(W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size, hidden_size)
bias_ih_l[k] : the learnable input-hidden bias of the k-th layer
(b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)
bias_hh_l[k] : the learnable hidden-hidden bias of the k-th layer
(b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size)

Oh I see. Thank you! So is there any good way to initialize this?

If you know which of the weights you want to initialize by name, you can do something like this:


Definitely look into torch.nn.init: it has other built-in initializers such as normal_, uniform_, etc. (By the way, the trailing underscore in eye_ means the function modifies the tensor in place.)
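A minimal sketch of in-place initialization by parameter name with torch.nn.init. The stand-in `rnn` and the particular values (`std=0.02`, the `-0.1..0.1` range) are arbitrary assumptions chosen for illustration, not recommendations from this thread.

```python
import torch
import torch.nn as nn

# Stand-in model with the same sizes as in the question.
rnn = nn.LSTM(1, 128, 1, bidirectional=True)

w = rnn.weight_ih_l0           # access a parameter by its documented name
ptr_before = w.data_ptr()      # address of the underlying storage

# Functions ending in "_" overwrite the tensor's values in place.
nn.init.normal_(w, mean=0.0, std=0.02)
nn.init.uniform_(rnn.weight_hh_l0, -0.1, 0.1)
nn.init.zeros_(rnn.bias_ih_l0)

# The parameter object and its storage are unchanged; only the values differ.
same_storage = (w.data_ptr() == ptr_before)
print(same_storage)
```

Because the operation is in place, the optimizer and the module keep pointing at the same parameter; nothing needs to be reassigned.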

Also, please look up and play with module.named_parameters(), e.g. rnn.named_parameters(), to iterate over the weights programmatically.
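For instance, a loop over `named_parameters()` might look like the sketch below. The two-layer stand-in model and the choice of initializer per parameter type are assumptions for illustration only.

```python
import torch.nn as nn

# Stand-in: 2 layers, bidirectional, same hidden size as in the question.
rnn = nn.LSTM(1, 128, 2, bidirectional=True)

names = []
for name, param in rnn.named_parameters():
    names.append(name)
    print(name, tuple(param.shape))
    if "weight" in name:
        nn.init.uniform_(param, -0.1, 0.1)  # arbitrary range, for illustration
    elif "bias" in name:
        nn.init.zeros_(param)
```

The printed names follow the pattern from the docs (`weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0`, ...), with a `_reverse` suffix for the backward direction, so a substring test such as `"weight_hh" in name` is enough to select one family of tensors.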


I don’t think the eye_ would work, because as far as I understand that creates an identity matrix, which should be a square matrix, while my matrix is 512 x 128. But I will look into torch.nn.init. Thank you!
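As far as the torch.nn.init documentation goes, `nn.init.eye_` does accept a non-square 2-D tensor, but on a (512, 128) LSTM weight it would put the identity only in the first (input-gate) block and zeros in the other three. One possible way to apply the lecture's idea to an LSTM is to write an identity into each gate's 128 x 128 block separately. The sketch below assumes a single-layer stand-in model, and whether this initialization actually helps an LSTM is not something this thread establishes.

```python
import torch
import torch.nn as nn

n_hidden = 128
rnn = nn.LSTM(1, n_hidden, 1, bidirectional=True)  # stand-in model

with torch.no_grad():  # modify parameters without tracking gradients
    for name, param in rnn.named_parameters():
        if "weight_hh" in name:
            # param has shape (4*n_hidden, n_hidden); each chunk is a
            # (n_hidden, n_hidden) view onto one gate's weights.
            for gate_block in param.chunk(4, dim=0):
                gate_block.copy_(torch.eye(n_hidden))

print(rnn.weight_hh_l0.shape)  # torch.Size([512, 128])
```

Since `chunk` returns views, copying into each block updates the original parameter, and the loop covers both `weight_hh_l0` and `weight_hh_l0_reverse`.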