Ok so I’ve spent entirely too many hours trying to get this to work. I’m appealing to smarter people than me!
I’m trying to get the LSTMCell from Chapter 12 to work inside an LSTM module. But I keep running into issues where my tensors aren’t the right sizes to stack. I’ve tried changing the shapes of my hidden and cell state, but then when I stack my hidden state with my input, the tensor just keeps getting bigger. I’m hoping it’s obvious to someone else how to configure this.
Here are my classes:
```python
from fastai.text.all import *  # provides Module, nn, torch, Learner, etc.

bs = 64

class LSTMCell(Module):
    def __init__(self, ni, nh):
        self.forget_gate = nn.Linear(ni + nh, nh)
        self.input_gate  = nn.Linear(ni + nh, nh)
        self.cell_gate   = nn.Linear(ni + nh, nh)
        self.output_gate = nn.Linear(ni + nh, nh)

    def forward(self, input, state):
        h, c = state
        # Stack our input with the previous hidden state
        h = torch.stack([h, input], dim=1)
        # Linear layer learns what to forget, then activated by sigmoid
        forget = torch.sigmoid(self.forget_gate(h))
        # Since forget consists of scalars between 0 and 1, we multiply it by the
        # cell state to determine which information to keep and which to throw away.
        # Values close to 0 are thrown away; values close to 1 are kept
        c = c * forget
        # Input gate combines with the cell gate to update the cell
        inp = torch.sigmoid(self.input_gate(h))
        # Linear layer activated by tanh
        cell = torch.tanh(self.cell_gate(h))
        # Cell state is updated by the result of the input gate times the cell gate
        c = c + inp * cell
        # Output gate determines which information from the cell state to use for output
        out = torch.sigmoid(self.output_gate(h))
        # New hidden state is the output gate combined with the tanh of the cell state
        h = out * torch.tanh(c)
        # Return the new hidden state along with the cell state. Seems redundant?
        return (h, c)

class LSTM_scratch(Module):
    def __init__(self, vocab_sz, n_hidden):
        self.i_h = nn.Embedding(vocab_sz, n_hidden)
        self.rnn = LSTMCell(n_hidden, n_hidden)
        self.h_o = nn.Linear(n_hidden, vocab_sz)
        self.h = torch.zeros(bs, n_hidden)
        self.c = torch.zeros(bs, n_hidden)

    def forward(self, x):
        h, c = self.rnn.forward(self.i_h(x), (self.h, self.c))
        self.h = h.detach()
        self.c = c.detach()
        return self.h_o(self.c)

    def reset(self):
        self.h.detach()
        self.c.detach()

learn = Learner(dls, LSTM_scratch(len(vocab), 50),
                loss_func=CrossEntropyLossFlat(),
                metrics=accuracy, cbs=ModelResetter)
learn.fit_one_cycle(5, 1e-2)
```
which gets me the error “RuntimeError: stack expects each tensor to be equal size, but got [64, 50] at entry 0 and [64, 16, 50] at entry 1”. So I think I’m confused about the dimensions of my hidden state and how to stack them with my input. I’ve tried making the hidden state [64, 16, 50] to match my input, but then on the next stack I get a similar error trying to stack [64, 16, 50] with [64, 30, 50], which I don’t really understand either… I would have thought it would be [64, 32, 50] if anything.
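For what it’s worth, here’s a minimal repro of the mismatch outside my model (plain PyTorch, with the shapes hardcoded from the error message), plus a comparison of `stack` vs `cat` on two equal-shape tensors, since that’s where my [64, 32, 50] expectation came from:

```python
import torch

# Shapes taken from the error message: hidden state vs embedded input
h = torch.zeros(64, 50)      # (bs, n_hidden)
x = torch.zeros(64, 16, 50)  # (bs, seq_len, n_hidden), what nn.Embedding gives me

try:
    torch.stack([h, x], dim=1)
except RuntimeError as e:
    print(e)  # stack expects each tensor to be equal size...

# Even with two equal-shape tensors, stack inserts a NEW dimension,
# while cat joins along an EXISTING one:
a = torch.zeros(64, 16, 50)
b = torch.zeros(64, 16, 50)
print(torch.stack([a, b], dim=1).shape)  # torch.Size([64, 2, 16, 50])
print(torch.cat([a, b], dim=1).shape)    # torch.Size([64, 32, 50])
```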
Is it obvious to anyone how I’m butchering this? I’ve been stuck trying to understand this for days. I just want to move on!
Thanks a lot