Ok so I’ve spent entirely too many hours trying to get this to work. I’m appealing to smarter people than me!
I’m trying to get the LSTMCell from Chapter 12 to work in an LSTM module, but I keep running into issues where my tensors aren’t the right size to stack. I’ve tried changing the shapes of my hidden and cell state, but then when I stack my hidden state with my input, the tensor just keeps getting bigger. I’m hoping it’s obvious to someone else how to configure this?
Here are my classes:
```python
bs = 64

class LSTMCell(Module):
    def __init__(self, ni, nh):
        self.forget_gate = nn.Linear(ni + nh, nh)
        self.input_gate  = nn.Linear(ni + nh, nh)
        self.cell_gate   = nn.Linear(ni + nh, nh)
        self.output_gate = nn.Linear(ni + nh, nh)

    def forward(self, input, state):
        h, c = state
        # Stack our input with the previous hidden state
        h = torch.stack([h, input], dim=1)
        # Linear layer learns what to forget, then activated by sigmoid
        forget = torch.sigmoid(self.forget_gate(h))
        # Since forget consists of scalars between 0 and 1, we multiply this result
        # by the cell state to determine which information to keep and which to
        # throw away: values close to 0 are thrown away, values close to 1 are kept
        c = c * forget
        # Input gate combines with the cell gate to update the cell
        inp = torch.sigmoid(self.input_gate(h))
        # Linear layer activated by tanh
        cell = torch.tanh(self.cell_gate(h))
        # Cell state is updated by the result of the input gate times the cell gate
        c = c + inp * cell
        # Output gate determines which information from the cell state to use
        # to generate output
        out = torch.sigmoid(self.output_gate(h))
        # New hidden state is the result of the output gate combined with the
        # tanh of the cell state
        h = out * torch.tanh(c)
        # Outputs the new hidden state along with the cell state. Seems redundant?
        return (h, c)

class LSTM_scratch(Module):
    def __init__(self, vocab_sz, n_hidden):
        self.i_h = nn.Embedding(vocab_sz, n_hidden)
        self.rnn = LSTMCell(n_hidden, n_hidden)
        self.h_o = nn.Linear(n_hidden, vocab_sz)
        self.h = torch.zeros(bs, n_hidden)
        self.c = torch.zeros(bs, n_hidden)

    def forward(self, x):
        h, c = self.rnn.forward(self.i_h(x), (self.h, self.c))
        self.h = h.detach()
        self.c = c.detach()
        return self.h_o(self.c)

    def reset(self):
        self.h.detach()
        self.c.detach()

learn = Learner(dls, LSTM_scratch(len(vocab), 50),
                loss_func=CrossEntropyLossFlat(),
                metrics=accuracy, cbs=ModelResetter)
learn.fit_one_cycle(5, 1e-2)
```
This gets me the error “RuntimeError: stack expects each tensor to be equal size, but got [64, 50] at entry 0 and [64, 16, 50] at entry 1”. So I think I’m getting confused about the dimensions of my hidden state and how to stack it with my input? I’ve tried making the hidden state [64, 16, 50] to match my input, but then the next time I stack I get a similar error trying to stack [64, 16, 50] with [64, 30, 50], which I don’t really understand either… I would have thought it would be [64, 32, 50] if anything.
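In case it helps, here’s what I see when I poke at `torch.stack` versus `torch.cat` in isolation (toy tensors, sized to match my `bs=64` and `n_hidden=50` — not my actual model code). Stacking always adds a brand-new dimension, which might be why my tensor “keeps getting bigger”, whereas `cat` along dim 1 would produce the `ni + nh` width that my gate linear layers seem to expect. Am I supposed to be using `cat` here instead?

```python
import torch

bs, nh = 64, 50          # batch size and hidden size, matching my model
h = torch.zeros(bs, nh)  # previous hidden state
x = torch.zeros(bs, nh)  # embedded input for a single timestep

# torch.stack inserts a NEW dimension at dim=1, so rank grows every call
stacked = torch.stack([h, x], dim=1)
print(stacked.shape)     # torch.Size([64, 2, 50])

# torch.cat joins along an EXISTING dimension, so rank stays the same
catted = torch.cat([h, x], dim=1)
print(catted.shape)      # torch.Size([64, 100])
```

The `cat` result’s last dimension is 100, i.e. `ni + nh`, which matches the input size of my `nn.Linear(ni + nh, nh)` gates, while the stacked version doesn’t.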
Is it obvious to anyone how I’m butchering this? I’ve been stuck trying to understand this for days. I just want to move on!
Thanks a lot