Hi, I'm trying to build a model similar to the LSTM RNN from lesson 8 (course v4), but instead of text I want to feed in a sequence of images. My input data are grayscale images with a batch shape of `torch.Size([512, 1, 1, 128])`,
and I want to classify them into one of two classes. Basically, I'm trying to combine a CNN with an LSTM. Here's the LSTM language model (`LMModel6`) from lesson 8. Any suggestions on how to modify it so that it can process images?
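```python
from fastai.text.all import *

bs = 64  # batch size; a global defined earlier in the lesson notebook

class LMModel6(Module):
    def __init__(self, vocab_sz, n_hidden, n_layers):
        self.i_h = nn.Embedding(vocab_sz, n_hidden)   # token id -> vector
        self.rnn = nn.LSTM(n_hidden, n_hidden, n_layers, batch_first=True)
        self.h_o = nn.Linear(n_hidden, vocab_sz)      # hidden -> next-token logits
        # hidden and cell state, kept between batches (uses the global bs)
        self.h = [torch.zeros(n_layers, bs, n_hidden) for _ in range(2)]

    def forward(self, x):
        res, h = self.rnn(self.i_h(x), self.h)
        self.h = [h_.detach() for h_ in h]            # drop gradient history between batches
        return self.h_o(res)

    def reset(self):
        for h in self.h: h.zero_()
```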
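To see what a replacement for `self.i_h` has to produce, here is a small shape check in plain PyTorch. It assumes the batch `[512, 1, 1, 128]` is laid out as (batch, channels, height, width), and all sizes (`n_hidden = 64`, the vocab and token batch) are illustrative rather than taken from the lesson. The point is that the LSTM with `batch_first=True` only needs a `(bs, seq_len, n_hidden)` tensor: the embedding produces one from token ids, and one simple (hypothetical) option for the images is to read each of the 128 pixel columns as a time step and project it with a linear layer.

```python
import torch
import torch.nn as nn

n_hidden = 64  # illustrative size, not taken from the lesson

# 1) What the lesson's self.i_h does: token ids -> one vector per time step
vocab_sz, bs, sl = 30, 4, 16                         # illustrative sizes
tokens = torch.randint(0, vocab_sz, (bs, sl))        # (bs, sl) integer ids
emb_out = nn.Embedding(vocab_sz, n_hidden)(tokens)   # (bs, sl, n_hidden)

# 2) One way to get the same layout from the image batch in the question
#    (assumed layout: bs, C, H, W): read each of the 128 pixel columns as
#    one time step with 1 feature, then project it to n_hidden
x = torch.randn(512, 1, 1, 128)
seq = x.flatten(1).unsqueeze(-1)                     # (512, 128, 1)
img_out = nn.Linear(1, n_hidden)(seq)                # (512, 128, n_hidden)

# Either tensor can go into the same LSTM
rnn = nn.LSTM(n_hidden, n_hidden, 2, batch_first=True)
res_txt, _ = rnn(emb_out)
res_img, _ = rnn(img_out)
print(emb_out.shape, img_out.shape)  # torch.Size([4, 16, 64]) torch.Size([512, 128, 64])
print(res_img.shape)                 # torch.Size([512, 128, 64])
```

Reading columns as time steps is only one interpretation; if each sample is really a sequence of several images, a CNN encoder per image is the more natural replacement.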
I can see that `self.i_h = nn.Embedding(vocab_sz, n_hidden)` has to be changed, and I guess a ResBlock should go there instead, but I can't find any answer on how to do it.
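Following the ResBlock idea, here is a minimal sketch of a CNN + LSTM classifier in plain PyTorch (not fastai). Assumptions: each sample is a sequence of images laid out as `(bs, seq_len, C, H, W)`; `SimpleResBlock` and `CNNLSTMClassifier` are hypothetical names; channel counts, `n_hidden=64` and 2 layers are illustrative. The CNN encoder takes the place of `self.i_h`: it runs on every image of every sequence (by merging the batch and time dimensions) and yields one `n_hidden` vector per time step, which the LSTM consumes just as it consumed the embeddings. The head classifies from the last time step only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleResBlock(nn.Module):
    "Hypothetical minimal residual block (not fastai's ResBlock): two 3x3 convs + skip."
    def __init__(self, ni, nf, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(ni, nf, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(nf)
        self.conv2 = nn.Conv2d(nf, nf, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(nf)
        # 1x1 conv on the skip path when channels or resolution change
        self.idconv = (nn.Identity() if ni == nf and stride == 1
                       else nn.Conv2d(ni, nf, 1, stride=stride, bias=False))

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.idconv(x))

class CNNLSTMClassifier(nn.Module):
    def __init__(self, n_hidden=64, n_layers=2, n_classes=2, in_ch=1):
        super().__init__()
        # Replaces self.i_h: maps each image to one n_hidden vector
        self.encoder = nn.Sequential(
            SimpleResBlock(in_ch, 16, stride=2),
            SimpleResBlock(16, 32, stride=2),
            nn.AdaptiveAvgPool2d(1),     # (N, 32, 1, 1)
            nn.Flatten(),                # (N, 32)
            nn.Linear(32, n_hidden),     # (N, n_hidden)
        )
        self.rnn = nn.LSTM(n_hidden, n_hidden, n_layers, batch_first=True)
        self.h_o = nn.Linear(n_hidden, n_classes)

    def forward(self, x):
        # x: (bs, seq_len, C, H, W), assumed layout for a batch of image sequences
        bs, sl = x.shape[:2]
        feats = self.encoder(x.flatten(0, 1))   # (bs*sl, n_hidden)
        feats = feats.reshape(bs, sl, -1)       # (bs, sl, n_hidden)
        # No state passed in, so nn.LSTM starts from zeros on every call
        res, _ = self.rnn(feats)                # (bs, sl, n_hidden)
        return self.h_o(res[:, -1])             # (bs, n_classes) logits

model = CNNLSTMClassifier()
x = torch.randn(8, 5, 1, 1, 128)   # 8 sequences of 5 grayscale 1x128 images
logits = model(x)
print(logits.shape)                # torch.Size([8, 2])
```

Unlike `LMModel6`, this sketch does not keep `self.h` between batches: in the language model consecutive batches continue the same text, whereas sequences to classify are presumably independent. The logits can be trained with `nn.CrossEntropyLoss`. fastai also ships its own `ResBlock` in `fastai.layers`, which could likely replace `SimpleResBlock`, though its arguments differ.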
Thanks in advance folks!