Hello! I built an LSTM and wanted to test it on a very small sample to make sure it can overfit it. Here is (part of) my code:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from fastai.core import V  # fastai's Variable wrapper from the course library

n_hidden = 16
n_classes = 2
bs = 1

class TESS_LSTM(nn.Module):
    def __init__(self, nl):
        super().__init__()
        self.nl = nl
        # bidirectional LSTM: input size 1, nl layers
        self.rnn = nn.LSTM(1, n_hidden, nl, dropout=0.1, bidirectional=True)
        # n_hidden*2 because the LSTM is bidirectional
        self.l_out = nn.Linear(n_hidden * 2, n_classes)
        self.init_hidden(bs)

    def forward(self, input):
        outp, h = self.rnn(input.view(len(input), bs, -1), self.h)
        #self.h = repackage_var(h)
        return F.log_softmax(self.l_out(outp), dim=2)

    def init_hidden(self, bs):
        # (hidden state, cell state), one per layer and direction
        self.h = (V(torch.zeros(self.nl * 2, bs, n_hidden)),
                  V(torch.zeros(self.nl * 2, bs, n_hidden)))

model = TESS_LSTM(2).cuda()
loss_function = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.0005)

for epoch in range(5551):
    model.zero_grad()
    tag_scores = model(trn_x)
    loss = loss_function(tag_scores.reshape(len(trn_x), n_classes),
                         trn_y.reshape(len(trn_y)))
    loss.backward()
    optimizer.step()
```
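I haven't included trn_x and trn_y, but hypothetical stand-ins with the shapes the model expects (not my real data) would look something like this, reusing the definitions above:

```python
seq_len = 100  # 100 inputs, as mentioned below
trn_x = torch.randn(seq_len).cuda()                     # one scalar input per timestep
trn_y = torch.randint(0, n_classes, (seq_len,)).cuda()  # one class label (0 or 1) per timestep
```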
In this case, trn_x and trn_y have 100 inputs each (I tried with 10 and got the same result). So here is the problem. If I comment out the `self.h = repackage_var(h)` line, the code works fine, and after enough iterations I am able to reproduce the exact input. If I don't comment it out, the LSTM doesn't learn: the probabilities for my 2 output classes stay around 50% each, no matter what I do. As far as I understand from the lectures, the purpose of repackage_var is to keep the hidden state at the end of an iteration while discarding its history, which I will definitely need when using the actual (long) data set. So why is it not working in my case when I use repackage_var? Thank you!
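For reference, my understanding is that repackage_var does roughly the following (a sketch of the course helper as I understand it; the real implementation may differ in details):

```python
def repackage_var(h):
    # Detach the hidden state from the graph that produced it, so
    # backprop through time stops here: values are kept, history is cut.
    if isinstance(h, torch.Tensor):
        return h.detach()
    return tuple(repackage_var(v) for v in h)  # handle the (h, c) tuple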