Hello! I built an LSTM and wanted to test it on a very small sample to make sure it can overfit it. Here is (part of) my code:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from fastai.core import V  # fastai's Variable wrapper from the course library

n_hidden = 16
n_classes = 2
bs = 1

class TESS_LSTM(nn.Module):
    def __init__(self, nl):
        super().__init__()
        self.nl = nl
        # bidirectional LSTM: input size 1, nl layers
        self.rnn = nn.LSTM(1, n_hidden, nl, dropout=0.1, bidirectional=True)
        # n_hidden*2 because the LSTM is bidirectional
        self.l_out = nn.Linear(n_hidden * 2, n_classes)
        self.init_hidden(bs)

    def forward(self, input):
        outp, h = self.rnn(input.view(len(input), bs, -1), self.h)
        #self.h = repackage_var(h)
        return F.log_softmax(self.l_out(outp), dim=2)

    def init_hidden(self, bs):
        # (hidden state, cell state), one per layer and direction
        self.h = (V(torch.zeros(self.nl * 2, bs, n_hidden)),
                  V(torch.zeros(self.nl * 2, bs, n_hidden)))

model = TESS_LSTM(2).cuda()
loss_function = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.0005)

for epoch in range(5551):
    model.zero_grad()
    tag_scores = model(trn_x)
    loss = loss_function(tag_scores.reshape(len(trn_x), n_classes),
                         trn_y.reshape(len(trn_y)))
    loss.backward()
    optimizer.step()
```
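I haven't included trn_x and trn_y, but hypothetical stand-ins with the shapes the model expects (not my real data) would look something like this, reusing the definitions above:

```python
seq_len = 100  # 100 inputs, as mentioned below
trn_x = torch.randn(seq_len).cuda()                     # one scalar input per timestep
trn_y = torch.randint(0, n_classes, (seq_len,)).cuda()  # one class label (0 or 1) per timestep
```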
In this case, trn_x and trn_y have 100 inputs each (I tried with 10 and got the same result). So here is the problem. If I comment out the `self.h = repackage_var(h)` line, the code works fine, and after enough iterations I am able to reproduce the exact input. If I don't comment it out, the LSTM doesn't learn: the probabilities for my 2 output classes stay around 50% each, no matter what I do. As far as I understand from the lectures, the purpose of repackage_var is to keep the hidden state at the end of an iteration while discarding its history, which I will definitely need when using the actual (long) data set. So why is it not working in my case when I use repackage_var? Thank you!
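For reference, my understanding is that repackage_var does roughly the following (a sketch of the course helper as I understand it; the real implementation may differ in details):

```python
def repackage_var(h):
    # Detach the hidden state from the graph that produced it, so
    # backprop through time stops here: values are kept, history is cut.
    if isinstance(h, torch.Tensor):
        return h.detach()
    return tuple(repackage_var(v) for v in h)  # handle the (h, c) tuple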