Hi,
I did a quick experiment to see if I could understand what the hidden state
in an LSTM does…
I tried to make an LSTM predict a sequence of [1,0,1,0,1…] based off an input sequence of X
with X[0] = 1
and the remainder as random noise.
X = [1, randFloat, randFloat, randFloat...]
label = [1, 0, 1, 0...]
In my head, the model would understand:
- The inputs
X
mean nothing, or at least very little (as it’s noise) - so it’d discard these values for the most part - Solely the
hidden state
from the previous sequence/timestepn
would be used to predict the next timestepn+1
… [1, 0, 1, 0…] - I also set
X[0] = 1
so the first initial in an attempt to guide the net to predicting 1 on the first item (which it does)
So, this didn’t work. In theory, should it not? Can you someone explain?
## Code
import os
import numpy as np
import torch
from torchvision import transforms
from torch import nn
from sklearn import preprocessing
from util import create_sequences
import torch.optim as optim
Create some fake data
sequence_1 = torch.tensor(np.random.uniform(size=50)).float().detach()
sequence_1[0] = 1
sequence_2 = torch.tensor(np.random.uniform(size=50)).float().detach()
sequence_2[0] = 1
labels_1 = np.zeros(50)
labels_1[::2] = 1
labels_1 = torch.tensor(labels_1, dtype=torch.long)
labels_2 = labels_1.clone()
training_data = [sequence_1, sequence_2]
label_data = [labels_1, labels_2]
Create simple LSTM Model
class LSTM(nn.Module):
def __init__(self, input_dim, hidden_dim, output_dim):
super(LSTM, self).__init__()
self.lstm = nn.LSTM(input_dim, hidden_dim)
self.fc = nn.Linear(hidden_dim, output_dim)
def forward(self, seq):
lstm_out, _ = self.lstm(seq.view(len(seq), 1, -1))
out = self.fc(lstm_out.view(len(seq), -1))
out = F.log_softmax(out, dim=1)
return out
We try to overfit on the dataset
INPUT_DIM = 1
HIDDEN_DIM = 6
model = LSTM(INPUT_DIM, HIDDEN_DIM, 2)
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
for epoch in range(500):
for i, seq in enumerate(training_data):
labels = label_data[i]
model.zero_grad()
scores = model(seq)
loss = loss_function(scores, labels)
loss.backward()
print(loss)
optimizer.step()
with torch.no_grad():
seq_d = training_data[0]
tag_scores = model(seq_d)
for score in tag_scores:
print(np.argmax(score))