Simple LSTM Experiment to Predict Pattern 010101… Understanding Hidden State

Hi,

I did a quick experiment to see if I could understand what the hidden state in an LSTM does…

I tried to make an LSTM predict the sequence [1, 0, 1, 0, 1, …] based on an input sequence X with X[0] = 1 and the remainder random noise.

X = [1, randFloat, randFloat, randFloat...]
label = [1, 0, 1, 0...]

In my head, the model would learn that:

  1. The inputs X mean nothing, or at least very little (they're just noise), so it would mostly discard these values
  2. Only the hidden state from the previous timestep n would be used to predict the output at timestep n+1, i.e. the alternating [1, 0, 1, 0, …] pattern
  3. I also set X[0] = 1 as an initial cue, in an attempt to guide the net to predict 1 on the first item (which it does)

So far, this didn't work. In theory, shouldn't it? Can someone explain?
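For reference, here is the mental model behind point 2: stepping nn.LSTM one timestep at a time, so it's explicit that the only information carried from step n to step n+1 is the (hidden, cell) state pair. This is a throwaway sketch, not my training code, with sizes chosen to match the code below:

import torch
from torch import nn

lstm = nn.LSTM(input_size=1, hidden_size=6)
h = torch.zeros(1, 1, 6)  # (num_layers, batch, hidden_dim)
c = torch.zeros(1, 1, 6)

x = torch.rand(10, 1, 1)  # (seq_len, batch, input_dim) of noise
x[0] = 1.0                # the initial cue

for t in range(10):
    # out is the hidden state at step t; (h, c) feed the next step
    out, (h, c) = lstm(x[t:t + 1], (h, c))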


## Code
import numpy as np
import torch
import torch.nn.functional as F  # needed for log_softmax in forward()
import torch.optim as optim
from torch import nn

Create some fake data

sequence_1 = torch.tensor(np.random.uniform(size=50)).float()
sequence_1[0] = 1  # initial cue so the net can learn to predict 1 at t=0
sequence_2 = torch.tensor(np.random.uniform(size=50)).float()
sequence_2[0] = 1

labels_1 = np.zeros(50)
labels_1[::2] = 1  # alternating targets: [1, 0, 1, 0, …]
labels_1 = torch.tensor(labels_1, dtype=torch.long)
labels_2 = labels_1.clone()

training_data = [sequence_1, sequence_2]
label_data = [labels_1, labels_2]

Create a simple LSTM model

class LSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(LSTM, self).__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, seq):
        # reshape to (seq_len, batch=1, input_dim), as nn.LSTM expects
        lstm_out, _ = self.lstm(seq.view(len(seq), 1, -1))
        # map each timestep's hidden state to two class scores
        out = self.fc(lstm_out.view(len(seq), -1))
        return F.log_softmax(out, dim=1)
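Just to sanity-check the shapes (another throwaway snippet, assuming the same dimensions as used below):

check_model = LSTM(1, 6, 2)
print(check_model(torch.rand(50)).shape)  # torch.Size([50, 2]): one pair of log-probs per timestep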

We try to overfit on the dataset

INPUT_DIM = 1
HIDDEN_DIM = 6
model = LSTM(INPUT_DIM, HIDDEN_DIM, 2)

# NLLLoss expects log-probabilities, which forward() produces via log_softmax
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

for epoch in range(500):
    for i, seq in enumerate(training_data):
        labels = label_data[i]
        model.zero_grad()
        scores = model(seq)
        loss = loss_function(scores, labels)
        loss.backward()
        optimizer.step()
    if epoch % 50 == 0:
        print(epoch, loss.item())
        

with torch.no_grad():
    seq_d = training_data[0]
    tag_scores = model(seq_d)
    for score in tag_scores:
        print(torch.argmax(score).item())
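And to compare the predictions against the target in one go (again just a quick check, using the tensors defined above):

with torch.no_grad():
    preds = model(training_data[0]).argmax(dim=1)
    # fraction of timesteps where the prediction matches the 1, 0, 1, 0, … label
    print((preds == label_data[0]).float().mean().item())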
