Hi Chris. I feel your frustration trying to find any decent tutorials and code for basic time series. I have spent the past two weeks searching, adapting, tracing, and debugging low quality code and my own toy examples. At last the landscape is becoming clearer. I think your toy problem is completely amenable to a simple RNN.
First, the world's shortest overview of RNNs. It's for my benefit to explain it verbally, and perhaps for yours to understand it better. Let's forget about batches for now and consider just one time series, like yours with three time points. The RNN processes these one at a time, in sequential order. At each step it outputs a "hidden state", usually a vector. The hidden state and the next sequence element are fed into the same RNN again, until the end of the series is reached. There are a bunch of gates and learnable weights inside the RNN that calculate the next hidden state from the previous one and the new input. But that's all the raw RNN itself does: (hs(n), s(n)) -> hs(n+1).
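If it helps to see it stripped down, here is that loop in throwaway pseudocode (rnn_step and the variable names are just placeholders for whatever the cell computes, not real PyTorch):

    h = initial_hidden_state        # e.g. a vector of zeros
    for x in series:                # feed the series one element at a time
        h = rnn_step(h, x)          # (hs(n), s(n)) -> hs(n+1); the gates and weights live inside rnn_step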
The hidden state carries the memory of what has come earlier in the series. Otherwise, the RNN would not know anything about what came before. Except for the hidden state it has complete amnesia for inputs that have come earlier, just like an ordinary one-shot model. But this hidden state, though it carries much information about the series, is useless to us, because we do not know the meaning of any of its elements. So we add another layer that takes in the hidden state and outputs the quantities we are interested in knowing or predicting. This is usually a Linear (fully connected) layer. It is applied to the hidden state at each time step to get the output or prediction at that step. The last piece is to apply a loss function to Linear's output vs. the target at each time step. Then backpropagation and gradient update allow the weights inside the RNN and Linear to be trained.
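Continuing the same throwaway pseudocode, the whole picture with the Linear head and the loss looks roughly like this (head, target, loss_fn and optimizer are all placeholders of mine, not a real implementation):

    loss = 0
    h = initial_hidden_state
    for t, x in enumerate(series):
        h = rnn_step(h, x)                      # update the memory
        pred = head(h)                          # Linear layer: hidden state -> quantity we care about
        loss = loss + loss_fn(pred, target[t])  # compare to the target at this step
    loss.backward()                             # trains the weights inside both the RNN and the Linear
    optimizer.step()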
A few glosses… For a time series, the output at each step is typically a prediction of the next input, but the output could be any quantity or quantities you want to predict (careful not to confuse "predict a class" with "predict the future"). You do not have to compute the output and loss at every time step. And you can decide how often to do backpropagation/gradient update. The whole picture gets more complex with language models: encoding and decoding, padding, BPTT (backpropagation through time), partial sorting, etc.
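For instance, if preds and target are both shaped (batch, seq_len) and loss_fn is something like nn.MSELoss(), the choice looks like this (names are mine):

    loss_all  = loss_fn(preds, target)                   # supervise every time step
    loss_last = loss_fn(preds[:, -1], target[:, -1])     # or supervise only the final prediction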
In PyTorch, nn.LSTMCell implements one step of this recurrence (the LSTM flavor of RNN). It does exactly (hs(n), s(n)) -> hs(n+1), where for an LSTM the hidden state hs is really a pair of tensors, the hidden state proper and the cell state. It would appear in the model's forward() in a loop that processes the time series in order, passing hs to the next iteration. You decide how to process the hidden state into a prediction and loss. PyTorch also provides nn.LSTM, which processes an entire time series at once. Although it must operate sequentially internally, it's a gazillion times faster than LSTMCell in a Python loop. Its output is a series of hidden states, one for each time step, and you can then apply Linear to them to get the predictions at each step.
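Here is a minimal sketch of the two routes side by side, on random data with sizes I made up; the two modules are initialized independently, so the numbers will not match, but the shapes tell the story:

    import torch
    import torch.nn as nn

    batch, seq_len, ni, nh = 1, 3, 1, 100     # made-up sizes
    x = torch.randn(batch, seq_len, ni)

    # Route 1: nn.LSTMCell, one step at a time in a Python loop
    cell = nn.LSTMCell(ni, nh)
    h = torch.zeros(batch, nh)
    c = torch.zeros(batch, nh)
    states = []
    for t in range(seq_len):
        h, c = cell(x[:, t], (h, c))          # (hs(n), s(n)) -> hs(n+1)
        states.append(h)
    out_cell = torch.stack(states, dim=1)     # (batch, seq_len, nh)

    # Route 2: nn.LSTM, the whole series at once (much faster)
    lstm = nn.LSTM(ni, nh, num_layers=1, batch_first=True)
    out, (h_n, c_n) = lstm(x)                 # out: (batch, seq_len, nh), one hidden state per step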
As for your particular toy problem, if you look at the docs, nn.LSTM lets you set the number of input features per time step, so you can certainly use your three. The number of outputs is determined by the Linear layer, so you can make it three or whatever you choose. You might also want to consider feeding in the whole time series at once, rather than just groups of three.
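Concretely, the wiring for three features in and three predictions out might look like this (hidden size and series length are arbitrary choices of mine):

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=3, hidden_size=100, num_layers=1, batch_first=True)
    head = nn.Linear(100, 3)                  # the Linear layer sets the number of outputs
    x = torch.randn(1, 50, 3)                 # (batch=1, seq_len=50, features=3)
    hs, _ = lstm(x)                           # hs: (1, 50, 100)
    preds = head(hs)                          # preds: (1, 50, 3), three predictions per time step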
You will also need to read up on PyTorch's Dataset and DataLoader to create the individual elements of your training and validation sets. Remember, each single element of the Dataset is a time series. Once you have DataLoaders you can make a DataBunch, a Learner, and use fastai's conveniences.
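A minimal sketch of what such a Dataset might look like for next-step prediction; train_series and valid_series stand for your own lists of (seq_len, n_features) tensors, and the generator names are chosen to match the training loop further down:

    import torch
    from torch.utils.data import Dataset, DataLoader

    class SeriesDataset(Dataset):
        # One element of the Dataset = one whole time series.
        def __init__(self, series_list):
            self.series_list = series_list            # list of (seq_len, n_features) tensors
        def __len__(self):
            return len(self.series_list)
        def __getitem__(self, idx):
            s = self.series_list[idx]
            # Inputs are every step but the last; targets are the next value of feature 0.
            # The model below lives on the GPU, so hand it GPU tensors here.
            return s[:-1].cuda(), s[1:, 0].cuda()

    training_generator   = DataLoader(SeriesDataset(train_series), batch_size=1)
    validation_generator = DataLoader(SeriesDataset(valid_series), batch_size=1)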
I do not know whether fastai can handle time series data directly. I asked on the forum, received no reply, and so went directly to PyTorch. As for batches, I have not figured them out! My Datasets return a single time series, so bs=1, one batch per epoch, and GPU saturation even so. Maybe you will figure out how to use batches and explain them to me.
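One thing I am fairly sure of, though I have not pushed it far: if all of your series happen to have the same length, batching is just a bigger batch_size on the sketch above, because the default collate stacks them into the (batch, seq_len, features) tensor that nn.LSTM with batch_first=True expects. Variable lengths are where padding/packing (nn.utils.rnn.pad_sequence, pack_padded_sequence) comes in.

    training_generator = DataLoader(SeriesDataset(train_series), batch_size=16, shuffle=True)  # equal-length series only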
I have pasted some code fragments for the model and training loop to help you get started. And if anyone finds bugs in my code or explanations, please tell me!
import torch
import torch.nn as nn
import torch.optim as optim

class LSTMSimpleMdl(nn.Module):
    def __init__(self, ni, nh, nl, aInput):
        # ni - number of input features
        # nh - number of hidden features
        # nl - number of stacked LSTM layers
        # aInput - True: append input before sending to linear (an experiment)
        super().__init__()
        self.NLAYERS = nl
        self.NHSIZE = nh
        self.NINPUT = ni
        self.aInput = aInput
        self.lstm1 = nn.LSTM(self.NINPUT, self.NHSIZE, self.NLAYERS, batch_first=True)
        self.linear = nn.Linear(self.NHSIZE + (ni if self.aInput else 0), 1)

    def forward(self, input):
        ninput = input
        # Fresh zero hidden and cell states for each batch: (num_layers, batch, hidden_size)
        h_t = torch.zeros(self.NLAYERS, input.shape[0], self.NHSIZE, dtype=torch.float).cuda()
        c_t = torch.zeros(self.NLAYERS, input.shape[0], self.NHSIZE, dtype=torch.float).cuda()
        output, (h_t, c_t) = self.lstm1(ninput, (h_t, c_t))
        if self.aInput:
            output = torch.cat((output, input), dim=2)  # append the original inputs, skipping around the RNN
        output = self.linear(output)   # one prediction per time step
        return output.flatten(1)       # (batch, seq_len)
model = LSTMSimpleMdl(1, 100, 1, False).cuda()

loss_fn = nn.MSELoss()

def lossFlat(p, t):
    # Flatten predictions and targets so MSE runs over all time steps at once.
    return loss_fn(p.flatten(), t.flatten())
def trainN(N, lr):
    global test_pred, vmtarget, output, mtarget, mbatch, vmbatch
    optimizer = optim.Adam(model.parameters(), lr=lr)
    for i in range(N):
        # Training pass
        for mbatch, mtarget in training_generator:
            optimizer.zero_grad()
            output = model(mbatch)
            mtarget = mtarget[:, -output.shape[1]:]  # shorten target to match shortened output
            loss = lossFlat(output, mtarget)
            if i % 1 == 0:   # always true as written; raise the 1 to update the weights less often
                loss.backward()
                optimizer.step()
        # Validation pass, no gradients
        with torch.no_grad():
            for vmbatch, vmtarget in validation_generator:
                test_pred = model(vmbatch)
                vmtarget = vmtarget[:, -test_pred.shape[1]:]  # shorten target to match shortened output
                vloss = lossFlat(test_pred, vmtarget)
        print('%i %2.9f %2.9f' % (i, loss.item(), vloss.item()))

trainN(30, .01)
from fastai.basics import *   # fastai v1 imports (assumed); provides DataBunch and Learner

data = DataBunch(training_generator, validation_generator)
learn = Learner(data, model, loss_func=lossFlat)
learn.fit_one_cycle(10, max_lr=.01, wd=0)
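One last sanity check I find handy: push a dummy batch through the model and confirm the shapes before worrying about loss values (the sizes here are just for illustration):

    dummy = torch.randn(1, 3, 1).cuda()       # (batch=1, seq_len=3, features=1)
    with torch.no_grad():
        print(model(dummy).shape)             # expect torch.Size([1, 3]): one prediction per time step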