RNN for multiple short time series?


Hi all! I have the following data problem. My data consists of N time series of length L (N is in the thousands, L is about 50). Each time series comes in labeled form (x(t), y(t)) for t = 1, …, L, where:

  • x(t) is a 20-dimensional numeric vector (continuous)
  • y(t) is a scalar value (continuous)

What I would like the model to do is estimate y(t+1) from past information, which could be x(t), x(t-1), …, and y(t), y(t-1), … , as well as possibly some state. Each time series should follow highly similar dynamics, so I would like to train a single model for all the time series.

My current idea is to use an RNN architecture for this, but I am not sure where to start. Unlike in most of the examples we’ve seen, the inputs here are already numeric vectors, so it seems to me that an embedding layer is not needed and I can probably skip it. But how should I format the data, for instance? I'm trying to wrap my head around the input, output, and hidden state dimensions.
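To make the shape question concrete, here is a minimal NumPy sketch of one layout I could imagine. It is only an assumption on my part, not a known-good recipe: the input at step t is x(t) concatenated with y(t) (21 features), the target is y(t+1), and a plain tanh RNN carries a hidden state of size H. The sizes `N = 1000` and `H = 32`, the random placeholder data, and the helper name `rnn_forward` are all made up for illustration.

```python
import numpy as np

# Placeholder sizes: N series, length L, D input features, H hidden units.
# N = 1000 and H = 32 are arbitrary choices for this sketch.
N, L, D, H = 1000, 50, 20, 32

rng = np.random.default_rng(0)
x = rng.normal(size=(N, L, D))  # stand-in for the real x(t)
y = rng.normal(size=(N, L))     # stand-in for the real y(t)

# Input at step t is [x(t), y(t)]; the target at step t is y(t+1).
# The last step has no target, so each series yields L - 1 training steps.
inputs = np.concatenate([x[:, :-1, :], y[:, :-1, None]], axis=-1)  # (N, L-1, D+1)
targets = y[:, 1:]                                                 # (N, L-1)

def rnn_forward(inputs, W_in, W_h, b_h, w_out, b_out):
    """Vanilla tanh RNN: one hidden state per series, one scalar output per step."""
    n_series, n_steps, _ = inputs.shape
    h = np.zeros((n_series, W_h.shape[0]))     # hidden state, (N, H)
    preds = np.empty((n_series, n_steps))
    for t in range(n_steps):
        h = np.tanh(inputs[:, t, :] @ W_in + h @ W_h + b_h)
        preds[:, t] = h @ w_out + b_out        # estimate of y(t+1)
    return preds

# Untrained, randomly initialised weights, just to check the dimensions.
W_in = rng.normal(scale=0.1, size=(D + 1, H))
W_h = rng.normal(scale=0.1, size=(H, H))
b_h = np.zeros(H)
w_out = rng.normal(scale=0.1, size=H)
b_out = 0.0

preds = rnn_forward(inputs, W_in, W_h, b_h, w_out, b_out)
print(inputs.shape, targets.shape, preds.shape)
```

In this layout the same weights are shared across all N series, which matches the goal of a single model, and the hidden state is reset to zero at the start of each series. Whether feeding the observed y(t) back in as an input is the right choice is part of what I am unsure about.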

Also, I am not sure how to split the data into training, validation, and test sets.
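One option I am considering, assuming the series are independent of one another, is to hold out whole series rather than individual time steps, so that no series contributes to more than one set. A rough sketch using only the standard library follows; the function name `split_series_ids` and the 80/10/10 proportions are hypothetical choices, not something I know to be right for this problem.

```python
import random

def split_series_ids(n_series, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle series indices and partition them into train/val/test lists."""
    ids = list(range(n_series))
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_test = int(n_series * test_frac)
    n_val = int(n_series * val_frac)
    test_ids = ids[:n_test]
    val_ids = ids[n_test:n_test + n_val]
    train_ids = ids[n_test + n_val:]
    return train_ids, val_ids, test_ids

# Example with a placeholder count of 1000 series.
train_ids, val_ids, test_ids = split_series_ids(1000)
print(len(train_ids), len(val_ids), len(test_ids))
```

The alternative would be to split along time within each series (train on early steps, test on later ones), but with L around 50 that leaves very few steps per series, which is why the per-series split looks more natural to me.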

Any thoughts would be very welcome!