Hi again. I appreciate your initiative and Conwyn’s help implementing a language model. Still, I need to point out that yours is not a language problem but rather a time series problem. The way you are approaching it, with a sliding window, will make predictions based only on the previous twelve values. That works fine for training a basic language model, where the input naturally divides into short sentences. (It may also work for your time series if there are no long-range dependencies.) But in general that approach does not work well for a time series problem, where the prediction should be based on the entire series up to that point.
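To make concrete what I mean by the sliding-window approach, here is a minimal sketch (the series and window size are just placeholders, not your actual data): each input is only the previous twelve values, and each target is the value right after its window.

```python
import torch

# Hypothetical example: build (input, target) pairs from a 1-D series
# using a sliding window of 12 values.
series = torch.arange(100, dtype=torch.float32)  # stand-in for your data
window = 12

xs = torch.stack([series[i : i + window] for i in range(len(series) - window)])
ys = series[window:]  # each target is the value right after its window

print(xs.shape, ys.shape)  # torch.Size([88, 12]) torch.Size([88])
```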
An RNN applied to a time series takes the sequence value and the hidden state at each time step and outputs an updated hidden state. The hidden state goes into a “head” that produces a prediction. The prediction and the target are compared by a loss function, which trains the RNN. PyTorch provides functions that appear to process the entire sequence in one step, though the computation must still run sequentially on the GPU. So the RNN, typically an LSTM or GRU, trains on the whole sequence, not on windows of twelve values.
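Here is a minimal sketch of that idea in PyTorch. The names, sizes, and random data are illustrative only, not taken from your code: the LSTM reads the whole sequence at once, a linear head maps each hidden state to a one-step-ahead prediction, and the target is simply the series shifted by one step.

```python
import torch
import torch.nn as nn

class SeriesRNN(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        # LSTM over a univariate series; batch_first gives (batch, seq_len, features)
        self.rnn = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                    # x: (batch, seq_len, 1)
        out, _ = self.rnn(x)                 # out: (batch, seq_len, hidden_size)
        return self.head(out)                # (batch, seq_len, 1)

series = torch.randn(1, 200, 1)              # stand-in for your full series
model = SeriesRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(5):
    pred = model(series[:, :-1])              # predict from all but the last value
    loss = loss_fn(pred, series[:, 1:])       # target is the series shifted by one
    opt.zero_grad()
    loss.backward()
    opt.step()
```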
My suggestion is that you find a working tutorial on LSTMs, study how it’s done, and experiment with substituting your own sequence. There are many more advanced methods for predicting time series (it’s a rich area), and most of them work better than an RNN. But an RNN is a fine place to start learning.
Please take a look at:
HTH,