Mistake in Stateful LSTM Lesson 6?

From my understanding of stateful RNNs where the batch size is > 1, the Keras documentation states:

all batches have the same number of samples

If X1 and X2 are successive batches of samples, then X2[i] is the follow-up sequence to X1[i], for every i.

This essentially means that the data must be “interleaved” in batches as discussed here:

https://www.reddit.com/r/MachineLearning/comments/4k3i2n/keras_stateful_lstm_what_am_i_missing/

and also here: Does the stateful_lstm.py example make sense? · Issue #1820 · keras-team/keras · GitHub

My understanding is that the training data has to be restructured to look like this:

Sequence: a b c d e f g h i j k l m n o p q r s t u v w x y z 1 2 3 4 5 6
BATCH 0
sequence 0 of batch:
a
b
c
d
sequence 1 of batch:
q
r
s
t
BATCH 1
sequence 0 of batch:
e
f
g
h
sequence 1 of batch:
u
v
w
x
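
For what it's worth, here is a minimal sketch of that interleaving (the helper name `make_stateful_batches` is mine, not from the lesson), reproducing the a–z example above with a batch size of 2 and sequences of length 4:

```python
def make_stateful_batches(sequence, batch_size, seq_len):
    stream_len = len(sequence) // batch_size
    # split the full sequence into batch_size contiguous streams
    streams = [sequence[i * stream_len:(i + 1) * stream_len]
               for i in range(batch_size)]
    # each successive batch takes the next seq_len steps from every stream
    return [[stream[start:start + seq_len] for stream in streams]
            for start in range(0, stream_len - seq_len + 1, seq_len)]

seq = "abcdefghijklmnopqrstuvwxyz123456"
for b, batch in enumerate(make_stateful_batches(seq, batch_size=2, seq_len=4)):
    print("BATCH", b, batch)
# BATCH 0 ['abcd', 'qrst']
# BATCH 1 ['efgh', 'uvwx']
# ...
```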

However, it doesn’t look like @jeremy is doing this in his Lesson 6 notebook for the stateful LSTM (where the batch size is 64). Did Jeremy make a mistake, or do I have a misunderstanding?


Interesting - the reddit thread does suggest that I made a mistake, although the code does seem to work nonetheless! Would be great if you or someone could try rearranging the inputs as suggested and see if it learns better/faster - if anyone tries this, let us know how you go! I’ll add a note to the video if it turns out to be in error.

Hi,
I had the same question about stateful models as @hamelsmu when going through lesson 6 and building my own char RNN model. So I decided to try out the reordering of the input data. Here is my process.

First, the reordering of the data:
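
The reordering boils down to something like this (a sketch; `reorder_for_stateful` is just a placeholder name, and the data is assumed to start out as plain NumPy arrays of sequential samples):

```python
import numpy as np

def reorder_for_stateful(x, y, batch_size):
    """Reorder sequential samples so that row i of batch k+1 continues
    the sequence that row i of batch k ended on (what stateful=True expects)."""
    n_batches = len(x) // batch_size        # drop any leftover samples
    n = n_batches * batch_size
    # batch k should hold original samples [k, n_batches + k, 2*n_batches + k, ...]
    idx = np.arange(n).reshape(batch_size, n_batches).T.ravel()
    return x[:n][idx], y[:n][idx]

# e.g. with the lesson's batch size of 64:
# x_train, y_train = reorder_for_stateful(x_train, y_train, batch_size=64)
```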

Here is the model:
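
It is essentially the stateful char-RNN from the lesson; a minimal sketch (hyperparameters are placeholders and the Keras 2-style API is assumed):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, TimeDistributed, Dense

batch_size, seq_len = 64, 8               # placeholder hyperparameters
vocab_size, n_fac, n_hidden = 86, 42, 256

model = Sequential([
    # stateful layers need a fixed batch size, hence batch_input_shape
    Embedding(vocab_size, n_fac, batch_input_shape=(batch_size, seq_len)),
    LSTM(n_hidden, return_sequences=True, stateful=True),
    TimeDistributed(Dense(vocab_size, activation='softmax')),
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')

# dummy stand-ins; in practice x_train, y_train come from the reordering above
x_train = np.random.randint(0, vocab_size, (batch_size * 100, seq_len))
y_train = np.random.randint(0, vocab_size, (batch_size * 100, seq_len))

# shuffle=False preserves the batch order the reordering set up;
# reset the hidden state between epochs
for epoch in range(2):
    model.fit(x_train, np.expand_dims(y_train, -1),
              batch_size=batch_size, epochs=1, shuffle=False)
    model.reset_states()
```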

Here are the results for two epochs with the standard and the restructured data:

So you can see that the same model learns much more quickly with the restructured data.