Mistake in Stateful LSTM Lesson 6?

From my understanding of stateful RNNs with a batch size > 1, the Keras documentation states:

all batches have the same number of samples

If X1 and X2 are successive batches of samples, then X2[i] is the follow-up sequence to X1[i], for every i.

This essentially means that the data must be “interleaved” in batches as discussed here:

and also here: https://github.com/fchollet/keras/issues/1820

My understanding is that the training data has to be reconstructed to look like this (a short code sketch follows the example):

Sequence: a b c d e f g h i j k l m n o p q r s t u v w x y z 1 2 3 4 5 6
BATCH 0
sequence 0 of batch:
a
b
c
d
sequence 1 of batch:
q
r
s
t
BATCH 1
sequence 0 of batch:
e
f
g
h
sequence 1 of batch:
u
v
w
x
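
For concreteness, here is a minimal sketch (mine, not from the lesson notebook) that reproduces this layout for the toy sequence above, using a batch size of 2 and a sequence length of 4:

```python
seq = "abcdefghijklmnopqrstuvwxyz123456"
batch_size, seq_len = 2, 4

steps_per_row = len(seq) // batch_size   # each row of the batch covers 16 characters
n_batches = steps_per_row // seq_len     # 4 batches of 4-character sequences

# Row i of every batch walks through its own contiguous slice of the data, so
# batch k, row i starts at position i*steps_per_row + k*seq_len.
for k in range(n_batches):
    print("BATCH", k)
    for i in range(batch_size):
        start = i * steps_per_row + k * seq_len
        print("  sequence", i, "of batch:", seq[start:start + seq_len])
```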

However, it doesn’t look like @jeremy is doing this in Lesson 6 in his notebook for the stateful LSTM (where the batch size is 64). Did Jeremy make a mistake, or do I have a misunderstanding?

Interesting - the reddit thread does suggest that I made a mistake, although the code does seem to work nonetheless! Would be great if you or someone could try rearranging the inputs as suggested and see if it learns better/faster - if anyone tries this, let us know how you go! I’ll add a note to the video if it turns out to be in error.

Hi,
I had the same question about stateful models as @hamelsmu when going through lesson 6 and building my own char RNN model. So I decided to try out the reordering of the input data. Here is my process.

First the reordering of the data:
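
In essence it is a helper along these lines (a sketch only; the name `stateful_reorder`, the `char_indices` array, and the 8-character sequence length are illustrative choices, not the exact code from my notebook):

```python
import numpy as np

def stateful_reorder(idx, batch_size, seq_len):
    """Reorder a 1-D array of character indices so that, across batches,
    row i of batch k+1 continues exactly where row i of batch k ended."""
    steps_per_row = len(idx) // batch_size                 # timesteps covered by each row
    rows = idx[:steps_per_row * batch_size].reshape(batch_size, steps_per_row)
    n_batches = steps_per_row // seq_len
    # batches[k] has shape (batch_size, seq_len) and batches[k+1][i] follows batches[k][i]
    batches = [rows[:, k * seq_len:(k + 1) * seq_len] for k in range(n_batches)]
    # Stack them so that fit(..., batch_size=batch_size, shuffle=False) sees them in order.
    return np.concatenate(batches)

# Illustrative usage: inputs and one-step-ahead targets for a char model.
# idx = np.array(char_indices)
# X = stateful_reorder(idx[:-1], batch_size=64, seq_len=8)
# y = stateful_reorder(idx[1:],  batch_size=64, seq_len=8)[..., np.newaxis]
```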

Here is the model:
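
A minimal sketch of the kind of stateful character model involved, with illustrative layer sizes and the Keras 1-style API (not necessarily the exact notebook code):

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, TimeDistributed, Dense

batch_size, seq_len = 64, 8            # illustrative values
vocab_size, n_fac, n_hidden = 86, 24, 256

model = Sequential([
    # Stateful layers need the full batch_input_shape, not just input_shape.
    Embedding(vocab_size, n_fac, batch_input_shape=(batch_size, seq_len)),
    LSTM(n_hidden, return_sequences=True, stateful=True),
    TimeDistributed(Dense(vocab_size, activation='softmax')),
])
# Targets y must have shape (samples, seq_len, 1) for this loss.
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')

# Train without shuffling so successive batches really are follow-up sequences,
# and reset the hidden state only at epoch boundaries.
# for epoch in range(2):
#     model.fit(X, y, batch_size=batch_size, nb_epoch=1, shuffle=False)  # epochs=1 in Keras 2
#     model.reset_states()
```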

Here are the results for two epochs with the standard and the restructured data:

So you can see that the same model learns much more quickly with the restructured data.