I’ve been playing around with LSTMs for sequence classification and keep seeing example LSTMs with no explicit timesteps provided.
From this example, the following simple model is created:
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
I’m curious how this LSTM is fed its input. Does it receive one vector of length 128 at each timestep, or a matrix with dimensions maxlen x 128?
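One way I tried to reason about this: whatever the Embedding layer outputs should be what the LSTM sees. A minimal sketch of that check, with max_features and maxlen values assumed purely for illustration:

from keras.models import Sequential
from keras.layers import Embedding

max_features, maxlen = 20000, 80  # assumed values, for illustration only

check = Sequential()
check.add(Embedding(max_features, 128, input_length=maxlen))
print(check.output_shape)  # (None, 80, 128): one maxlen x 128 matrix per sample

That suggests each sample reaches the LSTM as a full maxlen x 128 matrix, but I'm not sure that settles how the timesteps are handled internally.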
When a batch_input_shape parameter is passed in, the number of timesteps is explicit. When no timestep info is provided, I assumed one vector of length 128 is passed to the LSTM per timestep, as follows:
model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2, batch_input_shape=(batch_size, 1, 128)))
model.add(Dense(1, activation='sigmoid'))
However, I also trained a model as follows and got the same loss value after a few epochs.
model = Sequential()
model.add(Embedding(max_features, 128))
# note: batch_input_shape changed below
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2, batch_input_shape=(batch_size, maxlen, 128)))
model.add(Dense(1, activation='sigmoid'))
Maybe Keras is ignoring the batch_input_shape keyword argument? Interestingly, when I call model.summary() on each of the models above, they appear to have identical layers, with identical output shapes and parameter counts.
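For reference, here is a minimal sketch of the comparison I ran (the max_features, maxlen, and batch_size values are assumptions, for illustration only):

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

max_features, maxlen, batch_size = 20000, 80, 32  # assumed values, for illustration only

def build_model(**lstm_kwargs):
    # identical models except for the extra kwargs passed to the LSTM layer
    model = Sequential()
    model.add(Embedding(max_features, 128))
    model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2, **lstm_kwargs))
    model.add(Dense(1, activation='sigmoid'))
    return model

build_model().summary()
build_model(batch_input_shape=(batch_size, 1, 128)).summary()
build_model(batch_input_shape=(batch_size, maxlen, 128)).summary()
# all three summaries appear identical: same layers, output shapes, and param counts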
Any help would be much appreciated!