Simple stateful neural net, can fit model but not predict (Lesson 6)

I’m trying to create a simple stateful neural network in Keras to wrap my head around how to connect Embedding layers and LSTMs. I have a piece of text where I have mapped every character to an integer, and I would like to send in one character at a time to predict the next character. I have done this before, sending in 8 characters at a time, and got that to work well (using return_sequences=True and TimeDistributed(Dense)). But this time I want to send in only 1 character at a time, and this is where my problem arises.
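For context, the earlier many-to-many version looked roughly like this (a rough sketch from memory, reusing the names defined with the model further down; TimeDistributed also comes from keras.layers):

seq_len = 8 #The earlier model took 8 characters at a time

model = Sequential()
model.add(Embedding(vocab_size, n_fac, input_length=seq_len))
model.add(LSTM(n_hidden, return_sequences=True))
model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))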

The code I use to set up my training data:

text = ... #Omitted for simplicity. Just setting text to some kind of literary work

text = text.lower() #Simple model, therefore only using lower case characters

idx2char = list(set(text)) #Unique characters in the text
char2idx = {char:idx for idx,char in enumerate(idx2char)}

text_in_idx = [char2idx[char] for char in text]

x = text_in_idx[:-1] #Input: every character except the last
y = text_in_idx[1:] #Target: the same sequence shifted one character ahead
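One detail I glossed over: for a stateful model the number of samples has to be divisible by batch_size, and fit wants numpy arrays, so the data gets trimmed and reshaped roughly like this (batch_size = 64, matching the model below):

import numpy as np

batch_size = 64 #Must match batch_input_shape in the model below

#Stateful models require the sample count to be a multiple of batch_size,
#so drop the leftover tail before converting to arrays
n = (len(x) // batch_size) * batch_size
x = np.array(x[:n])[:, np.newaxis] #Shape (n, 1), matching input_length=1
y = np.array(y[:n])[:, np.newaxis] #One integer target per sample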

The code I use to set up my model:

from keras.models import Sequential
from keras.layers import Embedding, BatchNormalization, LSTM, Dense
from keras.optimizers import Adam

n_fac = 32 #Embedding dimension
vocab_size = len(idx2char)
n_hidden = 256
batch_size = 64

model = Sequential()
model.add(Embedding(vocab_size, n_fac, input_length=1, batch_input_shape=(batch_size,1)))
model.add(BatchNormalization())
model.add(LSTM(n_hidden, stateful=True))
model.add(Dense(vocab_size, activation='softmax'))

model.summary() gives me the following:

Layer (type)                     Output Shape          Param #     Connected to
====================================================================================
embedding_1 (Embedding)          (64, 1, 32)           992         embedding_input_1[0][0]
batchnormalization_1 (BatchNorma (64, 1, 32)           128         embedding_1[0][0]
lstm_1 (LSTM)                    (64, 256)             295936      batchnormalization_1[0][0]
dense_1 (Dense)                  (64, 31)              7967        lstm_1[0][0]
====================================================================================
Total params: 305,023
Trainable params: 304,959
Non-trainable params: 64
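Those parameter counts check out, by the way (which also tells us vocab_size is 31 for this text):

n_fac, n_hidden, vocab_size = 32, 256, 31
print(vocab_size * n_fac) #992, Embedding
print(4 * n_fac) #128, BatchNormalization (gamma, beta, running mean, running std)
print(4 * (n_hidden * (n_fac + n_hidden + 1))) #295936, LSTM (4 gates)
print(n_hidden * vocab_size + vocab_size) #7967, Dense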

Compiling and training my model:

model.compile(optimizer=Adam(lr=1e-4), loss='sparse_categorical_crossentropy')

nb_epoch = 10
for i in range(nb_epoch):
    model.reset_states() #Reset the LSTM state between epochs but not between batches
    model.fit(x, y, nb_epoch=1, batch_size=batch_size, shuffle=False)

Training works as it should; the loss decreases with each epoch.

Now I want to try out my trained network, but I have no idea how to give it one character and have it predict the next.

I start out by resetting its states and then want to start feeding it one char at a time.

I tried a couple of different inputs, but all of them failed. These were just unqualified guesses:

#The model uses integers for characters, therefore integers are sent as input

model.predict([1]) #Type error

model.predict(np.array([1])) #Value error

model.predict(np.array([1])[np.newaxis,:]) #Value error

model.predict(np.array([1])[:,np.newaxis]) #Value error

Am I forced to send in something of length batch_size, or how am I supposed to pass in data for the model to predict on?

The first part of one of the ValueError messages:

ValueError: dimension mismatch in args to gemm (64,256)x(256,256)->(32,256)

I’m using the Theano backend with Keras.

My question is also posted on Stack Overflow.

I might not be able to answer this post until tomorrow unless I find more free time tonight.

Thanks in advance!

I figured out how to get predictions to work. I need to pass batch_size=batch_size in

model.predict(np.array(X), batch_size=batch_size)

In hindsight the error makes sense: predict defaults to batch_size=32, which presumably clashes with the state tensors built for a fixed batch size of 64 — hence the (64,256)x(256,256)->(32,256) gemm mismatch above.

So my question is now the following:
If I have a neural network with an Embedding at the start that takes input_length=1 and batch_input_shape=(batch_size,1), can I not use it to predict on an input of length 1? Does the input to predict also have to contain batch_size samples? Is the only way to predict on less data to reduce batch_size and thereby slow down the training? Does input_length=1 not matter here?

For now I’m just going to pad my single-value input with batch_size - 1 zeros and only grab the first prediction in the returned array. It seems a bit hacky; hopefully a better solution can be found.
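In code, the workaround looks something like this (predict_single is just a hypothetical helper name; padding with zeros means the other 63 rows of the batch feed garbage into their state streams, which I’m ignoring):

import numpy as np

def predict_single(model, char_idx, batch_size):
    #Pad a single character index with batch_size - 1 zeros so the
    #batch matches the stateful model's fixed batch size
    batch = np.zeros((batch_size, 1), dtype='int32')
    batch[0, 0] = char_idx
    #Only the first row corresponds to the real input
    probs = model.predict(batch, batch_size=batch_size)
    return probs[0]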


Hey @tetratrio, were you able to come up with any alternatives? I’m struggling with this right now and can’t seem to find a clear answer anywhere. Your solution seems to work (write a utility function that pads the input with batch_size - 1 zero-vectors of the same input length and take the first prediction), but I agree that it seems hacky and there must be a better way. Would love to hear about any progress you’ve made.

Hey!

I felt stuck and had spent too much time on this, so I moved on with the course. Never did solve it, I’m afraid.

I have since moved over to PyTorch, as I feel it is easier to understand and doesn’t require hacky workarounds like Keras does to get things working. It’s a bit more work to get into, but once you know the basics it’s so much easier to use, in my opinion.