Lesson 6 discussion

(Ravi Teja Gutta) #62

does it have
def __iter__(self): return self


Hi @rteja1113, yes. Here is the code. Thanks for your time.

class MixIterator(object):
    def __init__(self, iters):
        self.iters = iters
        self.multi = type(iters) is list
        if self.multi:
            self.N = sum([it.n for it in self.iters])

    def reset(self):
        for it in self.iters: it.reset()

    def __iter__(self):
        return self

    def next(self, *args, **kwargs):
        if self.multi:
            nexts = [next(o) for o in self.iters]
            n0 = np.concatenate([n[0] for n in nexts])
            n1 = np.concatenate([n[1] for n in nexts])
            return (n0, n1)
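
If the failure shows up under Python 3 (an assumption; the post does not say which version is in use), one thing worth checking is that Python 3 looks for `__next__` rather than `next`. A minimal sketch of an iterator that satisfies both versions, using a hypothetical `CountDown` class that is not part of the course code:

```python
class CountDown(object):
    """Hypothetical toy iterator, used only to illustrate the protocol."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Python 3 calls __next__ when next(obj) is used.
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    # Python 2 looks for a method named next, so alias it.
    next = __next__


values = list(CountDown(3))
print(values)  # [3, 2, 1]
```

The same `next = __next__` alias could be added to MixIterator if that turns out to be the cause.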

(Matt Kraemer) #64

I just started working through lesson 6 notebook after following the setup for the second semester (Python 3, Keras 2, TF 1.2.0 backend)

Most of the notebook is working for me after a few changes (package restructuring, parameter naming, python 2 xrange vs python 3 range functions etc…)

The only one that I’m having trouble running is the Keras GRU section.

The error is:

oh_x_rnn.shape, oh_y_rnn.shape
((75109, 8, 85), (75109, 8, 85))

        GRU(n_hidden, return_sequences=True, input_shape=(cs, vocab_size),
                  activation='relu', recurrent_initializer='identity'),
        TimeDistributed(Dense(vocab_size, activation='softmax')),

#ValueError: Identity matrix initializer can only be used for 2D square matrices.    

model.compile(loss='categorical_crossentropy', optimizer=Adam())

model.fit(oh_x_rnn, oh_y_rnn, batch_size=64, epochs=8)

I can’t seem to get the GRU to accept either the input_shape values from the original notebook or the shape of the actual inputs. Has anybody else run into issues with the input_shape param on the GRU in Keras 2?

(Matt Kraemer) #65

Found the answer at:

Roebius converted the notebooks; the fix is to remove the recurrent_initializer argument from the GRU.
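
A likely reason this works, as far as I understand Keras 2: a GRU keeps the recurrent weights for its three gates side by side in one matrix of shape (units, 3 * units), which is not square, so the identity initializer rejects it. A SimpleRNN has a single (units, units) recurrent matrix, which is square. A small sketch of the shape arithmetic; the helper names and the value of n_hidden are made up for illustration:

```python
def recurrent_kernel_shape(units, n_gates):
    # Hypothetical helper: gate matrices are concatenated along the last axis.
    return (units, n_gates * units)


def is_square(shape):
    # The identity initializer needs a 2D matrix with equal sides.
    return len(shape) == 2 and shape[0] == shape[1]


n_hidden = 256  # example value only
simple_rnn_shape = recurrent_kernel_shape(n_hidden, 1)
gru_shape = recurrent_kernel_shape(n_hidden, 3)

print(simple_rnn_shape, is_square(simple_rnn_shape))  # (256, 256) True
print(gru_shape, is_square(gru_shape))                # (256, 768) False
```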

(Mercedes) #66

Why doesn’t an LSTM layer in a Sequential model require an activation function when stateful=True?

(John Lundberg) #67

I am having the same issue. The Theano RNN converges to outputting only spaces, ' '. I tried removing spaces from the input data just to see whether it would converge better, but then the RNN only predicts ‘e’.

It seems to be the same issue that others have had with the other RNNs in this thread, that it predicts the most common character always.

But I haven’t been able to fix it. Maybe larger batches or better SGD variants would help, but I’m not good enough at using Theano to implement those yet.

(adrian) #69

For the section ## Our first RNN with keras! I am having the same problem as @Christina

print('len(xs): {0}, xs[0].shape: {1}'.format(len(xs), xs[0].shape))

: len(xs): 8, xs[0].shape: (75112,)

if I use


I get

IndexError: One of the index value is out of bound. Error code: 65535.\n

If I use:

x_rnn=np.concatenate(xs, axis=1)

I get

numpy.core._internal.AxisError: axis 1 is out of bounds for array of dimension 1

If I use:

x_rnn = np.reshape(xs, (-1, len(xs)))
model.fit(x_rnn, y, batch_size=64, nb_epoch=8, verbose=2)

I get:

IndexError: One of the index value is out of bound. Error code: 65535.\n

Any ideas?


Hi adrian,

Interestingly enough, when you create the xs list, each member of the list has shape (75112,). However, after you fit the model in the Our first RNN! section, Keras changes the shape of the arrays in xs to (75112, 1). I have no idea why this happens. A quick Google search didn’t yield anything either. If you run that section first, then the Our first RNN with keras! section should work. If you don’t want to do that, you can do the following:

model.fit(np.concatenate([x[np.newaxis] for x in xs]).T, y, batch_size=64, nb_epoch=8)

I hope that helps!
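
A small sketch of what that one-liner does, on toy data standing in for the notebook's xs (8 arrays of shape (n_samples,)). np.stack(xs, axis=1) gives the same result. np.concatenate(xs, axis=1) fails because 1D arrays have no axis 1, and np.reshape runs without error but mixes values from different samples into each row:

```python
import numpy as np

# Toy stand-in for the notebook's xs: 8 arrays of shape (n_samples,).
n_samples = 5
xs = [np.arange(n_samples) + 10 * i for i in range(8)]

# Trick from the post above: add a leading axis, concatenate, transpose.
a = np.concatenate([x[np.newaxis] for x in xs]).T

# Equivalent: stack the 8 arrays as columns.
b = np.stack(xs, axis=1)

# Reshape keeps the flat element order of the (8, n_samples) array,
# so each row no longer holds the 8 characters of one sample.
c = np.reshape(xs, (-1, len(xs)))

print(a.shape, b.shape, c.shape)  # (5, 8) (5, 8) (5, 8)
print(np.array_equal(a, b))       # True
print(np.array_equal(b, c))       # False
```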



Does anyone know why the number of weights in the Our first RNN with keras! section of this notebook is 76,544 for the weight matrix feeding the hidden state?

I would’ve expected it to be 76,800 which is computed as follows:

input weights: 42 latent factors x 256 hidden units + 256 biases = 11,008 weights
recurrent weights: 256 hidden units x 256 hidden units + 256 biases = 65,792 weights

total weights: 11,008 + 65,792 = 76,800 weights

I notice that 76,800 - 76,544 = 256, so perhaps I’m double counting the bias weights? But when we built our own RNN in the section above called Our first RNN! there were 76,800 weights for that portion of the model.
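
One explanation consistent with these numbers, assuming the Keras SimpleRNN layer keeps a single bias vector for the hidden state while the hand-built version uses two Dense layers with one bias each:

```python
n_fac = 42      # latent factors per character
n_hidden = 256  # hidden units

# Hand-built RNN: two Dense layers, each with its own bias vector.
dense_in = n_fac * n_hidden + n_hidden
dense_hidden = n_hidden * n_hidden + n_hidden
hand_built = dense_in + dense_hidden

# SimpleRNN (assumed layout): input kernel + recurrent kernel + one bias.
simple_rnn = n_fac * n_hidden + n_hidden * n_hidden + n_hidden

print(hand_built)               # 76800
print(simple_rnn)               # 76544
print(hand_built - simple_rnn)  # 256
```

Since the two pre-activation terms are added together before the nonlinearity, two separate biases can be folded into one without changing what the layer can represent.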


(adrian) #72

Thanks Patrick,

Both your suggestions worked.

I think the source of the conversion from a 1D to a 2D array is in Keras’ training.standardize_input_data() function, in the section marked #make arrays at least 2D.

(adrian) #73

For the stateful model with keras section

# Stateful model with keras 

I note there is no model.predict after model.fit like the other sections. So I tried adding it, however I am having difficulty working out how to re-dimension the numpy array before model.predict.

eg if I use

def get_next_stateful_keras(model, inp, batch_size):
    idxs = [char_indices[c] for c in inp]
    arr = np.array(idxs)[np.newaxis,:]
    p = model.predict(arr, batch_size=batch_size)
    return [chars[np.argmax(o)] for o in p]

I get the following error:

    ValueError: dimension mismatch in args to gemm (64,256)x(256,256)->(1,256)

I think the input needs to be 64 x n, but I can’t work out how to get the data into this shape.

eg the following won’t work:

b = np.reshape(arr, (64, 1))
p = model.predict(b)
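
One possible workaround, untested against the actual model: the reshape to (64, 1) cannot work because arr holds only as many values as there are input characters. Assuming the stateful model was built for batches of 64 sequences, the single sequence could instead be repeated 64 times and only the first row of the prediction used. The char_indices mapping below is a hypothetical stand-in for the notebook's:

```python
import numpy as np

batch_size = 64

# Hypothetical stand-in for the notebook's char_indices mapping.
chars = sorted(set("this is "))
char_indices = {c: i for i, c in enumerate(chars)}

inp = "this is "
idxs = [char_indices[c] for c in inp]
arr = np.array(idxs)[np.newaxis, :]    # shape (1, 8)

# Repeat the single sequence so the batch has exactly batch_size rows.
batch = np.tile(arr, (batch_size, 1))  # shape (64, 8)
print(batch.shape)

# With the notebook's model this would then be (not run here):
# p = model.predict(batch, batch_size=batch_size)
# preds = [chars[np.argmax(o)] for o in p[0]]
```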

(Tao) #74

Hi, ( thanks for the entire course, it’s so good :slight_smile: )

So, I have a question about embeddings. How does our learning evolve with the number of embeddings?

I can understand that with too few of them we can’t learn anything, and that if you use way too many, you’re sure to overfit.

But is it better to use as many as possible or as few as possible?

Thank you :slight_smile:

(Li) #75

@tmu Thanks, it did indeed help fix the issue

(Robert William Whelan) #76

Maybe you already saw this, but check out Jeremy’s helper functions in this file: https://github.com/fastai/courses/blob/master/deeplearning1/nbs/char-rnn.ipynb

I’m working on it too and will post something when I figure it out.

(Md. Muhaimin Rahman) #77

Hello Jeremy. First, I want to thank you for your course! I am stuck on stateful RNNs. Could you please clarify the following questions?

  1. What is a stateful RNN? I am not clear about it. As far as I know, LSTM is the best RNN because it knows which data from previous predictions is relevant and should be saved, and which is not. Is that what being stateful is all about? If yes, why do we need to write stateful=True again?

  2. In the stateful RNN section, why did we use a TimeDistributed layer? What is the purpose of this layer?

  3. I haven’t seen any get_next function for the stateful RNN. Do you have one? I have made my own function but the results are horrible!

(Darshan Bagul) #78

Answering your question about Stateful RNNs-

Imagine you have extremely long sequences, like a time series of 50000 timesteps. Now LSTMs do have a good memory theoretically, but cannot track dependencies beyond 500 timesteps practically. But you are dealing with a sequence of 50000, and want the model to learn dependencies across the entire sequence, which not even LSTMs/GRUs can.
Hence you break the sequence of 50000 timesteps into 100 subsequences of 500 timesteps each, and pass the state reached at the end of each subsequence on to the next one, for all 100 subsequences. After one pass over these 100 subsequences, the hidden state will approximately hold a representation of the original sequence of 50000 timesteps.
This is how stateful RNNs work.

The flag stateful=True in Keras helps us achieve that, by not resetting the learned state after each subsequence. Hence, we have an added task of manually resetting the learnt state after every sequence while implementing in Keras.
Hope this clarifies your doubts. You can read more about stateful LSTMs here: http://philipperemy.github.io/keras-stateful-lstm/
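
The idea of carrying state across subsequences can be sketched with a toy tanh RNN in plain NumPy. This is not how Keras implements it; the sizes and random weights are made up for illustration. Running the chunks one after another while keeping the hidden state gives the same final state as one pass over the whole sequence, whereas resetting the state before each chunk generally does not:

```python
import numpy as np

rng = np.random.RandomState(0)
n_in, n_hidden, seq_len, chunk_len = 3, 4, 20, 5

# Fixed random weights for a toy RNN (no training happens here).
W_x = rng.randn(n_in, n_hidden) * 0.5
W_h = rng.randn(n_hidden, n_hidden) * 0.5
x = rng.randn(seq_len, n_in)


def run_rnn(inputs, h):
    # Apply h_t = tanh(x_t W_x + h_{t-1} W_h) over inputs, starting from h.
    for x_t in inputs:
        h = np.tanh(x_t.dot(W_x) + h.dot(W_h))
    return h


h0 = np.zeros(n_hidden)

# Whole sequence in one pass.
h_full = run_rnn(x, h0)

# "Stateful": carry the hidden state from one chunk into the next.
h_stateful = h0
for start in range(0, seq_len, chunk_len):
    h_stateful = run_rnn(x[start:start + chunk_len], h_stateful)

# "Stateless": reset the hidden state to zeros before every chunk.
for start in range(0, seq_len, chunk_len):
    h_stateless = run_rnn(x[start:start + chunk_len], h0)

print(np.allclose(h_full, h_stateful))
print(np.allclose(h_full, h_stateless))
```

Only the forward state is carried over in this sketch; during training, gradients would still stop at the chunk boundaries.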



Hi everyone,

Just curious, Jeremy mentioned in lesson 6 that an RNN will be useful for doing sentiment analysis.

Could anyone point out how this model can be implemented for sentiment analysis? Thank you.

Is this done by training the model using the entire text as input and a set of training labels such as 1 or 0?

(Yousof Ebneddin) #80


In the “Returning sequences” section of the lesson 6 notebook, why does the zeros vector have this shape?

In [68]:
zeros = np.tile(np.zeros(n_fac), (len(xs[0]),1))
(75110, 42)

This is equal to (number of samples x n_fac). Why?

And a question about Embedding.
Each Embedding layer is different from the others, because in the summary Keras counts 3570 (85*42) parameters for each embedding layer. That means each character in the sequence gets a different set of n_fac = 42 parameters in each layer. Am I understanding this correctly?

Does the output of the Embedding have the size of n_fac = 42 ?


(James Townend) #81

I also had the problem with only getting a " " predicted. I resolved it by increasing the learning rate to 0.1 and going through a dozen or so epochs, bringing my loss down to 2.93. So maybe the solver was stuck in a local minimum? If that makes sense… I’m also using a TensorFlow backend, which may have something to do with it. I’ve always found it difficult to replicate Jeremy’s losses using identical learning rates and number of epochs.

(Ingmar Schubert) #82


Because the input2hidden dense layer we defined expects input of shape (number_of_samples, n_fac). We designed it this way because, after the initialization, the input2hidden dense layer gets its input from the embeddings, which produce output of shape (number_of_samples, n_fac). For the initialization we want to use the same layer, so we have to make sure its input has the right shape.
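
The shape can be checked on a small example. Here n_samples is a toy stand-in for len(xs[0]), which is 75110 in the notebook output quoted above:

```python
import numpy as np

n_fac = 42
n_samples = 6  # toy stand-in for len(xs[0])

# One row of n_fac zeros, repeated once per sample.
zeros = np.tile(np.zeros(n_fac), (n_samples, 1))
print(zeros.shape)  # (6, 42)

# Same result as allocating the 2D array directly.
print(np.array_equal(zeros, np.zeros((n_samples, n_fac))))  # True
```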


Yes, this is correct. As implemented, the RNN uses a new Embedding for each input, whereas the input_to_hidden dense layer is the same for every input, and the hidden_to_hidden layer that follows is of course the same for every step. I am not completely sure what the architectural idea behind this is, but I guess it reflects the different meaning a character can have depending on the position at which it occurs in an 8-character sequence.
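
On the parameter count and output size: an embedding can be pictured as a lookup table with one row of n_fac values per character. A minimal NumPy sketch with made-up random weights (not the Keras implementation):

```python
import numpy as np

vocab_size, n_fac = 85, 42
rng = np.random.RandomState(0)

# Hypothetical weights; each Embedding layer would have its own table.
emb_weights = rng.randn(vocab_size, n_fac)
print(emb_weights.size)  # 3570

# Looking up a batch of character indices returns one row per index.
char_idxs = np.array([3, 17, 3, 84])
vectors = emb_weights[char_idxs]
print(vectors.shape)  # (4, 42)
```

So each character position maps to a vector of size n_fac = 42, and separate Embedding layers learn separate tables.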