Why are my losses NaN?

Hi all,

I’m using pre-trained GloVe word embeddings in my model, and due to a copy-and-paste error that layer mistakenly had L2 regularization applied:

word_in  = Input(shape=(1,), dtype='int64', name='word_in')
word_emb = Embedding(n_words, n_factors, input_length=1, W_regularizer=l2(1e-4),
                 weights=[glove_vecs], trainable=False
                )(word_in)

What happened during training is that all my losses were reported as NaN. I realize that using a regularizer on untrainable, preloaded weights is not the right thing to do, and removing it made the model train just fine. However, I was wondering if there’s some conceptual reasoning that explains why the loss value exploded.

Here’s the full model, in case it helps:

from keras.layers import Input, Embedding, Flatten, Dropout, Dense, merge
from keras.models import Model
from keras.regularizers import l2
from keras.optimizers import Adam

user_in  = Input(shape=(1,), dtype='int64', name='user_in')
user_emb = Embedding(n_users, n_factors, input_length=1, W_regularizer=l2(1e-4))(user_in)

word_in  = Input(shape=(1,), dtype='int64', name='word_in')
word_emb = Embedding(n_words, n_factors, input_length=1, 
                     weights=[glove_vecs], trainable=False
                    )(word_in)

x = merge([user_emb, word_emb], mode='concat')
x = Flatten()(x)
x = Dropout(0.3)(x)
x = Dense(70, activation='relu')(x)
x = Dropout(0.75)(x)
x = Dense(2, activation='sigmoid')(x)
nn = Model([user_in, word_in], x)
nn.compile(Adam(0.001), loss='categorical_crossentropy')

There is one thing that doesn’t look quite right: the final activation is not compatible with that loss function. Categorical cross-entropy expects a ‘softmax’ activation in the final layer, not ‘sigmoid’. Consider changing that to see what happens.
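For what it’s worth, a minimal sketch of that change (assuming the labels are one-hot encoded across the two classes) would just swap the last layer and recompile:

x = Dense(2, activation='softmax')(x)   # softmax pairs with categorical cross-entropy
nn = Model([user_in, word_in], x)
nn.compile(Adam(0.001), loss='categorical_crossentropy')

Alternatively, if the target is really a single binary label, you could keep a sigmoid but use one output unit with binary_crossentropy instead.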