Neural network predicts all inputs as negative?

I’m working on a sentiment analysis project with Keras in Python, using word2vec as the embedding method. My non-English corpus has 3 classes and is completely balanced; I use 8000 tweets for training and 1000 for testing.

But my model returns almost all of the input sentences as negative! How can I solve this problem?

8900/8900 [==============================] - 15s 2ms/step - loss: 0.5896 - acc: 0.6330 - val_loss: 0.0000e+00 - val_acc: 1.0000

As you can see, the validation accuracy (val_acc) is 1.0000! It’s clearly impossible to have 0.63 training accuracy and 1.0 validation accuracy. What’s the problem, and how can I solve it?

my model returns almost all of the input sentences as negative

I’m not entirely sure what that means. Are the embeddings negative valued? Does a classifier on top of the embeddings never put anything in the positive class? Is your data very imbalanced, maybe?

If you think the problem might be your word2vec, in the past I’ve had success using gensim to generate the embeddings. Even though it’s CPU only, it is still sometimes faster than Keras on the GPU.

I think I didn’t explain enough. First I use word2vec as the embedding method, and after that I divide my corpus into 2 parts, training and testing, in this way:

for i, index in enumerate(indexes):
    for t, token in enumerate(tokenized_corpus[index]):
        if t >= max_tweet_length:
            break
        
        if token not in X_vecs:
            continue
    
        if i < train_size:
            X_train[i, t, :] = X_vecs[token]
        else:
            X_test[i - train_size, t, :] = X_vecs[token]
            
    if i < train_size:
        Y_train[i, :] = [1.0, 0.0] if labels[index] == 0 else [0.0, 1.0]
    else:
        Y_test[i - train_size, :] = [1.0, 0.0] if labels[index] == 0 else [0.0, 1.0]

Then I use a Keras model, and in the prediction step, when I enter a new sentence to predict:
y_pred=model.predict(X_test)

I only receive negative predictions, even for positive sentences!

By the way, I’m sure about my corpus. It’s balanced.
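A balanced corpus does not by itself guarantee a balanced split. If `indexes` happened to be ordered by label, or if some rows of the label arrays were never filled in, the test set could end up containing a single class (or no labels at all), which might explain a val_acc of exactly 1.0. Below is a minimal sketch of a check, assuming one-hot label arrays shaped like `Y_train` and `Y_test` above. The helper name and the toy arrays are made up for illustration.

```python
import numpy as np

def label_summary(Y):
    """Count samples per class in a one-hot label array, plus rows with no label set."""
    Y = np.asarray(Y)
    per_class = Y.sum(axis=0).astype(int)         # number of samples in each class
    unlabeled = int((Y.sum(axis=1) == 0).sum())   # rows that were left as all zeros
    return per_class, unlabeled

# Toy stand-ins for Y_train / Y_test (hypothetical values, not the poster's data)
Y_train_demo = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
Y_test_demo = np.array([[1, 0], [1, 0], [0, 0]], dtype=float)

train_counts, train_unlabeled = label_summary(Y_train_demo)
test_counts, test_unlabeled = label_summary(Y_test_demo)
print("train:", train_counts, "unlabeled rows:", train_unlabeled)
print("test: ", test_counts, "unlabeled rows:", test_unlabeled)
```

If the test counts are concentrated in one class, or there are unlabeled rows, the split is worth fixing before looking at the model.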

What does your model look like? Just a fully connected network on top of the embeddings?

You say that the network just predicts 0 for everything. Are the probabilities it returns 1, 0? If not, are they the same for every prediction? If they are 1, 0, it means the model is definitely predicting something different on the training data, since predicting exactly 1, 0 for a positive training sample would make the loss infinite (assuming you are using cross-entropy loss).

Here is my whole model:

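# Imports needed by the code below
from keras.models import Sequential
from keras.layers import Conv1D, Dropout, Flatten, Dense
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping

batch_size = 32
nb_epochs = 100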

model = Sequential()

model.add(Conv1D(32, kernel_size=3, activation='elu', padding='same', input_shape=(max_tweet_length, vector_size)))
model.add(Conv1D(32, kernel_size=3, activation='elu', padding='same'))
model.add(Conv1D(32, kernel_size=3, activation='elu', padding='same'))
model.add(Conv1D(32, kernel_size=3, activation='elu', padding='same'))
model.add(Dropout(0.25))

model.add(Conv1D(32, kernel_size=2, activation='elu', padding='same'))
model.add(Conv1D(32, kernel_size=2, activation='elu', padding='same'))
model.add(Conv1D(32, kernel_size=2, activation='elu', padding='same'))
model.add(Conv1D(32, kernel_size=2, activation='elu', padding='same'))
model.add(Dropout(0.25))

model.add(Flatten())

model.add(Dense(256, activation='tanh'))
model.add(Dense(256, activation='tanh'))
model.add(Dropout(0.5))

model.add(Dense(2, activation='softmax'))

# Compile the model
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=0.0001, decay=1e-6),
              metrics=['accuracy'])

# Fit the model
model.fit(X_train, Y_train,
          batch_size=batch_size,
          shuffle=True,
          epochs=nb_epochs,
          validation_data=(X_test, Y_test),
          callbacks=[EarlyStopping(min_delta=0.00025, patience=2)])

It just returns exactly 0, the negative label, for everything. Do you have any suggestions?

Run model.predict on a few samples. What probabilities does it predict? There should be two floats that add to 1 per sample.
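A sketch of what to look for in that output, using a made-up array in place of the result of `model.predict(X_test[:5])`. The numbers and the 0.05 threshold are invented for illustration.

```python
import numpy as np

# Stand-in for model.predict(X_test[:5]); these values are hypothetical.
probs = np.array([[0.91, 0.09],
                  [0.90, 0.10],
                  [0.91, 0.09],
                  [0.89, 0.11],
                  [0.91, 0.09]])

row_sums = probs.sum(axis=1)                     # a softmax output should sum to ~1 per row
spread = probs.max(axis=0) - probs.min(axis=0)   # how much each class probability varies

sums_ok = bool(np.allclose(row_sums, 1.0))
nearly_constant = bool((spread < 0.05).all())    # arbitrary threshold for "about the same"

print("row sums:", row_sums, "ok:", sums_ok)
print("per-class spread:", spread, "nearly constant:", nearly_constant)
```

If the rows are nearly identical across very different inputs, the model is probably ignoring its input, which points at the data rather than the architecture.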

Your input shape: don’t you need to indicate your batch size, or at least that there will be an extra dimension with None?

Does Conv1D make sense here? Each sample is 2D: embedding x # words.

I ran model.predict on fewer samples and I still got only negative predictions. About this sentence: "Should be two floats that add to 1, per sample."
I’m not sure I understand your meaning properly. Would you please explain more clearly?

And about using Conv1D, do you mean that I should use Conv2D instead of it?
How can I indicate extra dimensions with None?

Sorry for my simple questions, I’m really new to this field.

Would you please explain more clearly?

The basic thing that comes out of this model for a given sample will be two numbers: the probability that the result is negative, and the probability that the result is positive. Since they’re probabilities, we expect them to sum to 1. If those two numbers are the same every time, it’s possible something’s wrong with your data loading mechanism.
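One way to probe the data loading, as a rough sketch: in the loop posted earlier, tokens missing from the word2vec vocabulary are skipped, which leaves all-zero vectors at those positions. If most positions are zero, or all samples are identical, the network has nothing to distinguish inputs by. The helper names and toy array below are hypothetical.

```python
import numpy as np

def zero_token_fraction(X):
    """Fraction of (sample, position) slots whose embedding vector is all zeros."""
    X = np.asarray(X)
    empty = ~X.any(axis=2)        # True where the whole embedding vector is zero
    return float(empty.mean())

def identical_samples(X):
    """True if every sample in X equals the first sample."""
    X = np.asarray(X)
    return bool((X == X[0]).all())

# Toy input: 2 samples, 3 positions, embedding size 2 (made-up values)
X_demo = np.array([[[0.1, 0.2], [0.0, 0.0], [0.0, 0.0]],
                   [[0.3, 0.1], [0.5, 0.4], [0.0, 0.0]]])

print("zero fraction:", zero_token_fraction(X_demo))
print("all samples identical:", identical_samples(X_demo))
```

Some zero padding is expected for tweets shorter than `max_tweet_length`; the worrying case is a fraction close to 1.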

I should use Conv2D instead of it

I don’t know of anyone using Conv2D specifically, but it makes sense to me at least. Conv1D makes less sense to me; I’m not sure what it could possibly be doing.
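For reference, Keras’s Conv1D on an input of shape (words, embedding size) slides a window along the word axis and treats the embedding dimensions as channels, so each filter looks at a few consecutive word vectors at once. Here is a NumPy sketch of that operation with 'valid' padding and no bias or activation. The function name and toy sizes are made up for illustration.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """x: (steps, channels); kernel: (kernel_size, channels, filters).

    Returns an array of shape (steps - kernel_size + 1, filters).
    """
    k = kernel.shape[0]
    out_steps = x.shape[0] - k + 1
    out = np.empty((out_steps, kernel.shape[2]))
    for t in range(out_steps):
        window = x[t:t + k]   # k consecutive word vectors
        # Sum over window position and embedding dimension, once per filter
        out[t] = np.tensordot(window, kernel, axes=([0, 1], [0, 1]))
    return out

rng = np.random.default_rng(0)
tweet = rng.normal(size=(10, 4))      # 10 words, embedding size 4 (toy numbers)
kernel = rng.normal(size=(3, 4, 8))   # kernel_size=3, 8 filters
features = conv1d_valid(tweet, kernel)
print(features.shape)  # (8, 8)
```

With kernel_size=3, each output row is a learned feature of a three-word window, which is similar in spirit to an n-gram detector.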

Sorry for my simple questions, I’m really new to this field.

No worries, but if you’re looking for a solution that’s less fiddly, I would recommend the approach in the fastai videos, I think it’s halfway through lesson 4. You’re likely to get much better results with much less effort with that.

The underlying model there is an LSTM, which does a better job of modeling long-term dependencies, and I think it’s found in most of the state-of-the-art results nowadays.

Thanks a million for your time and explanation.
I consider 0 as negative and 1 as positive, and the model always considers every sentence negative and returns 0 as the label.
I’m going to try my model with Conv2D and will report the result.
And thanks for your recommendation. I will watch those videos immediately. I hope they help me.

I have already thought about LSTM, and maybe you have a point. I should think twice about choosing the model.

I consider 0 as negative and 1 as positive

If you look at the last layer of your model, you see it’s returning two numbers. Those two numbers are the probabilities the model assigns to the 0 case and the 1 case. Since there are only two classes here, Prob_0 = 1 - Prob_1, so it’s a bit redundant (but if you had more classes it would be necessary). If you take the argmax of those probabilities you get the prediction. So the model might return, e.g., .91, .09, and then to make the prediction you say, “the zeroth index has the highest probability, predict that.”
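As a small sketch of that last step, with invented probabilities standing in for the output of model.predict, and the label order assumed to be 0 = negative, 1 = positive as described above:

```python
import numpy as np

# Hypothetical softmax outputs for two samples: [Prob_0, Prob_1]
probs = np.array([[0.91, 0.09],
                  [0.30, 0.70]])

predicted = probs.argmax(axis=1)          # index of the largest probability per row
labels = ["negative", "positive"]         # index 0 = negative, index 1 = positive
names = [labels[i] for i in predicted]
print(predicted, names)
```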

If you want to jump to the part about sentiment detection it’s here (but if you have time I’d recommend watching the whole series):

Now I completely understand what you mean. Sorry, but what if I have 3 classes? I want to solve this problem first and then, to enhance my project, use neutral too. Is it OK to assign 0 for negative, 1 for positive, and -1 for neutral? In that case the probabilities won’t sum to 1.
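One common way to handle a third class, sketched here under the assumption that the labels get recoded as class indices 0, 1, 2 (the particular mapping below is hypothetical): with a softmax output the label is a position in the output vector rather than a numeric value, so -1 would not work as an index. With one-hot targets, the three predicted probabilities still sum to 1.

```python
import numpy as np

# Hypothetical mapping from the original labels to class indices
label_to_index = {-1: 0, 0: 1, 1: 2}   # neutral, negative, positive

raw_labels = [0, 1, -1, 1]
indices = np.array([label_to_index[label] for label in raw_labels])
Y = np.eye(3)[indices]                 # one-hot rows, shape (4, 3)
print(Y)
```

The final layer would then need three units, e.g. Dense(3, activation='softmax'), with the same categorical_crossentropy loss.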