Is it normal that a model trained for sentiment analysis gives different accuracy on the test set?

Master · June 23, 2019, 6:28am

Hello all.
I recently ran a sentiment analysis project (code is here). Everything runs without a hitch, except that during testing I noticed that each time I run the snippet below, I get a different accuracy. The difference is not huge, usually around ±0.02 to 0.5%. For example, on one run I get 77.50%, on the second run I may get 77.2%, and on the third I may get 77.34% or 76.68%, and so on. You get the idea.
Isn't it supposed to give me exactly the same accuracy every time? Does this work differently for text processing in deep learning?
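
To put a number on the spread, here is a quick calculation over the four accuracies I quoted above. It uses only those values and the standard library, so it says nothing about the cause; it just summarizes how far apart the runs are.

```python
import statistics

# Test accuracies (in percent) quoted above, one per run of the same snippet.
accuracies = [77.50, 77.2, 77.34, 76.68]

mean_acc = statistics.mean(accuracies)        # average accuracy over the runs
spread = max(accuracies) - min(accuracies)    # largest gap between any two runs
stdev = statistics.stdev(accuracies)          # sample standard deviation

print(f"mean : {mean_acc:.2f}%")
print(f"range: {spread:.2f} percentage points")
print(f"stdev: {stdev:.2f} percentage points")
```

So the largest gap among these particular runs is a bit under one percentage point.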

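# Get test data loss and accuracy
# (net, criterion, test_loader, batch_size and train_on_gpu come from the training code)
import numpy as np
import torch

test_losses = [] # track loss
num_correct = 0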

# init hidden state
h = net.init_hidden(batch_size)

net.eval()
# iterate over test data
for inputs, labels in test_loader:

    # Detach the hidden state from the previous batch so that no
    # autograd history is kept across batches
    h = tuple(each.detach() for each in h)

    if train_on_gpu:
        inputs, labels = inputs.cuda(), labels.cuda()
    
    # get predicted outputs
    output, h = net(inputs, h)
    
    # calculate loss
    test_loss = criterion(output.squeeze(), labels.float())
    test_losses.append(test_loss.item())
    
    # convert output probabilities to predicted class (0 or 1)
    pred = torch.round(output.squeeze())  # rounds to the nearest integer
    
    # compare predictions to true label
    correct_tensor = pred.eq(labels.float().view_as(pred))
    # .cpu() is a no-op for tensors already on the CPU, so one code path covers both cases
    correct = np.squeeze(correct_tensor.cpu().numpy())
    num_correct += np.sum(correct)


# -- stats! -- ##
# avg test loss
print("Test loss: {:.3f}".format(np.mean(test_losses)))

# accuracy over all test data
test_acc = num_correct/len(test_loader.dataset)
print("Test accuracy: {:.3f}".format(test_acc))
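
One thing I have not ruled out: the loop above passes `h` from one batch to the next, so if `test_loader` shuffles the data (an assumption on my part, I have not checked the loader settings), the predictions for a batch could depend on which batch came before it. The toy sketch below is not my network. `evaluate`, `carry_state`, and the four samples are made up purely to illustrate how carried state plus a different batch order can change accuracy.

```python
def evaluate(samples, batch_size, carry_state):
    """Toy stand-in for the test loop (hypothetical, not the real model).

    Each sample is (x, label). The 'model' predicts 1 when x + h > 0, where h
    plays the role of a hidden state handed over from the previous batch.
    """
    h = 0.0
    num_correct = 0
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        if not carry_state:
            h = 0.0  # fresh state for every batch
        for x, label in batch:
            pred = 1 if x + h > 0 else 0
            num_correct += int(pred == label)
        # state handed to the next batch: mean input of this batch
        h = sum(x for x, _ in batch) / len(batch)
    return num_correct / len(samples)


# Made-up data: two clearly positive and two mildly negative samples.
samples = [(2.0, 1), (2.0, 1), (-0.5, 0), (-0.5, 0)]
reversed_samples = samples[::-1]  # same data, different batch order

acc_carry_fwd = evaluate(samples, 2, carry_state=True)           # 0.5
acc_carry_rev = evaluate(reversed_samples, 2, carry_state=True)  # 1.0
acc_reset_fwd = evaluate(samples, 2, carry_state=False)          # 1.0
acc_reset_rev = evaluate(reversed_samples, 2, carry_state=False) # 1.0

print(acc_carry_fwd, acc_carry_rev, acc_reset_fwd, acc_reset_rev)
```

In this toy case the accuracy only depends on batch order when the state is carried over. Whether the same thing is happening in my real test loop is exactly what I am unsure about.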

Thanks a lot in advance, I really appreciate your kind help.