IMDb Sentiment Analysis Applied to Book Reviews

William.Collins · November 29, 2018, 9:11pm

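I tried running my IMDb sentiment model (0.94 accuracy) on some book reviews, and the results look pretty horrible: clearly negative reviews come back as pos and clearly positive ones come back as neg, all with a probability of 1.0000. I’m not sure if it’s due to the different vocabulary/language usage, or because I’m evaluating the outputs incorrectly.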

Some short review examples with outputs:

Review: nathan and natalie take a ferry trip to try to capture the creature , with mayhem the predictable outcome . meanwhile , cross - country skiers set off despite the absence of snow ; more ludicrously still , elmo ( father ' s name elmo ) was born in milan , while martha ( jewish father ismar ) hails from silesia ; other inconsistencies abound . some promising characters blundering around in a madhouse : a showcase of what goes wrong when writers , editors , and publishers sleepwalk through production .
Probability: 1.0000
Sentiment: pos

Review: joyce harrington , judith kelman , warren murphy , and justin scott complete the party . an all - star lineup that , sadly , offers no more excitement than the average all - star game .
Probability: 1.0000
Sentiment: pos

Review: quality writing from a practiced hand .
Probability: 1.0000
Sentiment: neg

Review: mood indigo is a truly delightful tale , complete with an unpretentious mystery and a refreshingly warm tribute to friendship thrown in for good measure .
Probability: 1.0000
Sentiment: neg
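
All four outputs report a probability of 1.0000. By itself that only says the two class scores are far apart, not that the prediction is trustworthy. A minimal NumPy sketch of that effect, using made-up logits (these are not values from my model):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of class scores."""
    shifted = logits - np.max(logits)  # subtract the max to avoid overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical two-class scores, chosen only for illustration.
close = softmax(np.array([0.2, -0.1]))    # weakly separated scores
far = softmax(np.array([12.0, -12.0]))    # strongly separated scores

print('%.4f' % close.max())
print('%.4f' % far.max())
```

With the strongly separated pair, the larger probability rounds to 1.0000 at four decimals, so the printed value cannot distinguish "very confident" from "saturated for some other reason".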

My code for getting these results:

def predict_reviews(reviews, model, stoi):
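    model.eval()  # inference mode (disables dropout); only needs to be set once
    for rev in reviews:
        model.reset()  # clear the RNN hidden state before each review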

        # Map each whitespace-separated token to its vocabulary index;
        # out-of-vocabulary tokens fall back to the '_unk_' index.
        unk_idx = stoi['_unk_']
        text_idx = [stoi.get(tok, unk_idx) for tok in rev.split(' ')]
        text_idx = np.array(text_idx, dtype=np.int32)

        # Wrapping the array in a list gives a tensor of shape (1, seq_len).
        text_idx = V(T([text_idx]))

        outputs = model(text_idx)[0]
        pred = outputs[0]  # why take the first? Isn't this the output at the first token?
        probs = F.softmax(pred, dim=0)  # pred is 1-D here, so normalize over dim 0
        val, idx = probs.max(0)
        sentiment = 'neg' if idx.data[0]==0 else 'pos'
        prob = val.data[0]
        print('\n\nReview: %s\nProbability: %.4f\nSentiment: %s' % (rev, prob, sentiment))

predict_reviews(reviews, learn.model, stoi)
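
To look at the vocabulary question separately from the model, here is a minimal sketch of the token lookup on its own. It uses a toy `toy_stoi` dictionary that is made up for illustration (the real `stoi` comes from the IMDb vocabulary), and `numericalize` is a hypothetical helper name. It reports what fraction of a review falls back to `_unk_`, and it prints the two possible tensor layouts, since I'm not certain which one the model expects:

```python
import numpy as np

# Toy vocabulary, invented for this example; not the real IMDb stoi.
toy_stoi = {'_unk_': 0, 'quality': 1, 'writing': 2, 'from': 3,
            'a': 4, 'hand': 5, '.': 6}

def numericalize(review, stoi, unk='_unk_'):
    """Map whitespace-separated tokens to ids, using the unknown-token id as fallback."""
    unk_id = stoi[unk]
    ids = [stoi.get(tok, unk_id) for tok in review.split(' ')]
    return np.array(ids, dtype=np.int64)

ids = numericalize('quality writing from a practiced hand .', toy_stoi)

# Share of tokens that were not found in the vocabulary.
unk_fraction = float((ids == toy_stoi['_unk_']).mean())

batch_first = ids[None, :]  # shape (1, seq_len)
seq_first = ids[:, None]    # shape (seq_len, 1)

print(ids.tolist(), unk_fraction)
print(batch_first.shape, seq_first.shape)
```

Running the same count with the real `stoi` over the book reviews would show how much of each review the model never saw during training, and checking the shape of `text_idx` just before the `model(...)` call would show which of the two layouts my code is actually passing in.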

Thanks!
-Bill