Book Ch 1 - IMDB-based review classifier not detecting obviously negative reviews

michalparkola · July 31, 2023, 9:34pm

Hi,

I was experimenting with the Chapter 1 Notebook on Colab and I’m getting some strange behaviour from the movie review classifier in the “Deep Learning Is Not Just for Image Classification” section.

It categorizes many obviously negative reviews as positive:

I’m confused. Is it known to do that?

I understand that the accuracy is not 100%, but these seem like low-hanging fruit to get right.

Am I missing something?

MartinSch · December 26, 2023, 9:32am

Hello Michał,
I also observed that discrepancy. I tried to figure out whether an alternative model or different settings could improve the classification. In the meantime, have you gained a better understanding of the issue?
Best regards,
Martin

@michalparkola

michalparkola · December 28, 2023, 1:59pm

Sadly, I got no response and didn’t investigate too deeply myself. Please let me know if you discover anything.

MartinSch · December 28, 2023, 8:21pm

Hello Michał,
thank you for your reply! I tried just one experiment: I had ChatGPT generate paraphrases of sentences like “I don’t like the movie” or “The movie was boring”. I generated only about 200 such phrases (the output was limited to 2048 characters per post, which made it a bit tedious). I added these phrases to the first negative reviews (so I could modify only a small fraction of the 12500 ‘neg’ reviews). Fine-tuning the model on this data improved the categorization of such minimal reviews at least a little:
learn.predict("I have seen it to the end. I would recommend a different movie.")
('pos', tensor(1), tensor([0.0173, 0.9827]))

learn.predict("I don't recommend this movie.")
('neg', tensor(0), tensor([0.5594, 0.4406]))

learn.predict("This movie is boring.")
('neg', tensor(0), tensor([0.9981, 0.0019]))

learn.predict("Way too long to be interesting.")
('neg', tensor(0), tensor([0.6452, 0.3548]))
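
To make the outputs above easier to compare: each result is the predicted label, its class index, and the class probabilities, here in the order (neg, pos). A small sketch like the following, which only re-uses the numbers already posted and a hypothetical helper name `confidence_margin`, shows how far apart the two classes are for each sentence. It is plain Python and does not call the model.

```python
def confidence_margin(probs):
    """Return (top probability, gap between the two most likely classes)."""
    ranked = sorted(probs, reverse=True)
    return ranked[0], ranked[0] - ranked[1]


# Probabilities copied from the predictions above, ordered (neg, pos).
results = {
    "I have seen it to the end. I would recommend a different movie.": [0.0173, 0.9827],
    "I don't recommend this movie.": [0.5594, 0.4406],
    "This movie is boring.": [0.9981, 0.0019],
    "Way too long to be interesting.": [0.6452, 0.3548],
}

for text, probs in results.items():
    top, margin = confidence_margin(probs)
    # A small margin means the classifier is close to undecided.
    print(f"top={top:.4f}  margin={margin:.4f}  {text}")
```

Read this way, only “This movie is boring.” is classified as negative with high confidence; the second and fourth sentences are negative by a fairly narrow margin, and the first is still confidently positive.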

So maybe the model just did not have enough chances to learn from short, harsh messages. Still, I wonder whether a different model would yield ‘better’ results; I’ll probably come across different text models while studying the book and course.
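
For anyone who wants to try something similar, the augmentation step described above could be sketched roughly as follows. This is only an illustration under assumptions: the helper name `append_phrases` is hypothetical, the reviews are assumed to be already loaded as a list of strings, and the phrases are assumed to be appended to the end of each review (the post does not say exactly how they were added).

```python
def append_phrases(neg_reviews, phrases):
    """Append one short negative phrase to each of the first len(phrases) reviews.

    Returns a new list; the input list is left unchanged.
    Assumption: phrases go at the end of the review text.
    """
    if len(phrases) > len(neg_reviews):
        raise ValueError("more phrases than reviews to attach them to")
    augmented = list(neg_reviews)
    for i, phrase in enumerate(phrases):
        # Trim trailing whitespace so the phrase is separated by one space.
        augmented[i] = f"{augmented[i].rstrip()} {phrase}"
    return augmented


# Tiny made-up example; the real experiment used about 200 phrases
# against the 12500 'neg' reviews.
reviews = ["The plot made no sense. ", "Dull from start to finish.", "Not for me."]
phrases = ["This movie is boring.", "I don't recommend this movie."]
print(append_phrases(reviews, phrases))
```

The augmented texts would then have to be written back to the training data before fine-tuning again; that part depends on how the dataset is stored and is not shown here.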

Best regards, and have a Happy New Year!
Martin