ML unfairness here and now

stas · January 27, 2020, 7:34pm

A few days ago I was searching for some books for my gf and stumbled upon someone’s book [4b] on Amazon that had a confusing rating of 3/5, with only 9 5-star reviews, no other reviews, and a single 2-star rating w/ no review. Finding that discrepancy strange, to say the least, I started investigating and contacted Amazon customer service. I received a reply that this is ML’s doing [3]. I kept digging, as it made absolutely no sense that a company would pull an absolutely fake negative rating out of thin air. I tried to ask for help with this investigation at reddit [1], but it didn’t go too well: instead of trying to investigate the abnormal behavior, most commenters tried to discredit my finding. Still, I got a lot of useful input and learned a few things about Amazon’s inconsistent behavior, which depends on whether you’re logged in or not. And then last night I thought of checking my own first book [4a], which I published with O’Reilly some 18 years ago. Lo and behold, my book was abused by Amazon ML in exactly the same way.

I hope at least some of you here will try to follow the math and see that it doesn’t make sense.

How do you take 3 5-star reviews, 4 4-star reviews, and 1 rating of 1-star w/ no review and turn it into a rating of 3/5? The real rating of the book is ~4.5; yesterday it was pushed down to 3.1, and this morning it got pushed down to 2.9 - w/o any changes to the ratings. I know that this single 1-star rating is fake, because it was given a hugely disproportionate weight. And the same was done in the case of the original book I was investigating, where a single fake 2-star rating was added with a disproportionate weight, pulling the book’s all-5-star rating down to 3 stars! If you are not a published author, you may not be aware that a rating of 3 for a book on Amazon is a death sentence for that book.
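For anyone who wants to check the math, here is a minimal Python sketch of the plain, unweighted average for these counts. The helper name `plain_mean` is made up for illustration, and this is only the naive baseline, not Amazon’s actual method, which isn’t public.

```python
def plain_mean(star_counts):
    """Unweighted mean of star ratings, given a {stars: count} mapping."""
    total = sum(star_counts.values())
    return sum(stars * count for stars, count in star_counts.items()) / total


# Counts quoted above for the book: three 5-star, four 4-star, one 1-star.
with_one_star = plain_mean({5: 3, 4: 4, 1: 1})   # 32 / 8
without_one_star = plain_mean({5: 3, 4: 4})      # 31 / 7

print(f"with the 1-star rating:    {with_one_star:.2f}")
print(f"without the 1-star rating: {without_one_star:.2f}")
```

Either way, a plain average of these counts stays at 4 or above, nowhere near 2.9.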

It baffles me that hundreds of people read that exposé on reddit [1] and everybody thought it was OK, and some thought that I was crying wolf and worked hard at trying to prove that. What is the point of us discussing fairness in AI if, when unfair behavior goes into production, we all complacently just sit back and watch?

One commenter correctly identified that I had a false expectation that Amazon would give fair treatment to all of the products it lists. It’s just a notice board where products are placed and, of course, it doesn’t care which products are being sold and which are buried, since it profits either way. As long as there are several products to choose from in each category and there is a need to buy, who cares about fairness?

It’s also clear that Amazon is attempting to battle the myriad fake reviews, and while I think that’s fantastic, it clearly doesn’t care that its ML model destroys along the way vendors and authors who didn’t lie but can’t prove that their book/product was really purchased elsewhere and that the review is real. None of the reviews on my book are fake, i.e. I have not solicited any of those reviews. Those people usually bought my book at the O’Reilly conferences via Powell’s Books and not Amazon, and yet those reviews are now clearly considered fake. (So if you ever sell something on Amazon, make sure your reviews are only of Verified Purchase “quality”.)

The main point I’m trying to get across is this: if you develop an ML engine and you have unverified ratings, how can you possibly think of an algorithm or a model of “balancing things out by adding fake negative ratings”? If you don’t trust the reviews/ratings, then remove them altogether and say that this product has no verified reviews. Can you see how pulling a random number out of thin air and assigning it as a rating to a product you know nothing about is just a terrible practice that makes no logical sense whatsoever? Unfortunately it appears that only a minority of products are affected, so there is no class action suit in the works here.

If you do care to help me investigate the unfair representation of products on Amazon, here is the data I have so far:

1. The reddit discussion. Unfortunately it didn’t lead to any constructive discussion, and 40% of the redditors who have read it so far (23 hours) downvoted the post.
2. My original article
3. Amazon’s customer service reply, from which I learned it’s ML’s doing and not an algorithm:

The overall star rating for a product is determined by a machine-learned model that considers factors such as the age of the review, helpful votes by customers, and whether the reviews are from verified purchasers. Similar machine-learned factors help determine a review’s ranking in the list of reviews. The system continues to learn which reviews are most helpful to customers and improves the experience over time. Any changes that customers may currently experience in the review ranking or star ratings is expected as we continue to fine-tune our algorithms.

4. Several books that I discovered that were affected by this ML representation (notice you are likely to see different ratings depending on whether you’re logged in or not):

4a. My book. Here we have (3*5 + 4*4 + 1*1)/8 somehow adding up to 2.9 - note that the 1-star rating is not real and was added artificially with a 48% weight against all the real reviews.

Here is a snapshot of my book yesterday:

[image: practical-modperl-amazon-hallucination]

Here is a snapshot of my book this morning (dropped from 3.1 to 2.9 - notice the weight change):

[image: practical-modperl-amazon-hallucination-2020-01-27]

4b. Another book. Here we have (9*5 + 1*2)/10 somehow adding up to 3.0. Again, we have a single fake 2-star rating, this time with a weight of 68%! Are you noticing the pattern?

Here is a snapshot of that other book (I don’t know the author and haven’t read the book):

[image: amazon-review-5]
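To put a number on “disproportionate”, one can solve for the weight the lone low rating would need in order to produce the displayed score. The sketch below is a back-of-the-envelope check, not Amazon’s method: `implied_weight` is a hypothetical helper, and it assumes the displayed score is a weighted average in which all the other reviews share the remaining weight equally.

```python
def implied_weight(displayed, outlier_stars, rest_mean):
    """Solve  w * outlier_stars + (1 - w) * rest_mean == displayed  for w.

    Assumption: every review other than the outlier shares the remaining
    weight equally, so the rest can be summarized by its plain mean.
    """
    return (rest_mean - displayed) / (rest_mean - outlier_stars)


# 4b: nine 5-star reviews, one 2-star rating, displayed score 3.0
w_4b = implied_weight(displayed=3.0, outlier_stars=2, rest_mean=5.0)

# 4a: three 5-star and four 4-star reviews (mean 31/7), one 1-star rating,
# displayed score 2.9
w_4a = implied_weight(displayed=2.9, outlier_stars=1, rest_mean=31 / 7)

print(f"4b: implied weight of the 2-star rating = {w_4b:.0%}")
print(f"4a: implied weight of the 1-star rating = {w_4a:.0%}")
```

Under that assumption the lone rating would need about 67% of the weight for 4b, close to the 68% shown in the chart, and about 45% for 4a. The gap between 45% and the 48% shown for 4a may mean the remaining reviews are not weighted equally among themselves either.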

Clearly, I need to find more examples, since otherwise nobody cares to pay attention. I have yet to find a quick way to do it. It seems to be an edge case that affects only some older books with a few reviews, most or all of them without the Verified Purchase tag. It probably affects some non-book products too.

Edit: found a bunch of similar examples:

Lean Python - adds a fake 2-star rating with a weight of 60%, pulling a single non-verified 5-star review down to 3.2. It only does that if you’re logged in; it shows 5 stars if you’re not.
Python Machine Learning - 5/5 => 3/5. This one is odd, as it may include a non-amazon.com review, but it didn’t do that on other books, so possibly the same story. If you log out, it gives 5/5.
Python Programming - logged in 3/5 with one fake 1-star rating, logged out 5/5. Clearly international reviews didn’t count here, since there is another 5-star review there.
Python 3: Pocket Primer - same story: 3/5 logged in, 5/5 logged out.

Your careful analysis is very welcome. I hope at least someone out there cares.

p.s. My book is 18 years old. It is a 900-page tome on the once very popular mod_perl Apache module, a technology that has been mostly forgotten nowadays, so I’m not expecting any sales from it, but that doesn’t change anything. Who gave Amazon the right to misrepresent the quality of my or anybody else’s book, ML or not?

Update:

It took a few days to solve the puzzle. I have made a massive update to the original article, including all the discovered information and making it easier to read.

The bottom line is that Amazon is:

1. Integrating international reviews into the ratings if you’re logged in
2. Giving older reviews a significantly lower weight (can be like 1/10 or even less)
3. Giving purchased-product reviews higher weights

#2 of this new approach can have a tremendous negative impact on authors of outdated tech books. If you don’t want the rating of your older book to be continually lowered, take it off Amazon.

I’m in the process of figuring out how to remove my books from Amazon.

You will find the full summary and recommendations for tech book authors on Amazon and Amazon marketplace designers in the article.

I don’t have any sources confirming my findings; this is just derived from looking at many, many listings and trying to make the math work.

Thank you.

florobax · January 28, 2020, 10:42am

Must be a bit like the Netflix ratings on movies. They do not reflect the quality of the film but the likelihood of you liking it. It is however very strange to do that by tweaking the ratings, which are a totally different means of ranking the items. If they want a recommendation algorithm, they should build one that does not interfere with the ratings. So I 100% agree with you that this is quite problematic.

ctwardy · January 28, 2020, 4:16pm

Hi, I noticed you make at least two claims:

1. The negative reviews are fake.
2. The single negative reviews get disproportionate weight.

I don’t see the evidence for the first - the single 2-star review of Family Constellations provides a rationale, and is from 2014. It could be fake - but

But the second seems true. In theory I don’t mind Amazon assigning weights to reviews based on a model, but I had assumed the percents on the stars reflected # of reviews. You’ve shown they do not. This should be more transparent.

It’s also true that the ML model, even if say 90% accurate, will cause harm where wrong. It’s helpful that there is (presumably) an overall improvement, but insufficient. Most ethical systems hinge on the things they won’t sacrifice in order to achieve greater average goodness.

In this case, Amazon should monitor for who is negatively affected. They might, but it’s not as critical to them as to those affected, so good to raise the issue. If you can get a little more information, it might be worth approaching a writers’ guild and/or science/tech journalists.

stas · January 28, 2020, 5:11pm

Thank you for investigating this, @ctwardy

That’s correct.

I claim that because when I log out, that 10th rating disappears:

And the same story happens with the other examples I have now found. The exception is my book, which has no international reviews, so it must have been someone leaving a 1-star rating w/o leaving a review.

Looking some more and comparing things, I can now see that when I log in it also adds what it calls Top International Reviews, whose ratings then match the mysterious negative scores. That seems to be the explanation for the first claim, then. That is, the negative reviews are not fake, as I had claimed, but are sourced from Top International Reviews.

This is great, thank you, I will go and adjust my article to reflect the new discovery.

So the only remaining problematic thing is that some reviews/ratings get disproportionate weight.

For example, my book has, among other reviews, 1 5-star Verified Purchase review and 1 1-star rating w/o a review (presumably Verified Purchase too):

[image: screenshot_13]

Say we ignore all the other 4- and 5-star reviews as being of unknown fidelity. The total rating should then be at least 3, and not 2.9, i.e. that one 5-star review should have at least the same weight as the 1-star rating. But it can be clearly seen from the percentages in the chart that this is not so. And, of course, the 6 other 4- and 5-star reviews should push the total above 3.

And when I log out, the weights of the otherwise identical ratings get manipulated again:

[image: screenshot_14]

The weights are suddenly different. So, presumably their ML gives different weights to different ratings depending on who is looking.

It is also possible that they use other signals; for example, an older review/rating may not get the same weight as a newer one. Given that my book is from 2003 and the bulk of the reviews are from that era, perhaps the system discounts the weight of the reviews as each year passes. So a single rating from, say, today, which would be 17 years later, could be given a higher weight than the total of the 7 reviews from 17 years ago, thus pulling the total rating powerfully towards that single recent rating.
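A minimal sketch of how such age discounting could play out, assuming an exponential decay with a made-up 5-year half-life. Neither the decay shape nor the half-life comes from Amazon; `age_weights` and `weighted_rating` are hypothetical helpers used only to show the effect.

```python
def age_weights(ages_years, half_life_years):
    """Assumed decay: a review's weight halves every `half_life_years`."""
    return [0.5 ** (age / half_life_years) for age in ages_years]


def weighted_rating(stars, weights):
    """Weighted average of star ratings."""
    return sum(s * w for s, w in zip(stars, weights)) / sum(weights)


# Seven reviews from 17 years ago plus one brand-new 1-star rating.
stars = [5, 5, 5, 4, 4, 4, 4, 1]
ages = [17] * 7 + [0]

weights = age_weights(ages, half_life_years=5.0)  # half-life is a guess
rating = weighted_rating(stars, weights)
recent_share = weights[-1] / sum(weights)

print(f"weight of each old review:  {weights[0]:.3f}")
print(f"share of the recent rating: {recent_share:.0%}")
print(f"decayed rating:             {rating:.2f}")
```

With these assumed numbers each 17-year-old review keeps roughly a tenth of its weight, the single new rating ends up with about 60% of the total weight, and the displayed score lands near 2.4.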

I can see how this would apply to tech books, because their relevancy disappears over time. And while researching this I did see a bunch of 1-star “this book is outdated” reviews on various tech books from the past.

But, surely, you wouldn’t apply the same decay to non-technical books. I haven’t explored that sector of books to see whether it’s impacted by these manipulations too. Clearly, this shouldn’t be so for psychology-related books, as those take much longer to lose relevancy than tech books (but it seems to hit that other book in the same way).

stas · January 28, 2020, 5:31pm

And the other thing that’s missing for me is integrating the ML results with a user-explainable interface/presentation. The whole exploration started with a single puzzling experience: I was browsing books, ran into this book, saw a rating of 3/5 alongside 9 5-star reviews, and had a hard time making sense of whether this was a good book or a book to avoid. The rating was signalling to me to run away, but all the reviews were consistently fantastic. So, even if the ML were to do a stellar job at ranking the book at 3, I, as a user, was completely confused by this outcome.

stas · January 28, 2020, 5:57pm

So, basically, in this updated Amazon universe superb tech books are destined to become mediocre in some years and terrible in some more years, since there is always someone buying an outdated book by mistake and leaving it a 1-star review, with the ML giving that review a disproportionate weight that overpowers any previous reviews.

I worked hard for 3.5 years to write this book, and it was an excellent outcome. Yet, if one were to look at it now, at its current 2.9/5 rating - it’s a shitty 900-page pile of words as far as the Amazon universe goes. This makes no sense, and it is totally unfair and misrepresents the quality of the book for the time it was written. Since the book is no longer relevant to the modern tech world, I’m just going to ask O’Reilly to pull it off the market, and I’m going to ask Amazon to remove my books and give them no right to sell any of my books in the future (I have written 4 so far). My parents have a hard copy, and they can still show it to their visitors; it doesn’t have to be on Amazon.

Also, since this machination of discounting aged reviews can be easily done algorithmically and requires no ML whatsoever, I won’t waste any more of your time, since now I think I have this puzzle solved, and this subject matter is no longer relevant to this ML forum. I appreciate your time reading and participating so far. Thank you.

jeremy · January 28, 2020, 7:35pm

I agree - it was a really really good book.