#AmazonFail – a classic Information Quality impact

So, Amazon recently delisted thousands (over 57,000, to be precise) of books from their search and sales rankings. The Wall Street Journal carries the story, as do the Irish Independent (here), The China Post (here), ComputerWorld (here), the BBC (here)… and there are many more. In the Twittersphere and blogosphere, this issue was tagged as #AmazonFail. (Some blog posts on this can be found here and… oh heck, here’s a link to a Google search with over 400,000 results.) There are over 13,000 separate Twitter posts about it, including a few highlighting alternatives to Amazon.

We have previously covered Amazon IQTrainwrecks here and here. Both involved inappropriate pushing of Adult material to customers and searchers on the Amazon site. Perhaps they are just over-compensating now?

It appears that these books were mis-categorised as “Adult” material, which Amazon excludes from searches and sales rankings. Because the affected books predominantly, but not exclusively, related to homosexual lifestyles, this provoked a storm of comment that Amazon was censoring homosexual material. However, books about health and reproduction were also affected.

Amazon describe the #AmazonFail incident as a “ham-fisted” cataloguing error, which they attribute to one employee entering data in one field in one system. One commentator ascribes the blame to an algorithm in Amazon’s ranking and cataloguing tools. And a hacker has claimed that it was he who did it, exploiting a vulnerability in Amazon’s website.

No matter what way you slice it or where you ascribe the root cause, this is a classic information quality problem. The case for the quality of the information should not be argued from inside Amazon out but rather from the information consumer’s perspective in.

People looking for books on human reproduction or health should be able to find those books on Amazon when they search for them. It is patently inaccurate to categorise those books as “Adult” content, so whatever process acted on the data to change the “Category” attribute on those records created poor quality information. Among the books apparently tagged as “Adult” was Stephen Fry’s autobiography “Moab is my Washpot”, a very funny book which happens to be written by a gay man.
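To make the consumer-side impact concrete, here is a minimal sketch (in Python, with invented field names, titles and ranks) of how a single mis-set “Adult” flag removes a title from both search results and the sales ranking:

```python
# Illustrative only: field names, titles and ranks are invented.
catalog = [
    {"title": "Moab is my Washpot", "category": "Biography", "sales_rank": 812},
    {"title": "A Guide to Human Reproduction", "category": "Adult", "sales_rank": 450},  # mis-tagged; should be "Health"
    {"title": "A History of Bookselling", "category": "Non-fiction", "sales_rank": 120},
]

def search(term):
    """Search silently drops anything flagged as 'Adult'."""
    return [b["title"] for b in catalog
            if term.lower() in b["title"].lower() and b["category"] != "Adult"]

def sales_ranking():
    """The sales ranking also excludes 'Adult' items entirely."""
    return sorted((b for b in catalog if b["category"] != "Adult"),
                  key=lambda b: b["sales_rank"])

print(search("reproduction"))                  # [] -- the mis-tagged book has simply vanished
print([b["title"] for b in sales_ranking()])   # the same book is missing from the ranking too
```

From the information consumer’s perspective, the book no longer exists; nothing else about the record needs to be wrong.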

A few explanations/theories have emerged to explain this IQTrainwreck.

  • The Lone Frenchman Defence

On the SeattlePI.com blog, the root cause was attributed to a single employee in France.

Amazon managers found that an employee who happened to work in France had filled out a field incorrectly and more than 50,000 items got flipped over to be flagged as “adult,” the source said. (Technically, the flag for adult content was flipped from ‘false’ to ‘true.’)

This is also discussed on ComputerWorld where they say:

A former Amazon employee named Mike Daisey said in an interview that the problem really did appear to have been caused by an employee mistake.

According to Daisey, a friend within the company told him that someone working on Amazon’s French Web site mistagged a number of keyword categories, including the “Gay and Lesbian” one, as pornographic, using what’s known internally as the Browse Nodes tool. Soon the mistake affected Amazon sites worldwide, Daisey said. “If you use that tool in one site, it affects every site in Amazon,” he noted. “So the guy screwed up in France and it propagated everywhere.”
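The Browse Nodes tool is internal to Amazon and its workings are not public, but a toy model of the propagation Daisey describes might look like this: every country site holds a reference to the same shared category record, so a single edit made “in France” is visible everywhere at once.

```python
# Toy model of shared category master data; not Amazon's actual design.
class BrowseNode:
    """Stand-in for a shared category record."""
    def __init__(self, name, adult=False):
        self.name = name
        self.adult = adult

gay_and_lesbian = BrowseNode("Gay and Lesbian")

# Every country site references the *same* node object.
sites = {
    "amazon.fr": [gay_and_lesbian],
    "amazon.com": [gay_and_lesbian],
    "amazon.co.uk": [gay_and_lesbian],
}

# One employee, one field, one system: the flag flips from False to True...
sites["amazon.fr"][0].adult = True

# ...and every site now treats the whole category as "Adult".
print({site: nodes[0].adult for site, nodes in sites.items()})
# {'amazon.fr': True, 'amazon.com': True, 'amazon.co.uk': True}
```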

  • The Floating Algorithm Argument

Another possible root cause was put forward by Mary Hodder on TechCrunch. She argued that this wasn’t a glitch in the data but rather:

#AmazonFail is about the subconscious assumptions of people built into algorithms and classification that contain discriminatory ideas.

True. This happens. Assumptions are the mother of all IQTrainwrecks (well, a lot of them). However, the master data retagging explanation rings more true in this writer’s experience.

  • A Hacker on the Grassy Knoll

Apparently a right-wing hacker has also claimed credit (?) for this trainwreck. This has been dismissed and debunked by bloggers.

So many explanations, so little time.

  • If the “lone Frenchman” explanation is true, then there are clear deficiencies in how Amazon manages and controls its master data. It is an entirely plausible explanation: if a single master category of book was retagged as “Adult”, the change would likely cascade down to all “child” records (a relational sketch of this cascade follows the list below). God bless relational databases.
  • If the “floating algorithm” explanation is correct, then Amazon have coded an algorithm that corrupts the accuracy of their data. Bad code. Naughty code. Go to your room. We won’t get sucked into the ‘thought-police’ debate about algorithms designed to censor or restrict certain content. Suffice it to say, if an algorithm ran that tagged children’s books or medical textbooks as “Adult” content, then the algorithm was producing duff quality data, regardless of the intent (in fact, if an algorithm to censor/restrict certain content does exist but was kept ‘secret’, then the outcome of this boo-boo has been to raise awareness of it).
  • If the “hacker on the grassy knoll” claims are to be believed, Amazon was failing to secure its “Information Asset”, which, for a cloud-based business, is the main asset it has. Deficiencies in the information architecture may have allowed someone outside the business to do essentially what the lone Frenchman is claimed to have done, affecting over 57,000 titles wholesale.
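As promised above, here is a rough sketch of the “lone Frenchman” cascade using a toy relational schema (table and column names are invented): one UPDATE on a master category row reclassifies every child title at query time, without any individual book record being touched.

```python
# Hypothetical schema; only illustrates how a parent-level flag cascades.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, adult INTEGER);
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, category_id INTEGER);
    INSERT INTO categories VALUES (1, 'Gay and Lesbian', 0), (2, 'Health', 0);
    INSERT INTO books VALUES (1, 'Moab is my Washpot', 1),
                             (2, 'A Guide to Human Reproduction', 2);
""")

# The single erroneous edit: one flag on one master row flips from 'false' to 'true'.
db.execute("UPDATE categories SET adult = 1 WHERE name = 'Gay and Lesbian'")

# At query time, every child title under that category is now excluded.
hidden = db.execute("""
    SELECT b.title FROM books b
    JOIN categories c ON c.id = b.category_id
    WHERE c.adult = 1
""").fetchall()
print(hidden)  # [('Moab is my Washpot',)]
```

The efficiency of a shared master record is exactly what makes a single bad edit so far-reaching.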

Any way you slice this, it is ham-fisted. Particularly if you also factor in that this appears to be something that has been happening for some time.

This counts as an IQTrainwreck because Amazon’s reputation has taken a battering. Also, the impact on authors’ sales cannot be ignored. As the BBC’s Bill Thompson says:

Some careful analysis by “Jane” on the Dear Author site indicates that the problem lies in the metadata, the additional details about each book added by Amazon and publishers. Books classed as “gay”, “lesbian”, “transgender”, “erotic” or “sex” have been filtered while the Playboy book, whose content is classed as “nude”, not “sex”, remains.

He goes on…

When a book is misfiled in my local Borders it may result in a few lost sales, but for the whole of Amazon to ‘misplace’ my book may mean nobody in the world buys it. If the filtering had affected people writing about how to keep sheep as domestic pets instead of gay fiction we might not have noticed the error.
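A guess at the kind of metadata filter “Jane” describes: a blunt deny-list applied to subject tags (the list, the titles and the tags here are illustrative) hides anything tagged “sex” while leaving “nude” untouched, so the accuracy of the output depends entirely on the accuracy of the metadata and of the list itself.

```python
# Illustrative deny-list filter; not Amazon's actual metadata or keyword list.
DENY_LIST = {"gay", "lesbian", "transgender", "erotic", "sex"}

def visible(book):
    """A title stays visible only if none of its tags hit the deny-list."""
    return DENY_LIST.isdisjoint(book["tags"])

books = [
    {"title": "Moab is my Washpot", "tags": {"gay", "biography"}},
    {"title": "The Playboy book", "tags": {"nude"}},
    {"title": "A Medical Guide to Sexual Health", "tags": {"sex", "health"}},
]

print([b["title"] for b in books if visible(b)])
# ['The Playboy book'] -- the inconsistency described in the quote above
```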

So, it’s either the metadata, a “lone Frenchman”, an algorithm run awry, or a hacker. Whichever it was, the data driving Amazon’s content filtering was corrupted, resulting in poor quality information (incomplete, inaccurate or just plain missing) being presented to people searching for books, and in lost sales and lost revenue.

A response has emerged on Amazon where people are using the tag “AmazonFail” to highlight books (and other content) that have been classed as “Adult” in error. Currently there are 1,529 items tagged on Amazon.com.
