AI Defeats the Hivemind

How a machine learning algorithm beat the assembled masses of Mechanical Turk.
December 20, 2010

Amazon’s Mechanical Turk is the ultimate in nearly anonymous outsourcing: any task that can be completed online can be accomplished by the combination of automated marketplace and human labor. Those who sign up to complete tasks, known as Turkers, are paid wages as low as pennies per chore to do everything from data entry to folk art.

(cc) Jim Linwood

Mechanical Turk is designed to complete tasks that are easy for humans and hard for machines, such as categorizing or identifying the content of images. The problem for Amazon and all its imitators, however, is that machines are getting better at many tasks, while the humans on Mechanical Turk, for reasons I’ll explore in tomorrow’s post, are getting worse.

Recently, for example, researchers working at the online review site Yelp released a paper (pdf) on their experience matching thousands of Mechanical Turkers against a supervised learning algorithm.

The results weren’t pretty. To find a population of Turkers whose work was passable, the researchers first used Mechanical Turk to administer a test to 4,660 applicants: a multiple-choice test to determine whether a Turker could identify the correct category for a business (Restaurant, Shopping, etc.) and verify, via its official website or by phone, its correct phone number and address.

Only 79 passed, or fewer than 2 percent. This was an extremely basic multiple-choice test. It makes one wonder how the other 4,581 were smart enough to operate a web browser in the first place.

These 79 “high quality” workers were then thrown at the problem of verifying business information three at a time. This allowed the researchers to take only the results that a simple majority of Turkers agreed were correct, or in some cases to take the result chosen by the Turker who had historically been the most accurate.
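For readers who want to see the idea in code, here is a minimal sketch of that kind of vote aggregation. It is not the researchers' implementation: the function name, the worker IDs and accuracy figures in the usage note, and the choice to fall back on the historically best worker only when no majority exists are all assumptions made for illustration.

```python
from collections import Counter


def aggregate_labels(votes, accuracy=None):
    """Combine one label per worker into a single answer.

    votes:    dict mapping a (hypothetical) worker ID to the label chosen.
    accuracy: optional dict mapping worker ID to historical accuracy (0 to 1).

    Returns the label picked by a strict majority of workers. If there is
    no majority and accuracy scores are given, returns the label chosen by
    the most accurate worker (an assumed tie-break rule). Otherwise None.
    """
    counts = Counter(votes.values())
    label, top = counts.most_common(1)[0]
    if top > len(votes) / 2:
        return label
    if accuracy:
        best_worker = max(votes, key=lambda w: accuracy.get(w, 0.0))
        return votes[best_worker]
    return None
```

With three workers, `aggregate_labels({"w1": "Restaurant", "w2": "Restaurant", "w3": "Shopping"})` returns `"Restaurant"`. If all three disagree, the answer comes from whichever worker has the highest score in `accuracy`, or `None` if no scores were supplied.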

The researchers then threw a “Naive Bayes classifier” at the same set of problems. This is a kind of supervised learning algorithm, and one that, according to a 2006 comparison of these systems, isn’t even the best kind out there.
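To give a feel for how simple such a classifier can be, here is a toy multinomial Naive Bayes text classifier in plain Python. It is only a sketch: the class name, the add-one smoothing, the whitespace tokenizer, and the four made-up training snippets are assumptions for illustration, not details taken from the Yelp paper.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Toy multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing strength
        self.class_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            words = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        words = [w for w in text.lower().split() if w in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log P(label) + sum of log P(word | label)
            score = math.log(n_docs / total_docs)
            denom = (sum(self.word_counts[label].values())
                     + self.alpha * len(self.vocab))
            for w in words:
                score += math.log((self.word_counts[label][w] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented review snippets standing in for real training data.
docs = [
    "great pasta and wine",
    "oil change and brake repair",
    "tasty burgers and fries",
    "new tires and engine repair",
]
labels = ["Restaurant", "Automotive", "Restaurant", "Automotive"]
model = NaiveBayes().fit(docs, labels)
```

Calling `model.predict("brake repair was quick")` picks the Automotive label, because "brake" and "repair" appear only in the automotive snippets. The "naive" part is the assumption that each word counts as independent evidence, which is why the per-word log probabilities are simply added up.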

The Bayes classifier won handily.

In almost every case, the algorithm, which was trained on a pool of 12 million user-submitted Yelp reviews, correctly identified the category of a business a third more often than the humans. In the automotive category, the computer was twice as likely as the assembled masses to correctly identify a business.

These results don’t necessarily suggest that business categorization is a problem like chess, where the human computer has finally been exceeded by its mechanical counterpart. Rather, they suggest that something about Mechanical Turk itself is broken: either the incentive system or its mechanisms for policing quality. It’s long been known that the wages on Mechanical Turk are quite low. Workers make, on average, between two and three dollars an hour for their labors, and it’s likely that this is part of the problem. Economists have only just begun to address the question; more on that tomorrow.

Follow Mims on Twitter or contact him via email.
