Adding Human Intelligence to Software

TurKit lets programmers combine code with input from an army of online human workers.

John Pavlus

October 18, 2010

Amazon’s Mechanical Turk service has long provided a cheap source of labor for jobs that are simple for humans but difficult for computers. Tasks such as describing a picture, for example, can be completed online by remote human workers. Programmers already use groups of these workers, called turkers, to do many such tasks at the same time. But Mechanical Turk offers no easy way for programmers developing new software applications to combine and coordinate the turkers’ efforts. Now computer scientists at MIT have developed a toolkit that does just that. Called TurKit, the tool lets software engineers use the JavaScript programming language to write algorithms that coordinate online workers, and to create powerful applications that have human intelligence built in. The software can also be debugged like normal code.

**Software with brains:** The word-processing add-on Soylent was built with TurKit. TurKit helps developers write algorithms that integrate the work of humans recruited through Mechanical Turk.

“Usually in Javascript, you wouldn’t be able to access Mechanical Turk without a lot of work,” explains Greg Little, a PhD candidate at MIT’s Computer Science and Artificial Intelligence Laboratory, who created TurKit. “This is a bridge for writing code that interacts with the workers on Mechanical Turk, so we can easily explore new methods of human computation.”

With TurKit, human input is stored in a database. That way, anytime the software under development crashes, the turkers don’t have to start over from scratch. Instead, once the program has been fixed, it can pick right up where it left off. “If you wait an hour for the humans to finish their task, and then the program throws an error, you don’t want to wait another hour just to see if your bug fix works,” says Little. TurKit also prevents the human input from changing unpredictably during the debugging process. “If I got different behavior every time I ran (a program), I could never debug that moving target,” says Michael Bernstein, a PhD candidate at MIT, who used TurKit to create a word-processing application called Soylent.
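The idea of recording human input so that a fixed program can resume can be illustrated with a minimal sketch. It is not TurKit's actual API: the `once` helper, the `askHuman` stand-in, and the in-memory `database` are all hypothetical names chosen for illustration, and a real system would persist the store to disk.

```javascript
// Hypothetical sketch of record-and-replay for human input; not TurKit's actual API.

// Stand-in for the database. A real system would persist this to disk
// so that it survives the process exiting.
const database = new Map();

// Run `task` only if no result is recorded under `key`; otherwise replay it.
function once(key, task) {
  if (!database.has(key)) {
    database.set(key, task());
  }
  return database.get(key);
}

// Stand-in for a slow, paid human task; counts how often a human is really asked.
let humanCalls = 0;
function askHuman(prompt) {
  humanCalls += 1;
  return `human answer to: ${prompt}`;
}

// The program under development. `buggy` simulates an error after the human step.
function program(buggy) {
  const description = once("describe-image-1", () => askHuman("Describe image 1"));
  if (buggy) {
    throw new Error("bug after the human task finished");
  }
  return description.toUpperCase();
}

// First run: the human is asked, then the program crashes.
let firstRunCrashed = false;
try {
  program(true);
} catch (err) {
  firstRunCrashed = true;
}

// Second run, after the fix: the recorded answer is replayed, not requested again.
const result = program(false);
console.log(result, "| human calls:", humanCalls);
```

Because the second run reads the recorded answer instead of asking again, the human is consulted only once, and the program sees the same input on every rerun, which is what keeps debugging from becoming a moving target.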

Thanks to TurKit, researchers have already created human computation algorithms stable enough to incorporate into functioning software. Soylent uses groups of three to seven turkers to do on-demand proofreading and paragraph shortening in Microsoft Word, with an algorithm called Find-Fix-Verify. In the Find stage, turkers simply highlight errors without correcting them. Soylent compares the results from several workers for consistency, then sends the filtered output to another group of turkers who correct the errors. Finally, a third group checks the corrections for quality; substandard results are flagged and Soylent displays only the vetted corrections. “If you just set turkers loose on your paragraph, around 30 percent of the work you get back is unusable,” says Bernstein. “We wanted to treat that as inherent noise in the system while guaranteeing quality to the end user.”
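The three stages can be sketched roughly as follows. This is an illustration with simulated worker responses, not Soylent's code: the function names, the two-worker agreement threshold in the Find stage, and the majority vote in the Verify stage are all assumptions made for the example.

```javascript
// Hypothetical sketch of a Find-Fix-Verify pipeline with simulated workers.

// Find: keep only spans flagged by at least `minAgree` independent workers.
function findStage(flagsPerWorker, minAgree) {
  const counts = new Map();
  for (const flags of flagsPerWorker) {
    // A Set ensures one worker cannot vote twice for the same span.
    for (const span of new Set(flags)) {
      counts.set(span, (counts.get(span) || 0) + 1);
    }
  }
  return [...counts].filter(([, n]) => n >= minAgree).map(([span]) => span);
}

// Fix: gather the distinct corrections a second group proposes for one span.
function fixStage(span, proposalsPerWorker) {
  const proposals = proposalsPerWorker
    .map((proposal) => proposal[span])
    .filter((text) => text !== undefined);
  return [...new Set(proposals)];
}

// Verify: a third group votes; keep corrections approved by a strict majority.
function verifyStage(candidates, votesPerWorker) {
  return candidates.filter((candidate) => {
    const approvals = votesPerWorker.filter((votes) => votes.includes(candidate)).length;
    return approvals * 2 > votesPerWorker.length;
  });
}

// Simulated responses from three workers per stage (made-up data).
const flags = [["teh", "recieve"], ["teh"], ["teh", "quick"]];
const agreed = findStage(flags, 2);

const proposals = [{ teh: "the" }, { teh: "the" }, { teh: "tea" }];
const candidates = fixStage(agreed[0], proposals);

const votes = [["the"], ["the"], ["the", "tea"]];
const vetted = verifyStage(candidates, votes);

console.log({ agreed, candidates, vetted });
```

Splitting the work this way means no single turker's output reaches the user unchecked: a span flagged by only one worker is dropped in the Find stage, and a correction that most reviewers reject is dropped in the Verify stage.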

Another Mechanical Turk application, called VizWiz, is being developed to allow blind users to identify objects, such as street signs or pantry items, with the help of their smartphone cameras and sighted turkers. Ideally, VizWiz will work fast, so that users get results when they need them most. University of Rochester computer scientist Jeffrey Bigham and his team used TurKit to create an algorithm, called quikTurkit, that reduces lag time by queueing up groups of turkers before they are needed. When a user activates VizWiz’s camera, quikTurkit signals turkers that a new query is imminent, either recruiting new workers on demand or sending the request to a pool of eight turkers already engaged in answering previous queries. The former method returns results to the user within a couple of minutes; the latter averages less than 30 seconds. “If you’re running an expensive optical character recognition app on your phone, it might take that long to give you an answer anyway,” says Bigham, “whereas VizWiz is smarter and could be cheaper.”
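The difference between the two routes can be illustrated with a small sketch. It is not quikTurkit's code: the `WorkerPool` class and its methods are hypothetical, and the two timing constants are placeholders chosen only to show that a waiting pool skips the recruitment delay, not measured values.

```javascript
// Hypothetical sketch of routing queries to a pre-recruited pool of workers.

// Illustrative placeholders, not measured values.
const RECRUIT_SECONDS = 90; // assumed delay to recruit a worker on demand
const ANSWER_SECONDS = 20; // assumed time for a ready worker to answer

class WorkerPool {
  constructor() {
    this.waiting = [];
  }

  // Recruit ahead of time until `size` workers are waiting.
  warmUp(size) {
    while (this.waiting.length < size) {
      this.waiting.push(`worker-${this.waiting.length + 1}`);
    }
  }

  // Route a query: use waiting workers if any, otherwise recruit on demand.
  ask(query) {
    const warm = this.waiting.length > 0;
    return {
      query,
      route: warm ? "warm-pool" : "recruit-on-demand",
      estimatedSeconds: (warm ? 0 : RECRUIT_SECONDS) + ANSWER_SECONDS,
    };
  }
}

const pool = new WorkerPool();

// No one is waiting yet, so this query pays the recruitment delay.
const cold = pool.ask("What does this sign say?");

// Queue up eight workers before the next query arrives.
pool.warmUp(8);
const warm = pool.ask("What is in this can?");

console.log(cold, warm);
```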

Both Bigham and Bernstein say they see human computation as a rich field for future applications, with open-source tools like TurKit as the best means of prototyping and refining them. “Human algorithms are fundamentally different from the ones we’re used to, and TurKit lets us explore ways of optimizing them,” says Bernstein. “If we start wiring human crowds successfully into these systems, we can produce an end product which is way more powerful, and do so at low cost with high reliability.”
