Artificial Intelligence, Powered by Many Humans

Crowdsourcing can create an artificial chat partner that’s smarter than Siri-style personal assistants.

Tom Simonite

September 10, 2012

Personal assistants such as Apple’s Siri may be useful, but they are still far from matching the smarts and conversational skills of a real person. Researchers at the University of Rochester have demonstrated a new, potentially better approach that creates a smart artificial chat partner from fleeting contributions from many crowdsourced workers.

Crowdsourcing typically involves posting simple tasks to a website such as Amazon Mechanical Turk, where Web users complete them for a reward of a few cents. The tasks are often simple, repetitive jobs that are easy for humans but tough for computers, such as categorizing images. Crowdsourcing has become a popular way for companies to handle such tasks, but some researchers, including the group at Rochester, believe it can also be used to take on more complex tasks.

When people talk to the new crowd-powered chat system, called Chorus, using an instant messaging window, they get an experience practically indistinguishable from chatting with a single real person. Yet behind the scenes, each response is the result of tens of people paid a few cents to perform small tasks, including suggesting possible replies and voting for the best suggestions submitted by other workers.

Tests where Chorus was asked for travel advice showed that it could be smarter than any one individual in the crowd, because around seven people were contributing to its responses at any one time. Helpers built this way might also be cheaper than paying a conventional one-on-one assistant. “It shows how a crowd-powered system that is relatively simple can do something that AI has struggled to do for decades,” says Jeffrey Bigham, an assistant professor at the University of Rochester, and a member of the research team that created Chorus. Bigham jokes that Chorus is more likely to pass a Turing Test, which challenges an artificial intelligence system to fool someone into thinking it’s human, than conventional chat software, although it may not meet most definitions of artificial intelligence.

In trials of the system, people asked Chorus for advice on restaurants to visit in Los Angeles and New York, and quickly received suggestions. Feedback such as “Hmm. That seems pricey,” was quickly taken on board by the crowd, which came up with an alternative. AI systems such as Siri typically have difficulty following this kind of back-and-forth conversation, particularly in colloquial language.

Bigham worked with Rochester colleagues Walter Lasecki and Rachel Wesley, and Anand Kulkarni, the cofounder of crowdsourcing company MobileWorks (see “Human Workers, Managed by an Algorithm”). Their goal was to find a new way to increase the power of crowdsourcing, which is typically limited to simple, isolated tasks, such as adding labels to image files. “What we’re really interested in is when a crowd as a collective can do better than even a high-quality individual,” says Bigham, by combining work on many simple tasks into a coherent, complex whole.

Chorus does that with three simple types of task. First, any new chat updates from the human user are passed along to many crowd workers, who are asked to suggest a reply. Second, crowd workers vote on those suggestions to determine the one that will be sent back.
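The suggest-then-vote step can be illustrated with a minimal Python sketch. This is not the Rochester team's code: the function name `pick_reply`, the tie-breaking rule, and the sample replies are all assumptions made for illustration, and the article does not say how Chorus actually tallies votes.

```python
from collections import Counter


def pick_reply(suggestions, votes):
    """Return the suggested reply that received the most votes.

    suggestions: candidate reply strings proposed by crowd workers.
    votes: indices into `suggestions`, one per voting worker.
    Assumption: ties go to the suggestion that was proposed first.
    """
    if not suggestions:
        raise ValueError("no suggestions to choose from")
    # Ignore votes that do not point at a real suggestion.
    tally = Counter(v for v in votes if 0 <= v < len(suggestions))
    # max() returns the first maximal index, so earlier proposals win ties.
    best = max(range(len(suggestions)), key=lambda i: tally[i])
    return suggestions[best]


# Made-up example: three workers propose replies, five workers vote.
suggestions = [
    "There is a taco stand a few blocks away.",
    "How about sushi?",
    "A diner might be cheaper.",
]
votes = [1, 0, 1, 2, 1]
chosen = pick_reply(suggestions, votes)
```

Here the second suggestion collects three of the five votes, so it is the reply that would be sent back to the user.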

A final mechanism creates a kind of working memory that ensures that Chorus’s replies reflect the history of the conversation so far, crucial if it is to carry out long conversations—something that is a challenge for apps like Siri and even AI chatbots intended to showcase conversational skills.

For the working memory component, crowd members are asked to maintain a short running list of the eight most important snippets of information under discussion, to be used as a reference when workers suggest replies. This is important because it allows for the natural turnover of crowdsourcing workers. “A single person may not be around for the duration of the conversation; they come and go, and some may contribute more than others,” says Bigham.
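A capped list of this kind can be sketched in a few lines of Python. The limit of eight comes from the description above; everything else is an assumption for illustration, including the `WorkingMemory` class, the numeric `importance` score (which might, for example, come from worker ratings), and the rule for dropping the least important snippet. The article does not describe how workers actually rank or replace snippets.

```python
MAX_SNIPPETS = 8  # the article describes a list of eight snippets


class WorkingMemory:
    """Hypothetical shared notes that any newly arrived worker can read."""

    def __init__(self, limit=MAX_SNIPPETS):
        self.limit = limit
        self.notes = []  # list of (importance, snippet) pairs

    def add(self, snippet, importance):
        """Record a snippet, then keep only the `limit` most important."""
        self.notes.append((importance, snippet))
        # Stable sort, highest importance first; older notes win ties.
        self.notes.sort(key=lambda note: note[0], reverse=True)
        del self.notes[self.limit:]

    def snapshot(self):
        """Return the current snippets, most important first."""
        return [snippet for _, snippet in self.notes]


# Made-up example: ten facts arrive, but only eight are retained.
memory = WorkingMemory()
for i in range(10):
    memory.add(f"fact {i}", importance=i)
current = memory.snapshot()
```

Because the list lives outside any one worker's head, a worker who joins mid-conversation can read the snapshot and suggest a reply that is consistent with what came before.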

Bigham says Chorus has the potential to be more than just a neat demonstration. “We definitely want to start embedding it into real systems,” he says. “Perhaps you could help someone with cognitive impairment by having a crowd as a personal assistant.”

Another possibility is to combine Chorus with another system previously developed at Rochester, which has crowd workers collaborate to steer a robot. “Could you create a robot this way that can drive around and interact intelligently with humans?” asks Bigham.

Michael Bernstein, an assistant professor at Stanford University who is currently doing research at Facebook, agrees that Chorus could lead to real-world applications (see “Adding Human Intelligence to Software”).

“You could go from today where I call AT&T and speak with an individual, to a future where many people with different skills work together to act as a single incredibly intelligent tech support,” says Bernstein. He says the Chorus software could become a true expert if it were able to direct incoming questions to members of the crowd with particular knowledge or skills.

However, Bernstein adds that it may be necessary to add more reviewing steps to Chorus in order to filter a crowd’s suggestions to prevent it developing a split personality when faced with difficult questions. This is a familiar problem in applying crowdsourcing. One of the Rochester team’s biggest challenges when building their crowd-controlled robot, for example, was to prevent it crashing into obstacles dead ahead because half the crowd workers steering it wanted it to go left, and the other half wanted it to go right.
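The split-crowd problem can be illustrated with a small Python sketch. It assumes, purely for illustration, that steering inputs are numbers (-1 for left, 0 for straight, +1 for right) and that the system combines them by averaging; the article does not say how the Rochester robot actually merged its workers' commands, and both function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def average_steer(commands):
    """Blend all commands into one value (assumed combining rule)."""
    return mean(commands)


def plurality_steer(commands):
    """Follow the single most common command instead of blending."""
    return Counter(commands).most_common(1)[0][0]


# Half the crowd steers left (-1), the other half steers right (+1).
split_crowd = [-1, -1, 1, 1]
blended = average_steer(split_crowd)   # the two halves cancel out
picked = plurality_steer(split_crowd)  # commits to one side
```

Under these assumptions, averaging turns an evenly split crowd into "straight ahead", which is the one direction nobody asked for, while picking a single winning command at least commits to one side of the obstacle.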
