Chatbots with Social Skills Will Convince You to Buy Something

The descendants of Alexa and Siri might come with a surprisingly good sales pitch.

I met an early version of such a persuasive chatbot at a tech conference in Pittsburgh recently. After some small talk and jokes, the bot, called Sara, recommended some other people for me to meet. The suggestions were in fact excellent, and if I hadn’t already met those people, I would’ve followed her lead.

Sara was developed by Justine Cassell, director of human-computer interaction at Carnegie Mellon University, who is studying ways for virtual agents to use subtle cues in conversation to build rapport with people and become more effective at conveying information or persuading them to do something.

Sara, a chatbot, uses several microphones and video cameras to track a person’s nonverbal communications.

The work hints at the potential for more useful chatbots. Most bots remain horribly clumsy and easily confused, and it will take time to achieve deeper language understanding. But conversational cues could help make these tools less annoying and more effective. Some big companies are already looking toward the approach as a way to make their virtual helpers better.

Speaking with Sara certainly felt less jarring than talking to a regular chatbot. The system analyzes the words a person says during a conversation as well as the tone of his or her voice, and it uses several cameras to track the speaker’s facial expressions and head movements. These cues are fed into a program that determines an appropriate response designed to build a feeling of rapport with the person. At one point during our conversation, for instance, Sara saw me smile and nod, and instantly made a self-deprecating comment.

“There are a lot of chatbots out there that use social talk, but they use it randomly, and that’s not useful,” Cassell says. “In real humans, social talk is there to serve a purpose.”

For instance, Cassell says, people often use small talk in the build-up to an awkward question, which helps soften the blow. If a computer uses small talk in the wrong context, it will not only be less effective but downright jarring. She and her students have studied many different kinds of human interactions to understand the components that might be captured and encoded into machines, most recently annotating the behavior of high school students teaching each other algebra.

Although Cassell has been studying ways for machines to mimic conversational rapport for more than a decade, she says corporate researchers have only recently shown an interest in her work. In fact, one big company, which she declined to name, recently offered to buy the patents behind Sara, she says.

The techniques Cassell and others are honing could prove especially important as virtual helpers take on more responsibility for guiding the way we find information and shaping our purchasing habits. Facebook, Microsoft, Google, and others see chatbots as a promising new interface for reaching customers (see “Here Come the Marketing Chatbots”).

Sara was on display at a conference, sponsored by the World Economic Forum, held in Tianjin, China, recently.

Some of the more polished virtual helpers out there, like Siri, already make use of some subtle social cues (see “Social Intelligence”). Earlier this year, Apple bought a company called Emotient, which is developing technology for tracking people’s emotions. That may signal an intention to improve Siri’s emotional intelligence.

Meanwhile, sources familiar with Amazon’s research say the company is investigating ways of making Alexa, the virtual assistant in its Echo voice-controlled device, more attuned to users’ emotional states as expressed through their tone and manner of speech (see “Amazon Working on Making Alexa Recognize Your Emotions”). Obviously, Amazon’s device is also designed with one eye on enabling online purchases, so it isn’t hard to imagine Alexa developing an artful sales patter.

There are challenges to making this approach work in practice. Timothy Bickmore, a professor at Northeastern University, says one of the biggest is capturing all the different cues that may be relevant to an interaction, including facial expressions and body language. This may be especially tricky on mobile devices, although in theory a smartphone could capture the information used to enable something like Sara. And he says the approach isn’t much help if an interface is just meant to execute a command as quickly and efficiently as possible. “Those cues are most useful in a natural conversation,” he says. “Sometimes the social stuff just gets in the way.”

Sara, at least, seemed pretty useful as a conference helper. Earlier this year, Cassell and her students took Sara to an event hosted by the World Economic Forum in Tianjin, China, where it helped attendees meet people. And she says an as-yet-unpublished study shows that Sara is more effective at getting someone to click on a link when the agent applies its conversational strategies. That kind of human-to-machine relationship-building may be a sign of things to come.

“My goal is to build a device that can spend a lifetime with you, and over time its behavior will change,” Cassell says. “If, over five years, it is speaking to you in the same way that it did on the day you bought it, then you’re going to feel like you have a device with amnesia—or even worse, one that doesn’t care about you.”
