Alibaba already has a voice assistant way better than Google’s

It navigates interruptions and other tricky features of human conversation to field millions of requests a day.
December 4, 2018
Andrew Burton/Getty; phone by: Aulia_icons, ID/Noun Project

In May, Google made quite the splash when it unveiled Duplex, its eerily humanlike voice assistant capable of making restaurant reservations and salon appointments. It seemed to mark a new milestone in speech generation and natural-language understanding, and it pulled back the curtain on what the future of human-AI interaction might look like.

But while Google slowly rolls out the feature in a limited public launch, Alibaba’s own voice assistant has already been clocking overtime. On December 2 at the 2018 Neural Information Processing Systems conference, one of the largest annual gatherings for AI research, Alibaba demoed the AI customer service agent for its logistics company Cainiao. Jin Rong, the dean of Alibaba’s Machine Intelligence and Technology Lab, said the agent is already servicing millions of customer requests a day.

The pre-recorded demo call involved the agent asking a customer where he wanted his package delivered. In the back-and-forth exchange, the agent successfully navigated several conversational elements that demonstrated the breadth of its natural-language capabilities.

Take this exchange at the beginning of the call, translated from Mandarin:

Agent: Hello, I am Cainiao’s voice assistant. I am—
Customer: Hello.
A: Yes, hi, you have a package scheduled for morning delivery to 588 Culture West Road. Is it convenient for you to receive it?
C: Who are you?
A: I am Cainiao’s voice assistant. I’d like to confirm your morning delivery to 588 Culture West Road. Does that work for you?
C: I’m not home in the morning.
A: Then do you have another address that you’d like to use?

Within 30 seconds, the agent has smoothly handled three common, and tricky, conversational ingredients: interruption, nonlinear conversation, and implicit intent. Interruption is self-explanatory: the agent can respond to the customer’s interruption and continue relaying relevant information without starting over or skipping a beat.

The nonlinear conversation occurs when the customer asks, “Who are you?” This requires the agent to register that the customer is not answering the preceding question but rather starting a new line of inquiry. In response, the agent reintroduces itself before returning to the original question.

The implicit intent occurs when the customer responds, “I’m not home in the morning.” He never explicitly says what he actually means—that home delivery won’t work—but the agent is able to read between the lines and follow up sensibly.

These elements may be boringly commonplace in human conversations, but machines often struggle to handle them. That Alibaba’s voice assistant can do so suggests it’s more sophisticated than Google Duplex, judging from similar sample calls demoed by Google. It’s worth noting, however, that Alibaba’s demo call is designed for onstage presentation; the experience could differ in reality.

Currently, the agent is used only to coordinate package deliveries, but Jin said it could be expanded to handle other topics. He wouldn’t fully reveal how the assistant was trained. But he alluded to using the massive number of customer recordings at the company’s disposal, in addition to other resources. On a typical day the company averages 50,000 customer service calls, according to the presentation slides—a number that quintuples for Singles’ Day (November 11), its highest revenue-generating holiday of the year.

Alibaba is also developing digital assistants for other aspects of its business, including a food-ordering agent that can take your order in noisy restaurants and stores; a humanlike virtual avatar that can field questions about Alibaba products; and a price-haggling chatbot that is already used by 20% of sellers on Alibaba’s resale platform Xianyu.

At its core, each of these assistants is powered by AliMe, a speech-recognition and natural-language-processing engine developed by the company’s Machine Intelligence and Technology Lab. The engine is then packaged and adapted to different parts of the business.

Alibaba’s biggest advantage in this field is the overwhelming wealth of data it has to train its AI. The assistants learn and improve faster because of the amount of practice they get in handling all kinds of situations. A huge business incentive to deploy these technologies quickly also helps. In addition to handling a high volume of customer support calls, Alibaba delivers one billion packages per day. Offloading certain tasks to AI helps alleviate the burden on humans and keep the business running smoothly.

