The Chinese Internet giant Baidu launched a conversational personal assistant service called Duer at a company event held in Beijing on Tuesday. It is the latest sign that we could soon forgo swiping and typing in favor of chatting with our computers.
The assistant service is designed to provide quick and easy access to Baidu’s various Internet services and to engage in a dialogue with users rather than simply being voice-controlled. Duer (which means “Du secretary”) is bundled with the latest versions of Baidu’s apps for smartphones.
Duer’s success will depend on how well it can parse naturally spoken language. This is notoriously difficult, although researchers have made significant progress in recent years in both speech recognition and, to a lesser degree, natural language processing, thanks to a powerful machine-learning technique known as deep learning. Companies such as Facebook see natural language as a key challenge for mining information and communicating with users (see “Teaching Machines to Understand Us”).
According to Baidu, Duer will mine meaning from written information on the Web. Baidu will collect information about a restaurant, for example, and Duer will infer whether it is pet-friendly or has outdoor seating. In contrast, most voice apps simply tap into conventional search engines, which do not try to extract meaning from information online.
Andrew Ng, chief scientist for Baidu Research in Silicon Valley and an expert in deep learning, has said that recent advances will soon make voice control far more capable, and that this will usher in a new age of computer interaction.
Other companies are also pushing aggressively into voice-mediated computing. With more users expected to turn to voice interaction, many tech companies hope to provide capable voice services in order to gain a competitive advantage, or at least to not fall behind their rivals.
The U.S. companies Apple, Google, and Microsoft all include voice-controlled assistants in their smartphone operating systems. And in November of last year, the U.S. e-commerce giant Amazon launched a device for the home called Echo that includes a voice persona called Alexa. At launch, the Echo could be used to look up information from the Web, play podcasts or music from a user’s Amazon library, and add items to a shopping list.
Amazon released an application programming interface for the Echo earlier this year, allowing developers to connect the device to outside apps or services, thus giving it new skills. It also announced $100 million in funding for startups working on voice services to connect them with the Echo.
Matt Lease, an associate professor at the University of Texas, Austin, who specializes in parsing language using computers, says voice interfaces are advancing thanks to fundamental progress in areas such as deep learning combined with the ubiquity of portable devices, which have made people more familiar with voice control. “I don’t think there’s a huge, fundamental breakthrough,” Lease says. “But I’m more comfortable talking to my phone and I’m more comfortable talking to this thing in my living room.”