Say Hello, or 你好, to China’s Siri

YuDian, dubbed the Chinese Siri, is available to anyone with a mobile phone in China.
November 16, 2012

You might not have heard of iFlyTek. The company is hardly a household name in its domestic market of China, either. But it has a vice-like grip on over 80 percent of the speech technology market in the People’s Republic, heading an ecosystem of over 10,000 partners and developers and with user numbers in the hundreds of millions.

The company was founded in 1999 by Liu Qingfeng and five other students from the University of Science and Technology of China, widely recognized as one of the nation’s preëminent research institutions. They took advantage of research conducted at the university’s National Intelligent Computer R&D Center and the Human-Machine Speech Communication Laboratory.

iFlyTek has developed a series of text-to-speech (TTS) products and, more recently, speech recognition technology and a cloud-based speech platform called Voice Cloud. Over the years, accuracy rates have steadily improved and the technology has found its way into an ever-wider range of products.

The Voice Cloud platform supports speech recognition, speech synthesis, and speech-input applications. As of March of this year, it also offers speech understanding and “iFlyTek YuDian,” dubbed the Chinese Siri, to anyone with an Internet-enabled mobile phone.

IHS iSuppli analyst Ian Fogg says the voice technology market is still fairly immature, but providers including Google and Nuance have taken big strides in recent years by using the power of the cloud to help process speech-recognition queries. “The old model was a voice-recognition application like Dragon NaturallySpeaking, but the drawback was, you had to train it to your voice,” he says.

Liu admits that iFlyTek has “the same core technology” as Nuance and Google, but he claims that its open platform and native Chinese expertise make it superior to U.S.-made products.

The Voice Cloud platform has grown more accurate as its user numbers have risen from one million in the first half of 2011 to a whopping 100 million today, he says.

“We are in a more advantageous position in comparison with other competitors in terms of the Chinese language and the China market because we have a larger amount of speech data, which is updated daily by 100 million users of our speech cloud,” Liu says.

Chinese is particularly difficult to get right in speech recognition and synthesis because of the tonal nature of the language. That is, the same syllable can mean different things depending on the tone in which it is spoken. The syllable gau in Cantonese, for example, has a variety of meanings, including “nine” and “dog.”

iFlyTek, therefore, created a “two-stream model system” that separates the tonal information from the spectrum to improve accuracy, according to research manager Hu Yu. The firm also claims to have innovative systems that can understand speech even in noisy conditions or on distorted channels.
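The article does not say how iFlyTek's two-stream system works internally. As a rough illustration of the general idea only, the Python sketch below scores tone and spectrum separately and then combines the two log-likelihoods with a weight, in the manner of generic multi-stream acoustic models. The `Candidate` class, the `tone_weight` parameter, the Mandarin example syllables, and all the scores are hypothetical and are not drawn from iFlyTek's system.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One recognition hypothesis with a separate score per stream (made-up values)."""
    text: str
    spectral_loglik: float  # log-likelihood from the spectral stream
    tone_loglik: float      # log-likelihood from the tonal (pitch) stream


def combined_score(c: Candidate, tone_weight: float = 0.3) -> float:
    """Weighted sum of the two stream log-likelihoods (assumed fusion rule)."""
    return (1.0 - tone_weight) * c.spectral_loglik + tone_weight * c.tone_loglik


def best_candidate(cands: list[Candidate], tone_weight: float = 0.3) -> Candidate:
    """Pick the hypothesis with the highest combined score."""
    return max(cands, key=lambda c: combined_score(c, tone_weight))


# Two Mandarin syllables that sound alike apart from tone.
candidates = [
    Candidate("ma1 (mother)", spectral_loglik=-10.0, tone_loglik=-8.0),
    Candidate("ma3 (horse)", spectral_loglik=-10.2, tone_loglik=-2.0),
]

if __name__ == "__main__":
    # With the tone stream ignored, the spectrum alone slightly prefers ma1.
    print(best_candidate(candidates, tone_weight=0.0).text)
    # With the tone stream weighted in, the clearer tone evidence favors ma3.
    print(best_candidate(candidates, tone_weight=0.3).text)
```

In this toy setup the spectral scores are nearly tied, so the tonal stream decides the outcome, which is the kind of case where keeping tone as a separate source of evidence could plausibly help.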

Liu argues that one of iFlyTek’s key assets is its willingness to work with research bodies outside the company, including USTC, Tsinghua University, and the Institute of Linguistics at the Chinese Academy of Social Sciences.

“Speech technology is a typical interdisciplinary subject, which involves many different subjects, such as computer science, acoustics, phonetics, linguistics, and so on,” he says. “With the support of Ministry of Science and Technology of China, we built the National Engineering Lab for Speech and Language Information Processing, which is the only national-level laboratory in the speech and language field.”

Finally, its extensive network of 10,000 partners and developers has ensured the tech is available via a wide variety of devices, channels, and applications. The world’s largest network operator, China Mobile, recently took a 15 percent stake in the firm, further expanding its reach.

These partnerships have also helped the Siri-like YuDian to better serve its user base by providing it with “rich information resources,” including entertainment, travel, and ticketing information, he says.

The company’s technology could soon be taking commands in markets outside of China. For those skeptical about the English language ability of a home-grown Chinese company, Liu is keen to reel off an impressive list of awards for iFlyTek technology.

It won the international English TTS competition, the Blizzard Challenge, for seven consecutive years from 2006 to 2012. It also took first place in the U.S. NIST speaker recognition evaluation in 2008 and 2010 and in the NIST language recognition evaluation in 2009 and 2010. Liu says the firm has also “successfully completed” R&D on several other languages, including Japanese, Korean, French, Spanish, and Russian.

With China now representing the world’s largest smartphone market, there’s clearly huge potential for growth.

However, Daniel Hong, lead analyst at Ovum, says iFlyTek “seems to be focused on its fairly large captive market, where there is still a lot of room to grow.” IHS analyst Fogg also believes that the sheer size of the China market could distract the firm in the short term.

Mark Natkin, founder of Beijing-based tech consultancy Marbridge Consulting, was more upbeat. “I think if it can take advantage of the unique characteristics of the Chinese market and gain traction, then it already has a certain momentum, which can often parlay into expansion overseas,” he says.

ABI Research senior analyst Michael Morgan observes that with a focus on China, iFlyTek has so far been able to avoid intellectual property-related legal conflict with the likes of Nuance. “Over the long term, iFlyTek could take advantage of its local language expertise to establish relationships with local handset OEMs and establish a business that is large enough to go global,” Morgan says.

With Huawei, ZTE, and Lenovo already established as partners, that global push could come sooner than expected as these local handset giants look to grow their businesses abroad.

Gartner believes that by 2020, voice will be the interaction channel between humans and computers in 50 percent of all Web and mobile customer service interactions.
