

AI Has Beaten Humans at Lip-reading

A pair of new studies show that a machine can understand what you’re saying without hearing a sound.

Lip-reading is notoriously difficult, depending as much on context and knowledge of language as on visual cues. But researchers are showing that machine learning can discern speech from silent video clips more accurately than human lip-readers can.

In one project, a team from the University of Oxford’s Department of Computer Science has developed a new artificial-intelligence system called LipNet. As Quartz reported, the system was trained on a data set known as GRID, which is made up of well-lit, face-forward clips of people reading three-second sentences. Every sentence follows the same fixed pattern of words.

The team used that data set to train a neural network, similar to the kind often used for speech recognition. In this case, though, the network tracks how the shape of the mouth changes over time and learns to link that information to a transcription of what’s being said. Rather than analyzing the footage in isolated snippets, the AI considers the whole clip, which lets it draw on the context of the entire sentence. That matters because there are fewer distinct mouth shapes than there are sounds in human speech, so many different sounds look identical on video.
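Why whole-sentence context helps can be illustrated with a toy example. This is a hypothetical, heavily simplified sketch: the phoneme-to-mouth-shape table, the word list, and the context counts are illustrative assumptions, not data from either study, and a lookup like this is not how LipNet works (it learns directly from video with a neural network). The sketch groups words that produce the same sequence of mouth shapes, then uses the preceding word to choose among them.

```python
# Hypothetical, simplified phoneme-to-mouth-shape ("viseme") table, an
# assumption for illustration. /p/, /b/ and /m/ are all made with closed
# lips, so they look the same on silent video; /f/ and /v/ share a shape.
PHONEME_TO_VISEME = {
    "p": "BILABIAL", "b": "BILABIAL", "m": "BILABIAL",
    "f": "LABIODENTAL", "v": "LABIODENTAL",
    "t": "ALVEOLAR", "d": "ALVEOLAR", "n": "ALVEOLAR",
    "ae": "OPEN",
}

# Hypothetical word list with simplified phoneme spellings.
WORDS = {
    "pat": ["p", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "mat": ["m", "ae", "t"],
    "fan": ["f", "ae", "n"],
    "van": ["v", "ae", "n"],
}

# Hypothetical counts of how often one word follows another: a crude
# stand-in for the sentence-level context a trained model would learn.
CONTEXT_COUNTS = {
    ("door", "mat"): 5,
    ("baseball", "bat"): 7,
    ("delivery", "van"): 4,
}


def visemes(phonemes):
    """Return the sequence of mouth shapes a word produces on video."""
    return tuple(PHONEME_TO_VISEME[p] for p in phonemes)


def group_by_appearance(words):
    """Group words that look identical to a lip-reader."""
    groups = {}
    for word, phonemes in words.items():
        groups.setdefault(visemes(phonemes), []).append(word)
    return groups


def pick(previous_word, candidates):
    """Choose the candidate that most often follows previous_word.

    Ties (including unseen context) fall back to the first candidate.
    """
    return max(candidates, key=lambda w: CONTEXT_COUNTS.get((previous_word, w), 0))


if __name__ == "__main__":
    for lip_shapes, candidates in group_by_appearance(WORDS).items():
        print(lip_shapes, "->", candidates)
    look_alikes = ["pat", "bat", "mat"]
    print(pick("door", look_alikes))      # mat
    print(pick("baseball", look_alikes))  # bat
```

Running it prints two groups of look-alike words, pat/bat/mat and fan/van. The mouth shapes alone cannot separate the words within a group; adding even one word of context lets this crude score pick a plausible answer, which is the intuition behind letting a model see the whole sentence.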

When tested, the system identified 93.4 percent of words correctly. Human lip-reading volunteers asked to perform the same task identified just 52.3 percent of words correctly.

But as New Scientist reports, another team from Oxford’s Department of Engineering Science, which has been working with Google DeepMind, has taken on a harder task. Instead of a neat and consistent data set like GRID, it used 100,000 video clips taken from BBC television. These videos contain a much broader range of language, with far more variation in lighting and head position.

Using a similar approach, the Oxford and DeepMind team created an AI that identified 46.8 percent of words correctly. That is also far better than humans, who identified just 12.4 percent of words correctly. The lower accuracy likely has several causes, from lighting and head orientation to the greater complexity of the language.

Differences aside, both experiments show AI vastly outperforming humans at lip-reading, and it’s not hard to imagine potential applications for such software. In the future, Skype could fill in the gaps when a caller is in a noisy environment, say, or people with hearing difficulties could hold their smartphone up to “hear” what someone is saying.

(Read more: Quartz, New Scientist, Oxford Machine Learning Reading Group, arXiv, “The Challenges and Threats of Automated Lip Reading”)
