    For Disposable Voice Recognition, Take Cheap Chips and Add Simple AI

    Okay, Google: throw yourself in the trash.

    Pete Warden wants you to throw your voice-recognition hardware in the trash. And then buy more—and more, and more. This Google engineer is on a quest to make voice recognition dirt cheap.

    His idea is simple enough: cut down the neural networks that are usually used to process sound until they’re efficient enough to run on cheap, lightweight chips. “What I want is a 50-cent chip that can do simple voice recognition and run for a year on a coin battery,” he explained during last week’s Arm Research Summit in Cambridge, U.K. “We’re not there yet … but I really think this is doable with even the current technology that we have now.”
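The coin-battery target implies a hard power budget, which a bit of arithmetic makes concrete (a rough sketch; the battery capacity and the assumption of constant draw are illustrative, not figures from Warden):

```python
# Rough power budget for a year of always-on listening on a coin cell.
# Assumed figure: a typical CR2032 coin cell stores about 225 mAh at ~3 V.
CAPACITY_MAH = 225
VOLTAGE = 3.0
SECONDS_PER_YEAR = 365 * 24 * 3600

energy_joules = (CAPACITY_MAH / 1000) * VOLTAGE * 3600  # mAh -> joules
avg_power_watts = energy_joules / SECONDS_PER_YEAR

print(f"Total energy:          {energy_joules:.0f} J")
print(f"Average power budget:  {avg_power_watts * 1e6:.0f} microwatts")
```

The answer comes out to roughly 77 microwatts of average draw, which is why the chip has to be far simpler than anything in a smartphone.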

    At such a low price, the hardware would effectively become disposable, opening up uses that have previously been unimaginable. The devices could be used to build cheap dolls that respond to your kids, for instance, or simple home electronics like lamps that are voice-activated. But Warden also says they could find a use in industrial settings, listening for noises rather than voices—hundreds of sensors spotting tell-tale audio signatures of squeaking wheels in factory equipment, or chirping crickets in a farm field.

    Warden, who leads the team at Google that develops mobile and embedded applications for the firm’s machine-learning framework, TensorFlow, realizes that he’s set himself a challenge. Squeezing the AI that powers Amazon’s assistant, Alexa, down to run on simple battery-powered chips with clock speeds of just hundreds of megahertz isn’t feasible. That’s partly because Alexa has to interpret a lot of different sounds, but also because most voice-recognition AIs use resource-hungry neural networks, which is why Alexa sends its processing to the cloud.

    So he’s constrained the problem, seeking to identify just a handful of useful commands, such as “on,” “off,” “start,” and “stop.” He’s also traded regular speech-recognition algorithms for a different approach: he takes an audio clip, slices it into short snippets, and calculates the frequency content of each one. Lining up the frequency plots one after another creates a 2-D image of frequency content versus time, to which he applies visual-recognition algorithms to identify the distinctive signature of someone saying a single word.
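The slice-and-stack pipeline described above can be sketched in a few lines of NumPy (a simplified illustration, not Warden’s actual code; the 30 ms frame and 10 ms hop at 16 kHz are common speech-processing defaults, not values from the article):

```python
import numpy as np

def spectrogram(audio, frame_len=480, hop=160):
    """Slice audio into short overlapping snippets, take the frequency
    content of each, and stack the results into a 2-D time-vs-frequency
    'image' that a visual-recognition model can then classify."""
    frames = [audio[i:i + frame_len]
              for i in range(0, len(audio) - frame_len + 1, hop)]
    # Magnitude of the FFT of each windowed snippet = its frequency content.
    window = np.hanning(frame_len)
    return np.array([np.abs(np.fft.rfft(f * window)) for f in frames])

# One second of 16 kHz audio (random noise stands in for real speech here).
clip = np.random.randn(16000)
image = spectrogram(clip)
print(image.shape)  # (98, 241): rows are time steps, columns frequency bins
```

The resulting array is exactly the kind of small 2-D input that image-classification networks are good at, which is the trick that lets a visual model do a listening job.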


    The team’s first attempts required eight million calculations to analyze a one-second clip of audio with 89 percent accuracy. That could run on a modern smartphone and be fast enough to be interactive—which is better than having to send the processing to the cloud—but it wouldn’t perform well on a low-power chip. After the team borrowed algorithmic tricks that help Android phones recognize the phrase “OK, Google,” the system was able to analyze a second of speech with 85 percent accuracy by performing just 750,000 calculations.
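To see what the tenfold reduction buys, compare the two operation counts against a modest chip’s throughput (a back-of-envelope sketch; the 100 MHz clock and the one-calculation-per-cycle assumption are illustrative, not figures from the article):

```python
CLOCK_HZ = 100e6          # lower end of a "hundreds of megahertz" chip
OPS_ORIGINAL = 8_000_000  # first attempt, per one-second clip
OPS_OPTIMIZED = 750_000   # after borrowing the "OK, Google" tricks

# Assuming roughly one calculation per clock cycle:
time_original = OPS_ORIGINAL / CLOCK_HZ    # compute time per second of audio
time_optimized = OPS_OPTIMIZED / CLOCK_HZ

print(f"Original:  {time_original * 1000:.1f} ms per second of audio")
print(f"Optimized: {time_optimized * 1000:.1f} ms per second of audio")
```

Under these assumptions the optimized version needs about 7.5 ms of compute per second of audio instead of 80 ms, leaving far more headroom for the rest of the system and for dropping to slower, battery-friendly clock speeds.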

    The team has published its code on the TensorFlow website for others to use. Currently the software runs on chips like those used in smartphones and in the Raspberry Pi, the ultra-cheap single-board computer. The team plans to make it work on smaller chips like those found in Arduino boards.

    Tony Robinson, a former AI researcher at Cambridge University, U.K., and now chief technical officer at speech-recognition firm Speechmatics, says that Warden’s ambition is a good one, and believes that such low-cost approaches will help voice recognition become pervasive in the coming years. But he sees a problem with building such limited AIs. “People don’t stick to the script,” he says, explaining that users are unlikely to be patient enough to make use of such a highly constrained set of instructions.

    Instead, he suggests that slightly higher-powered chips, able to summon more of the linguistic capabilities found in Google Assistant and Amazon's Alexa, may be better suited to consumer applications.
