

Where Siri Has Trouble Hearing, a Crowd of Humans Could Help

A program called Scribe harnesses humans on the Internet to generate speech captions in under five seconds.

Crowdsourced real-time captioning could assist deaf people and improve voice recognition technologies.

Computer scientist Jeffrey Bigham has created a speech-recognition program that combines the best talents of machines and people.

Though voice recognition programs like Apple’s Siri and Nuance’s Dragon are quite good at hearing familiar voices and clearly dictated words, the technology still can’t reliably caption events that present new speakers, accents, phrases, and background noises. People are pretty good at understanding words in such situations, but most of us aren’t fast enough to transcribe the text in real time (that’s why professional stenographers can charge more than $100 an hour). So Bigham’s program Scribe augments fast computers with accurate humans in hopes of churning out captions and transcripts quickly.

This rapid-fire crowd-computing experiment could be a big help for deaf and hearing-impaired people. It could also provide new ways to enhance voice recognition applications like Siri in areas where they struggle.

Scribe’s algorithms direct human workers to type out fragments of what they hear in a speech. By turning up the volume or slowing down particular slices of the audio, the program can steer different workers to distinct but overlapping sections of a speech, then give each of them a few seconds to recover before asking them to type again.
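The article does not describe Scribe’s scheduler in detail, so the following is only a minimal sketch of the general idea, assuming fixed-length audio windows, a fixed overlap, and strict round-robin rotation among workers. The names `schedule`, `rest_between`, and `Assignment` and all parameter values are hypothetical; the volume and playback-speed adjustments mentioned above are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    worker: int   # index of a (hypothetical) crowd worker
    start: float  # window start, seconds into the audio
    end: float    # window end, seconds into the audio


def schedule(duration: float, n_workers: int,
             segment: float, overlap: float) -> list[Assignment]:
    """Cut the audio into overlapping windows and hand them out round-robin.

    Assumption for illustration only: every window has the same length and
    overlaps the previous window by the same amount.
    """
    if n_workers < 1:
        raise ValueError("need at least one worker")
    if not 0 <= overlap < segment:
        raise ValueError("overlap must satisfy 0 <= overlap < segment")
    step = segment - overlap  # how far each new window advances
    assignments = []
    start, k = 0.0, 0
    while start < duration:
        end = min(start + segment, duration)  # last window may be shorter
        assignments.append(Assignment(k % n_workers, start, end))
        start += step
        k += 1
    return assignments


def rest_between(n_workers: int, segment: float, overlap: float) -> float:
    """Seconds a worker is idle between finishing one window and starting
    their next one under strict round-robin (negative means no rest)."""
    return n_workers * (segment - overlap) - segment
```

For example, with 10 seconds of audio, three workers, 4-second windows, and a 1-second overlap, the windows are 0 to 4, 3 to 7, 6 to 10, and 9 to 10 seconds, assigned to workers 0, 1, 2, and 0, and each worker gets about 5 seconds of rest between windows. Adding workers lengthens the rest period, which is one way a scheduler could trade crowd size against worker fatigue.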

Using natural-language processing algorithms, Scribe strings together the typed-out fragments into a complete transcript, and the redundant overlaps can help it weed out errors. (This shotgun computing technique is similar to the way many DNA sequencing machines work, Bigham points out.) It can produce a transcript or caption with a delay as short as three seconds using just three to five workers.
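To make the shotgun analogy concrete, here is a deliberately simplified stand-in for the stitching step: it joins time-ordered text fragments wherever the end of one matches the start of the next, much as overlapping DNA reads are assembled. This is not Scribe’s actual natural-language processing; the greedy word-level matching, the function names, and the `min_overlap` parameter are assumptions for illustration, and the error-weeding use of redundant overlaps is not modeled.

```python
def merge_two(left: list[str], right: list[str], min_overlap: int = 1) -> list[str]:
    """Join two word lists at their longest suffix/prefix overlap
    (case-insensitive); fall back to plain concatenation."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Try the longest possible overlap first, then shrink it.
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if [w.lower() for w in left[-k:]] == [w.lower() for w in right[:k]]:
            return left + right[k:]
    return left + right  # no overlap found


def merge_fragments(fragments: list[str], min_overlap: int = 1) -> str:
    """Greedily stitch time-ordered typed fragments into one transcript."""
    words: list[str] = []
    for fragment in fragments:
        words = merge_two(words, fragment.split(), min_overlap)
    return " ".join(words)
```

Given the fragments "the quick brown fox", "brown fox jumps over", and "jumps over the lazy dog", this sketch produces "the quick brown fox jumps over the lazy dog". A one-word overlap can match by accident on common words such as "the", so raising `min_overlap` to 2 or more makes spurious joins less likely at the cost of missing short genuine overlaps.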

The only requirement is that the workers can hear and type, so even as a group, they cost less than a stenographer and don’t need days of advance notice, he notes. That could be a big help for a deaf student who wants to, say, take a new online class that hasn’t been captioned.

Bigham (see “Innovators Under 35, 2009: Jeffrey Bigham”) and his University of Rochester colleague Walter Lasecki have tested Scribe with laborers they found through Amazon’s Mechanical Turk, where people sign up to perform simple tasks. Those workers were paid a minimum of $6 an hour by Bigham’s team. The team also hired undergraduate work-study students for $10 an hour. The crowdsourced work of people in both groups appears to be only slightly less accurate than that of a professional stenographer, Bigham says. And in some cases, the pooled workers more accurately transcribed jargon terms that a single professional typist might mishear.
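The cost claim can be checked with the figures quoted in this article: three to five workers, paid $6 (the Mechanical Turk minimum) or $10 (the work-study rate) an hour, against a stenographer who can charge more than $100 an hour. The sketch below is only back-of-the-envelope arithmetic under those assumptions; it ignores platform fees, payments above the $6 minimum, and any overhead of running the system, and `crowd_hourly_cost` is a hypothetical helper.

```python
# Figures quoted in the article. The stenographer rate is a lower bound
# ("more than $100 an hour"), and $6/hour was the *minimum* Turk pay.
STENOGRAPHER_RATE = 100.0
CROWD_SIZES = (3, 5)   # "three to five workers"
WAGES = (6.0, 10.0)    # Mechanical Turk minimum, work-study rate


def crowd_hourly_cost(workers: int, wage_per_hour: float) -> float:
    """Hourly cost of a captioning crowd paid a flat hourly wage."""
    return workers * wage_per_hour


if __name__ == "__main__":
    for n in CROWD_SIZES:
        for wage in WAGES:
            cost = crowd_hourly_cost(n, wage)
            print(f"{n} workers at ${wage:.0f}/h: ${cost:.0f}/h "
                  f"(vs. ${STENOGRAPHER_RATE:.0f}+/h stenographer)")
```

Even the most expensive combination here, five work-study students, comes to about $50 an hour, roughly half the stenographer’s lower-bound rate.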

“What Scribe is starting to show is the ability to work together as part of a crowd to do very difficult performance tasks better than a person can do alone,” he says.

Bigham is now developing Scribe into an app that he hopes could help deaf people crowdsource transcripts quickly. To support a large number of users, he is also considering licensing the technology or spinning off a startup.

It’s not the first time someone has thought of using cheap, computer-coördinated human labor to bolster the traditional weaknesses in artificial intelligence programs or other software. Twitter is hiring people on Mechanical Turk to help its search engine classify newsy topics that suddenly start trending. Bigham also has created a crowdsourced personal-assistance system called Chorus (see “Artificial Intelligence, Powered By Many Humans”) that could be smarter than Siri but cheaper than any individual hourly employee.

This is not to say that human labor will always outperform automated systems at transcribing speech. Aditya Parameswaran, a researcher at Stanford University who also works on human-assisted computation methods, says that as learning algorithms improve, crowdsourcing techniques like these will be useful mostly for augmenting the computers’ accuracy, rather than for having humans do the bulk of the work.
