

More-Accurate Video Search

Speech-recognition software could improve video search.

Boston-based startup EveryZing has launched a search engine that it hopes will change the way people search for audio and video online. EveryZing, formerly the podcast search engine PodZinger, uses speech-recognition systems developed by the technology company BBN that convert spoken words into searchable text with about 80 percent accuracy. That beats other commercially available systems, says EveryZing CEO Tom Wilde.

Audio cues: A new video and audio search engine can convert audio into a text transcript with 80 percent accuracy. That’s good enough to show snippets of the transcript, direct users to the spot in the file where a search term appears, and summarize key concepts.

This high accuracy is enabling new search capabilities, Wilde says, such as providing full transcripts of video and audio and directing people to the exact spot in a file where a word or phrase is spoken. The technology will also let the company serve targeted ads tied to specific content, much as Google serves ads based on the text of a Web page.

“The big challenge [in online video and audio] … is the opaqueness of media content,” says Wilde. It’s extremely difficult to know what is inside a video or audio clip. “The problem we want to solve,” he says, “is the discoverability of multimedia within Web search.” EveryZing tackles the problem by converting the speech in multimedia files into text, which can then be handled by the existing text-search tools developed by the likes of Google and Yahoo.

The Web is exploding with multimedia from YouTube, podcasts, TV news reports, and National Public Radio shows. But it’s still difficult to search for “Barack Obama” and pull up every instance on the Web in which his name is mentioned. Typically, the titles of clips and the tags that people assign to them don’t contain enough information to produce useful search results. That is why, over the past couple of years, a handful of companies have begun using audio content as a guide. For instance, video search engine Blinkx uses speech-recognition technology to scour the entire Web for relevant content and aggregate it on a single site, much as Google aggregates Web pages. (See “Surfing TV on the Internet.”)

EveryZing’s business goals differ from Blinkx’s, says Wilde, and he suspects that the two approaches can complement each other. “We’re about merchandising content, not trolling the Web,” he says. EveryZing (which, like Blinkx, provides a search portal for Web surfers) mainly wants to partner with content providers to make their multimedia searchable. For instance, the company wants to convert all the audio and video content within ABC.com into searchable text, adding time stamps to that text (as well as preexisting closed-captioned text) so a person can immediately jump to a specific word in a clip.
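To make the idea of jumping to a specific word concrete, here is a minimal Python sketch of a time-stamped transcript index. It assumes the recognizer emits each word with a start time in seconds; the `Word` type, the `build_index` and `search` functions, and the sample clip are hypothetical illustrations, not EveryZing’s or BBN’s actual system.

```python
from collections import defaultdict
from typing import NamedTuple


class Word(NamedTuple):
    """One recognized word (hypothetical format): its text and start time in seconds."""
    text: str
    start: float


def build_index(transcripts: dict[str, list[Word]]) -> dict[str, list[tuple[str, int]]]:
    """Inverted index: lowercased word -> list of (clip_id, word position)."""
    index = defaultdict(list)
    for clip_id, words in transcripts.items():
        for pos, w in enumerate(words):
            index[w.text.lower()].append((clip_id, pos))
    return index


def search(transcripts: dict[str, list[Word]],
           index: dict[str, list[tuple[str, int]]],
           phrase: str) -> list[tuple[str, float]]:
    """Return (clip_id, start_time) for every spot where the phrase is spoken."""
    terms = phrase.lower().split()
    if not terms:
        return []
    hits = []
    # Candidate positions come from the first term; confirm the rest of the phrase follows.
    for clip_id, pos in index.get(terms[0], []):
        words = transcripts[clip_id]
        window = [w.text.lower() for w in words[pos:pos + len(terms)]]
        if window == terms:
            hits.append((clip_id, words[pos].start))
    return hits


# Hypothetical sample: one clip's transcript with per-word start times.
transcripts = {
    "clip-1": [Word("Barack", 12.4), Word("Obama", 12.9), Word("spoke", 13.3),
               Word("at", 13.6), Word("a", 13.7), Word("rally", 13.8)],
}
index = build_index(transcripts)
print(search(transcripts, index, "Barack Obama"))  # [('clip-1', 12.4)]
```

A player could seek directly to the returned start time. With roughly one word in five misrecognized, a real system would also need some tolerance for transcription errors, which this exact-match sketch does not attempt.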

In addition, unlike Blinkx’s current technology, BBN’s technology lets EveryZing extract high-level concepts that a user might not have thought to search for. If someone searched for “Barack Obama,” for instance, EveryZing might also offer other keywords from the clip, such as “rally.”
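The article does not describe how BBN’s language technology picks out concepts. As a much simpler stand-in, the Python sketch below suggests other keywords from a clip’s transcript by counting the most frequent terms that are neither stopwords nor part of the query. The stopword list, the `related_keywords` function, and the sample transcript are hypothetical.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "at", "of", "and", "to", "in", "on", "for", "is", "was"}


def related_keywords(transcript: str, query: str, k: int = 3) -> list[str]:
    """Return up to k frequent transcript terms, excluding stopwords and query terms."""
    query_terms = set(query.lower().split())
    # Strip simple punctuation from each token and normalize case.
    tokens = [t.strip(".,!?\"'").lower() for t in transcript.split()]
    counts = Counter(t for t in tokens
                     if t and t not in STOPWORDS and t not in query_terms)
    # most_common breaks ties by first appearance in the transcript.
    return [word for word, _ in counts.most_common(k)]


# Hypothetical transcript of a news clip.
transcript = ("Barack Obama spoke at a rally in Iowa. The rally drew a large crowd, "
              "and Obama thanked the crowd.")
print(related_keywords(transcript, "Barack Obama"))  # ['rally', 'crowd', 'spoke']
```

Frequency counting is only a crude proxy for the concept extraction Wilde describes; it has no notion of meaning and cannot tell, for example, that “rally” and “crowd” refer to the same event.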

The idea of using audio transcripts to search for multimedia has been around in research labs for decades, and basic speech-recognition research dates back even earlier. Much of the seminal work occurred at BBN, MIT, Carnegie Mellon University, IBM, and SRI International. In 1995, Carnegie Mellon had a working demonstration of a similar video search system, says Richard Stern, professor of electrical and computer engineering at the university. This system, called Informedia, spurred other research in the field, he says, and was the precursor to BBN’s modern video analysis approach.

EveryZing’s technology combines two components from Boston-based BBN. The core speech-to-text system, called Byblos, has been developed with $50 million from a series of government research grants over the past five years, says Wilde. Using probabilistic machine-learning algorithms, the system runs in real time: it takes about one minute to convert each minute of audio into text.

The second component, says Wilde, is a set of algorithms that process the content of the text. BBN’s natural-language technology draws on large stores of words and phrases that supply context, which helps it make sense of a video. For instance, a news segment about health might use language specific to the medical field, and the system can use that context to recognize obscure medical terms. Understanding the meaning of the text is a powerful tool, says Wilde, because it lets EveryZing offer users high-level concepts with which to fine-tune their searches. Just as important, it lets the company pair targeted ads with the right content.

The time is right for a video search engine with these capabilities, says Carnegie Mellon’s Stern. “Video is a much more compelling and entertaining medium than just plain text,” he says, and now so much of it is available on the Internet. He adds that BBN’s 80 percent accuracy is “really quite a feat,” and it should be adequate for searching the troves of content online.

While the technology is good, it’s not perfect, says EveryZing’s Wilde. Accuracy drops when background music is present or when several people talk at once. But for the infotainment and news market that the company is targeting right now, the technology should offer a significant improvement over what’s currently available, he says. “I think we’ll look back in a couple of years and say, ‘Of course the content of multimedia files needs to be searchable,’” says Wilde. “It’d be as if Web pages could only be searched by title and tag.”
