Audio cues: A new video and audio search engine can convert audio into a text transcript with 80 percent accuracy. That’s good enough to show snippets of the transcript, direct users to the spot in the file where a search term appears, and summarize key concepts.
Credit: EveryZing

More-Accurate Video Search

Speech-recognition software could improve video search.

  • Tuesday, June 12, 2007
  • By Kate Greene

Boston-based startup EveryZing has launched a search engine that it hopes will change the way that people search for audio and video online. EveryZing, formerly the podcast search engine PodZinger, uses speech systems developed by technology company BBN that can convert spoken words into searchable text with about 80 percent accuracy. That accuracy beats other commercially available systems, says EveryZing CEO Tom Wilde.

This high accuracy is enabling new search capabilities, Wilde says, such as the ability to provide entire transcripts of video and audio, and the ability to direct people to the exact spot in a file where a word or phrase is spoken. The technology will also let the company provide targeted ads associated with specific content, much in the way that Google provides ads based on the text of a Web page.

"The big challenge [in online video and audio] ... is the opaqueness of media content," says Wilde. It's extremely difficult to know what range of content is inside a video or audio clip. "The problem we want to solve," he says, "is the discoverability of multimedia within Web search." EveryZing does this by extracting the content of multimedia files and outputting text so that it can take advantage of the preexisting text-search tools developed by the likes of Google and Yahoo.

The Web is exploding with multimedia from YouTube, podcasts, TV news reports, and National Public Radio shows. But it's still difficult to search for "Barack Obama" and pull up all the instances on the Web in which his name is mentioned. Typically, the titles of clips and the tags that people assign to them don't contain enough information to give useful search results. This is why a handful of companies have, over the past couple of years, been exploring the use of audio content as a guide. For instance, video search engine Blinkx uses speech-recognition technology to scour the entire Web for relevant content, aggregating it on a single site, much as Google aggregates Web pages. (See "Surfing TV on the Internet.")

EveryZing's business goals differ from Blinkx's, says Wilde, and he suspects that the two approaches can complement each other. "We're about merchandising content, not trolling the Web," he says. EveryZing (which, like Blinkx, provides a search portal for Web surfers) mainly wants to partner with content providers to make their multimedia searchable. For instance, the company wants to convert all the audio and video content within ABC.com into searchable text, adding time stamps to that text (as well as preexisting closed-captioned text) so a person can immediately jump to a specific word in a clip.
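The idea of time-stamped transcript text can be illustrated with a minimal sketch. It assumes the recognizer emits each word with a start time in seconds; the data, the `build_index` and `find_phrase` names, and the exact-match lookup are all hypothetical and are not drawn from EveryZing's or BBN's systems.

```python
from collections import defaultdict

# Hypothetical recognizer output: (word, start time in seconds) pairs.
transcript = [
    ("senator", 12.4),
    ("barack", 13.0),
    ("obama", 13.5),
    ("spoke", 14.1),
    ("at", 14.5),
    ("the", 14.6),
    ("rally", 14.8),
    ("obama", 95.2),
]

def build_index(words):
    """Map each lowercased word to the positions where it occurs."""
    index = defaultdict(list)
    for position, (word, _start) in enumerate(words):
        index[word.lower()].append(position)
    return index

def find_phrase(words, index, phrase):
    """Return the start time of every exact occurrence of the phrase."""
    terms = phrase.lower().split()
    if not terms:
        return []
    hits = []
    # Only positions of the first term can begin a match.
    for pos in index.get(terms[0], []):
        window = [w.lower() for w, _ in words[pos:pos + len(terms)]]
        if window == terms:
            hits.append(words[pos][1])
    return hits

index = build_index(transcript)
print(find_phrase(transcript, index, "Barack Obama"))  # [13.0]
print(find_phrase(transcript, index, "obama"))         # [13.5, 95.2]
```

A player could seek to any returned offset. A real system would also have to cope with recognition errors, which this exact-match lookup ignores.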

In addition, unlike Blinkx's current technology, BBN's technology lets EveryZing extract high-level concepts that the user might not have originally searched for. If someone searched for "Barack Obama," for instance, EveryZing might also offer other keywords in the clip, such as "rally."
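The article does not say how BBN's technology derives these concepts. As a rough stand-in, the sketch below suggests related keywords by counting the most frequent non-query, non-stopword terms in a clip's transcript. The stopword list, sample text, and `related_keywords` function are illustrative assumptions, not the company's method.

```python
from collections import Counter

# A tiny, hypothetical stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "at", "in", "of", "and", "to", "on"}

def related_keywords(transcript_text, query, top_n=3):
    """Suggest frequent transcript terms that are not part of the query."""
    query_terms = set(query.lower().split())
    # Lowercase each token and strip surrounding punctuation.
    words = [w.strip(".,!?\"'").lower() for w in transcript_text.split()]
    counts = Counter(
        w for w in words
        if w and w not in STOPWORDS and w not in query_terms
    )
    return [word for word, _count in counts.most_common(top_n)]

clip = (
    "Barack Obama spoke at the rally. The rally drew a crowd, "
    "and the crowd cheered Obama at the rally."
)
print(related_keywords(clip, "Barack Obama", top_n=2))  # ['rally', 'crowd']
```

Raw term frequency is the simplest possible signal. Extracting higher-level concepts would likely need weighting across many clips or some form of topic modeling.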

The idea of using audio transcripts to search for multimedia has been around in research labs for decades, and basic speech-recognition research dates back even earlier. Much of the seminal work occurred at BBN, MIT, Carnegie Mellon University, IBM, and SRI International. In 1995, Carnegie Mellon had a working demonstration of a similar video search system, says Richard Stern, professor of electrical and computer engineering at the university. This system, called Informedia, spurred other research in the field, he says, and was the precursor to BBN's modern video analysis approach.
