Boston-based startup EveryZing has launched a search engine that it hopes will change the way that people search for audio and video online. Formerly known as PodZinger, a podcast search engine, EveryZing is leveraging speech systems developed by technology company BBN that can convert spoken words into searchable text with about 80 percent accuracy. This bests other commercially available systems, says EveryZing CEO Tom Wilde.
This high accuracy is enabling new search capabilities, Wilde says, such as the ability to provide entire transcripts of video and audio, and the ability to direct people to the exact spot in a file where a word or phrase is spoken. The technology will also let the company provide targeted ads associated with specific content, much in the way that Google provides ads based on the text of a Web page.
“The big challenge [in online video and audio] … is the opaqueness of media content,” says Wilde. It’s extremely difficult to know what range of content is inside a video or audio clip. “The problem we want to solve,” he says, “is the discoverability of multimedia within Web search.” EveryZing does this by extracting the content of multimedia files and outputting text so that it can take advantage of the preexisting text-search tools developed by the likes of Google and Yahoo.
The Web is exploding with multimedia from YouTube, podcasts, TV news reports, and National Public Radio shows. But it’s still difficult to search for “Barack Obama” and pull up all the instances on the Web in which his name is mentioned. Typically, the titles of clips and the tags that people assign to them don’t contain enough information to give useful search results. This is why, over the past couple of years, a handful of companies have begun exploring the use of audio content as a guide. For instance, video search engine Blinkx uses speech-recognition technology to scour the entire Web for relevant content, aggregating it on a single site, much as Google aggregates Web pages. (See “Surfing TV on the Internet.”)
EveryZing’s business goals differ from Blinkx’s, says Wilde, and he suspects that the two approaches can complement each other. “We’re about merchandising content, not trolling the Web,” he says. EveryZing (which, like Blinkx, provides a search portal for Web surfers) mainly wants to partner with content providers to make their multimedia searchable. For instance, the company wants to convert all the audio and video content within ABC.com into searchable text, adding time stamps to that text (as well as preexisting closed-captioned text) so a person can immediately jump to a specific word in a clip.
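To make the time-stamp idea concrete, here is a minimal sketch in Python of how a time-stamped transcript could be indexed so that a phrase search returns the moments at which the phrase is spoken. It is an illustration only, not EveryZing's or BBN's implementation: the sample transcript, the function names `build_index` and `find_phrase`, and the exact-match lookup are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical time-stamped transcript: (start time in seconds, recognized word).
# In a real system these pairs would come from a speech recognizer or from
# closed captions; the values here are made up for illustration.
transcript = [
    (12.0, "barack"),
    (12.4, "obama"),
    (13.1, "spoke"),
    (13.5, "at"),
    (13.7, "a"),
    (13.9, "rally"),
    (47.2, "barack"),
    (47.6, "obama"),
    (48.3, "said"),
]


def build_index(words):
    """Map each lowercased word to the positions where it occurs."""
    index = defaultdict(list)
    for position, (_, word) in enumerate(words):
        index[word.lower()].append(position)
    return index


def find_phrase(words, index, phrase):
    """Return the start time of every exact occurrence of the phrase."""
    terms = phrase.lower().split()
    if not terms:
        return []
    hits = []
    # Only positions where the first term occurs can start a match.
    for position in index.get(terms[0], []):
        window = [w.lower() for _, w in words[position:position + len(terms)]]
        if window == terms:
            hits.append(words[position][0])
    return hits


index = build_index(transcript)
print(find_phrase(transcript, index, "Barack Obama"))  # [12.0, 47.2]
```

A player could then seek to any returned time. Because recognition is imperfect (the article cites about 80 percent accuracy), a production system would presumably need to tolerate misrecognized words rather than rely on exact matches as this sketch does.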
In addition, unlike Blinkx’s current technology, BBN’s technology lets EveryZing extract high-level concepts that the user might not have originally searched for. If someone searched for “Barack Obama,” for instance, EveryZing might also offer other keywords in the clip, such as “rally.”
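The article does not say how BBN's technology extracts these concepts. As a rough stand-in, the Python sketch below suggests related keywords by counting the most frequent non-query words in each clip whose transcript contains the query. Simple term counting is a swapped-in technique chosen for illustration, and the stop-word list, the clip transcripts, and the function name `related_keywords` are all hypothetical.

```python
from collections import Counter

# Hypothetical stop-word list and clip transcripts, for illustration only.
STOP_WORDS = {"a", "at", "the", "in", "of", "and", "to", "on"}

clips = {
    "clip-1": "barack obama spoke at a rally in iowa and the rally drew a crowd",
    "clip-2": "the weather in boston",
}


def related_keywords(clips, query, top_n=3):
    """For each clip containing every query term, list its most frequent other words."""
    query_terms = query.lower().split()
    suggestions = {}
    for clip_id, text in clips.items():
        words = text.lower().split()
        # Skip clips that do not mention all of the query terms.
        if not all(term in words for term in query_terms):
            continue
        counts = Counter(
            w for w in words if w not in STOP_WORDS and w not in query_terms
        )
        suggestions[clip_id] = [word for word, _ in counts.most_common(top_n)]
    return suggestions


result = related_keywords(clips, "Barack Obama")
print(result)
```

In this toy data, “rally” is the most frequent remaining word in the matching clip, so it is the first suggestion. Real concept extraction would likely go well beyond word counts.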
The idea of using audio transcripts to search for multimedia has been around in research labs for decades, and basic speech-recognition research dates back even earlier. Much of the seminal work occurred at BBN, MIT, Carnegie Mellon University, IBM, and SRI International. In 1995, Carnegie Mellon had a working demonstration of a similar video search system, says Richard Stern, professor of electrical and computer engineering at the university. This system, called Informedia, spurred other research in the field, he says, and was the precursor to BBN’s modern video analysis approach.