Software is constantly trying to figure out what you mean, and often it guesses wrong. If you were curious about employment at the maker of iPods and typed “turnover at apple” into Google, the top results would be for apple turnover recipes.
It’s not just search engines—the same problem crops up in software that aims to translate, recognize speech, analyze the mood surrounding a product launch, or deliver targeted advertisements.
A startup called Idilia, based in Montreal, Canada, has built software to make all these applications better at what they do. The software focuses on the problem of word-sense disambiguation—choosing the meaning of a word based on what makes the most sense in context. Word-sense disambiguation is an old artificial intelligence problem that has proved thorny over the decades. For a computer to apply a word correctly in context, it has to have a huge amount of background information—not just what’s in a dictionary but also a map of how words fit together both grammatically and conceptually.
Matthew Colledge, Idilia’s CEO, became obsessed with the problem more than a decade ago. “For me, it was the reason we can’t get a computer to think,” Colledge says. He founded Idilia in 2000 and has since spent more than $30 million in public and private investment to build the company’s software.
What makes word-sense disambiguation worth tackling now, he says, is that processing power has increased enough to make a difference. It is now possible to store enough information to train algorithms, and to run many algorithms over a sentence and still expect them to finish within a reasonable period of time.
Idilia runs many algorithms in parallel. For example, one of its algorithms determines which meaning of a word is statistically most likely, while another looks at how often the word takes a given meaning in the context of the words around it, and still another analyzes grammatical structure to figure out what role the word plays in a sentence. A “super-algorithm” then weighs these various results and selects a meaning based on them all.
Colledge demonstrates the software with the example “Was Martha Stewart framed?” Though one of Idilia’s algorithms determines that “frame” is statistically most likely to refer to a picture frame, another algorithm rejects this reading by identifying that “Martha Stewart” is a person, and people don’t get framed like pictures. The super-algorithm eliminates one meaning after another, and finally settles on “framed” as in “entrapped.”
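To make the idea concrete, here is a minimal sketch of how several scorers might be combined so that one of them can rule a meaning out. It is not Idilia's code, which the article does not describe in detail. The sense labels, the scores, the toy list of known people, and the choice to multiply scores together are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Scores = Dict[str, float]
Scorer = Callable[[str, List[str]], Scores]

# Toy stand-in for a named-entity recognizer (assumption, not a real lexicon).
PEOPLE = {"martha stewart"}


def prior_scorer(word: str, context: List[str]) -> Scores:
    """Assumed statistical prior: the picture-frame sense is more common."""
    return {"picture_frame": 0.7, "entrapped": 0.3}


def person_scorer(word: str, context: List[str]) -> Scores:
    """Rule out the picture-frame sense when the context names a person."""
    if any(token.lower() in PEOPLE for token in context):
        return {"picture_frame": 0.0, "entrapped": 1.0}
    return {"picture_frame": 1.0, "entrapped": 1.0}


def combine(word: str, context: List[str], scorers: List[Scorer]) -> str:
    """Hypothetical 'super-algorithm': multiply the scores for each sense,
    so a score of zero from any scorer eliminates that sense."""
    combined: Scores = {}
    for scorer in scorers:
        for sense, score in scorer(word, context).items():
            combined[sense] = combined.get(sense, 1.0) * score
    return max(combined, key=combined.get)


scorers = [prior_scorer, person_scorer]
best = combine("framed", ["Martha Stewart"], scorers)
print(best)
```

Multiplying scores is only one way to let a single scorer veto a meaning. A weighted sum with learned weights would be another, and the article does not say which approach Idilia uses.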
Adding this intelligence to a search engine, Colledge says, could greatly improve the quality of its results. He shows that the query about Martha Stewart brings up a bunch of results about crafts and picture frames in Google. A plug-in that Idilia has built to demonstrate its capability does much better. It analyzes the query, feeds Google a paraphrase of it instead, throws away any results that don’t fit the context, and produces a list of sites addressing Stewart’s troubles with the law.
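The filtering step can be sketched in a few lines, assuming the intended sense has already been chosen. This is a rough illustration and not the plug-in itself. The cue words for each sense and the two result titles are invented for the example, and a real system would presumably use far richer signals than keyword overlap.

```python
import re
from typing import Dict, List, Set

# Hypothetical cue words associated with each sense (assumption).
SENSE_TERMS: Dict[str, Set[str]] = {
    "entrapped": {"trial", "conviction", "charges", "prosecutors"},
    "picture_frame": {"craft", "photo", "wood", "decor"},
}


def fits_sense(text: str, sense: str) -> bool:
    """Return True if the text shares at least one cue word with the sense."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & SENSE_TERMS[sense])


def filter_results(results: List[str], sense: str) -> List[str]:
    """Keep only the results that fit the chosen sense, preserving order."""
    return [result for result in results if fits_sense(result, sense)]


# Invented result titles, not real search output.
results = [
    "Martha Stewart picture frame craft ideas",
    "Martha Stewart trial: prosecutors outline charges",
]
kept = filter_results(results, "entrapped")
print(kept)
```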