Some Web searches are easy to think of and describe, but complicated to conduct. If, for instance, you want to find “a nonstop flight from Las Vegas to San Diego next week on JetBlue,” you have to fill out a bevy of fields on a travel site.
SkyPhrase, a startup created by Nick Cassimatis, an associate professor at Rensselaer Polytechnic Institute, will soon offer software that lets companies turn natural language questions like the one above into a format that their databases can handle.
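To make the idea concrete, here is a minimal sketch of turning the example request into the kind of structured fields a travel site's form would collect. It is a toy built on regular expressions, not SkyPhrase's method, which the company has not described in detail. The function name `parse_flight_query`, the field names, and the `AIRLINES` list are all hypothetical.

```python
import re

# Hypothetical list of airlines this toy parser recognizes.
AIRLINES = ["JetBlue", "Delta", "United"]

def parse_flight_query(text):
    """Turn a flight request into the fields a travel-site form would ask for."""
    fields = {}
    # "from <origin> to <destination>", stopping before a trailing date or airline phrase.
    route = re.search(r"from (.+?) to (.+?)(?= next week| on |$)", text, re.IGNORECASE)
    if route:
        fields["origin"] = route.group(1)
        fields["destination"] = route.group(2)
    fields["nonstop"] = bool(re.search(r"\bnon-?stop\b", text, re.IGNORECASE))
    if re.search(r"\bnext week\b", text, re.IGNORECASE):
        fields["when"] = "next week"
    for airline in AIRLINES:
        if re.search(r"\b" + re.escape(airline) + r"\b", text, re.IGNORECASE):
            fields["airline"] = airline
    return fields

query = "a nonstop flight from Las Vegas to San Diego next week on JetBlue"
print(parse_flight_query(query))
```

A hand-written pattern like this breaks as soon as the phrasing changes, which is the gap that natural language systems are meant to close.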
Facebook’s new search tool, Graph Search, highlights both the progress that’s being made in natural language processing and the difficulties that remain. Unlike the old search bar, Graph Search lets users enter queries as they might speak them. Yet it still handles only a fairly small range of query types (see “Facebook’s Graph Search Won’t Hurt Google Without Your Help”).
While natural language processing typically involves teaching software vocabulary and grammatical rules or using statistical analysis, SkyPhrase’s technology uses a combination of algorithms and data structures.
The company plans to release a website in February, as well as an extension for the Chrome Web browser, that would allow users of Google Analytics to filter information using natural language queries.
Cassimatis says this could make it easier to find patterns or data points that are otherwise time-consuming for even a seasoned user to unearth. Eventually, the company hopes to offer SkyPhrase as a programming interface for other websites.
Natural language search could make it easier for people with little or no training to ask complicated questions that typically require searching through large stores of data. Besides Facebook’s efforts, Wolfram Alpha can answer natural language queries, and Apple’s virtual assistant Siri also encourages users to speak questions aloud in complete sentences (some of its answers come from Wolfram Alpha).
Percy Liang, an assistant professor in Stanford’s computer science department who studies natural language processing and machine learning, thinks SkyPhrase’s idea is a good one, but cautions that a lot remains to be done to make natural language processing work well. He says the challenges include determining what a word or phrase means from its context (knowing, for example, that Obama is not just the U.S. president’s name but also a city in Japan) and picking up on sentiment, as in a Yelp restaurant review that says, sarcastically, “I had to wait an hour for this!”
Users also need to know what they can do with the system. If there’s a big gap between a user’s expectations and the system’s abilities, the user will be disappointed, Liang says. Facebook’s Graph Search, for example, tries to solve this problem by auto-completing queries so users know what kinds of things it can search for. Likewise, Siri tries to answer only a limited range of queries.
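The auto-completion idea can be illustrated with a short sketch: as the user types, the system offers only queries it knows how to answer. The template list and the `suggest` function below are hypothetical and are not drawn from Facebook's actual implementation.

```python
# Hypothetical query templates a system might support; not Facebook's actual list.
SUPPORTED_QUERIES = [
    "friends who live in my city",
    "friends who like cycling",
    "photos of my friends",
    "restaurants my friends like",
]

def suggest(prefix, limit=3):
    """Return supported queries that start with what the user has typed so far."""
    typed = prefix.strip().lower()
    return [q for q in SUPPORTED_QUERIES if q.startswith(typed)][:limit]

print(suggest("friends who"))
print(suggest("weather"))
```

An empty suggestion list, as for "weather" here, tells the user early that the request is outside what the system covers.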
SkyPhrase is still clearly in its infancy. I tried a version that can conduct complex natural language searches of Twitter, Gmail, Orbitz, and Amazon’s MP3 store. It couldn’t understand a number of my queries, often because I was trying to get it to do things it hasn’t yet been trained to do. But it did a decent job of searching through Gmail and understanding some complex queries about e-mails I needed to find as I organized an upcoming trip.
SkyPhrase understands only searches for complete words (a search for simply “banana” won’t bring up an e-mail mentioning the game Bananagrams) and doesn’t infer meaning from words (it won’t pull up messages or tweets with words related to the words you’re searching for). But it can understand conjunctions—I could search for, say, “e-mails from Bob Loblaw in December and January about recipes with a PDF,” or “e-mails from Bob Loblaw or Tobias Funke about cookies in December.”
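The behavior described above can be sketched in a few lines: whole-word matching, alternatives within a field treated as "or," and separate fields combined with "and." This is an illustration under stated assumptions, not SkyPhrase's code. The `Email` class, the `mentions` and `search` functions, and the sample inbox are all hypothetical, and the step of parsing the English sentence into these arguments is left out.

```python
import re
from dataclasses import dataclass

@dataclass
class Email:
    sender: str
    month: str
    body: str
    has_pdf: bool = False

def mentions(word, text):
    """Whole-word, case-insensitive match: "banana" does not match "Bananagrams"."""
    return re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE) is not None

def search(emails, senders=None, months=None, about=None, needs_pdf=False):
    """Alternatives within a field are OR-ed; the fields themselves are AND-ed."""
    results = []
    for e in emails:
        if senders and e.sender not in senders:
            continue
        if months and e.month not in months:
            continue
        if about and not mentions(about, e.body):
            continue
        if needs_pdf and not e.has_pdf:
            continue
        results.append(e)
    return results

inbox = [
    Email("Bob Loblaw", "December", "Two recipes for cookies attached", has_pdf=True),
    Email("Tobias Funke", "December", "Who ate the cookies?"),
    Email("Bob Loblaw", "March", "Bananagrams tonight?"),
]

# "e-mails from Bob Loblaw or Tobias Funke about cookies in December"
hits = search(inbox, senders={"Bob Loblaw", "Tobias Funke"},
              months={"December"}, about="cookies")
print([e.sender for e in hits])

# A whole-word search for "banana" skips the Bananagrams message.
print(mentions("banana", "Bananagrams tonight?"))
```

The other example query, "e-mails from Bob Loblaw in December and January about recipes with a PDF," would map to `search(inbox, senders={"Bob Loblaw"}, months={"December", "January"}, about="recipes", needs_pdf=True)` in this sketch.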
The company is also building a programming interface that Cassimatis says will make it easy for third parties to add natural language search to their own services. That could allow people without an artificial intelligence or linguistics background to quickly perform complex analyses of data that typically take a trained employee a lot of time.
Cassimatis envisions this API being used to squeeze insights out of everything from financial information to sports scores. Eventually, it could also serve as a revenue source if the company charges developers to license it.