How do you parse a tweet? Five years ago, that question would have been gibberish. Today, it’s perfectly sensible, and it’s at the front of Amit Singhal’s mind. Singhal is leading Google’s quest to incorporate new data into search results in real time by tracking and ranking updates to online content–particularly the thousands of messages that course through social networks every second.
Real-time search is a response to a fundamental shift in the way people use the Web. People used to visit a page, click a link, and visit another page. Now they spend a lot of time monitoring streams of data–tweets, status updates, headlines–from services like Facebook and Twitter, as well as from blogs and news outlets.
Ephemeral info-nuggets are the Web’s new currency, and sifting through them for useful information is a challenge for search engines. The most daunting part of that challenge, according to Singhal, is not collecting the data. Facebook and Twitter are happy to sell access to their data feeds–or “fire hoses,” as they call them–directly to search providers; the information pours straight into Google’s computers.
What’s really hard about real-time search is figuring out the meaning and value of those fleeting bits of information. The challenge goes beyond filtering out spam, though that’s an important part of it. People who search real-time data want the same quality, authority, and relevance that they expect when they perform traditional Web searches. Nobody wants to drink straight from a fire hose.
Google dominates traditional search by meticulously tracking links to a page and other signals of its value as they accumulate over time. But for real-time search, this doesn’t work. Social-networking messages can lose their value within minutes of being written. Google has to gauge their worth in seconds, or even microseconds.
Google is notoriously tight-lipped about its search algorithms, but Singhal explains a few of the variables the company uses to analyze what he calls “chatter.” Some are straightforward. A Twitter user who attracts many followers, and whose tweets are often “retweeted” by other users, can generally be assumed to have more authority. Similarly, Facebook users gain authority as their friends multiply, particularly if those friends also have many friends.
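To make the idea concrete, here is a toy sketch of how follower counts and retweets might be folded into a single authority number. Google has not published its formula; the function name `authority_score`, the logarithmic damping, and the way the retweet rate is combined are all assumptions made for illustration.

```python
import math


def authority_score(followers, tweets, retweets):
    """Toy authority score for one account (hypothetical formula).

    followers: number of accounts following the user
    tweets:    number of messages the user has posted
    retweets:  number of times those messages were retweeted
    """
    # Average retweets per message; zero if the user has never posted.
    retweet_rate = retweets / tweets if tweets else 0.0
    # log1p damps huge follower counts so that a tenfold increase in
    # followers does not mean a tenfold increase in authority.
    return math.log1p(followers) * (1.0 + retweet_rate)


if __name__ == "__main__":
    print(authority_score(followers=1000, tweets=100, retweets=50))
    print(authority_score(followers=1000, tweets=100, retweets=5))
```

In this sketch, two accounts with the same audience are separated by how often their messages get passed along, which mirrors the intuition Singhal describes.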
Other signals are more subtle. A sudden spike in the prevalence of a word in a message stream–earthquake, say–may indicate an important event. If a message on a commonly discussed topic includes unusual phrasing, that may signal new information or a fresh insight. Google, says Singhal, continuously scans for shifts in language and other deviations from predicted behavior.
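A sudden spike of this kind can be illustrated with a simple statistical check: compare how often a word appears in the current time window with its recent history. The sketch below is not Google's method; the function name `is_spike`, the threshold of three standard deviations, and the minimum count are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def is_spike(counts, threshold=3.0, min_count=5):
    """Return True if the latest window looks like a spike.

    counts: occurrences of one term per time window, oldest first;
            the last entry is the current window.
    """
    if len(counts) < 3:
        return False  # not enough history to form a baseline
    *history, current = counts
    if current < min_count:
        return False  # ignore noise on very rare terms
    baseline = mean(history)
    # A flat history has zero spread; use a floor of 1.0 so the
    # division below stays meaningful.
    spread = max(pstdev(history), 1.0)
    return (current - baseline) / spread >= threshold


if __name__ == "__main__":
    # Mentions of "earthquake" per minute (made-up numbers).
    print(is_spike([2, 3, 2, 4, 3, 40]))
    print(is_spike([10, 12, 11, 13, 12]))
```

The first series jumps far above its baseline and is flagged; the second merely wobbles around its usual level and is not.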
The company is also working to connect message content to the geolocation data that’s transmitted by smartphones and other mobile computers, or broadcast through services like Foursquare. The location of someone sending a message can matter a great deal. If you know that a person tweeting about an earthquake is close to the epicenter, chances are those tweets will be more valuable than those of someone hundreds of miles away.
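One way to picture the role of location is to weight each message by its distance from the event. The sketch below uses the standard haversine formula for great-circle distance; the exponential decay, the 100-kilometer scale, and the names `haversine_km` and `proximity_weight` are assumptions for illustration, not a description of Google's system.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def proximity_weight(distance_km, scale_km=100.0):
    """Weight in (0, 1]: 1.0 at the event, decaying with distance."""
    return math.exp(-distance_km / scale_km)


if __name__ == "__main__":
    epicenter = (37.77, -122.42)   # made-up event location
    nearby = (37.80, -122.27)      # sender a short distance away
    distant = (34.05, -118.24)     # sender hundreds of kilometers away
    for sender in (nearby, distant):
        d = haversine_km(*epicenter, *sender)
        print(round(d), proximity_weight(d))
```

A message sent from near the epicenter keeps most of its weight, while one sent from hundreds of kilometers away contributes very little.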
Singhal’s view of real-time search is very much in line with Google’s strategy: distilling from a welter of data the few pieces of content that are most relevant to an individual searcher at a particular point in time. Other search providers, including Google’s archrival, Microsoft, are taking a more radical view.
Sean Suchter, who runs Microsoft’s Search Technology Center in Mountain View, CA, doesn’t like the term real-time search, which he considers too limiting. He thinks Microsoft’s Bing search engine should not just filter data flowing from social networks but become an extension of them.
Ultimately, says Suchter, one-on-one conversations will take place within Bing, triggered by the keywords people enter. Real-time search, he predicts, will be so different from what came before that it will erase Google’s long-standing advantages. “History doesn’t matter here,” he says. After a pause, he adds, “We’re going to wipe the floor with them.”
Amit Singhal has heard such threats before, and so far they haven’t amounted to much. But even he admits that real-time search comes as close to marking “a radical break” in the history of search as anything he’s seen. Keeping Google on top in the age of chatter may prove to be Singhal’s toughest test.