To deliver useful search returns from the so-called real-time Web–such as seconds-old Twitter “tweets” reporting traffic jams–Google has adapted its page-ranking technology and developed new algorithmic tricks and filters to keep returns relevant, according to a leading Google engineer.
Google rolled out real-time search technology last month to give searchers access to brand-new blog posts and news items far faster than the five to 15 minutes it previously took Google’s Web crawlers to discover newly created items.
Bing, Cuil, and other search engines also provide various kinds of real-time results. Both Google and Bing have also forged major deals with Twitter to get real-time access to tweets, those 140-character microblog posts sent out by Twitter members. But Google claims to offer the most comprehensive real-time results by scanning news headlines, blogs, and feeds from Facebook, MySpace, Twitter, and other sources.
The tweets are a mainstay of Google’s real-time results, but Google has not previously discussed how it ranks them. A fundamental Google strategy for identifying tweet relevance is analogous to the one behind Google’s PageRank technology, which helps find relevant Web pages in traditional Web search. Under PageRank, Google judges the importance of pages containing a given search keyword in part by looking at the pages’ link structure. The more pages that link to a page, and the more highly ranked those linking pages are themselves, the more important the original page is judged to be.
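To make the link-structure idea concrete, here is a minimal sketch of the textbook PageRank power iteration. It is not Google’s production ranking code; the function name `pagerank`, the damping factor of 0.85, the fixed 50 iterations, and the four-page example graph are all illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified textbook PageRank by power iteration (illustrative only).

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Pages with no outgoing links spread their rank evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            for target in targets:
                # Each link passes on an equal share of the linking page's rank,
                # so a link from a highly ranked page is worth more.
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical graph: pages a, b and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # c
```

Page c, with the most inbound links, ends up on top, and page a outranks b and d even though it has only one inbound link, because that link comes from the highly ranked c.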
In the case of tweets, the key is to identify “reputed followers,” says Amit Singhal, a Google Fellow, who led development of real-time search. (Twitterers “follow” the comments of other Twitterers they’ve selected, and are themselves “followed.”)
“You earn reputation, and then you give reputation. If lots of people follow you, and then you follow someone–then even though this [new person] does not have lots of followers,” his tweet is deemed valuable because his followers are themselves followed widely, Singhal says. It is “definitely, definitely” more than a popularity contest, he adds.
“One user following another in social media is analogous to one page linking to another on the Web. Both are a form of recommendation,” Singhal says. “As high-quality pages link to another page on the Web, the quality of the linked-to page goes up. Likewise, in social media, as established users follow another user, the quality of the followed user goes up as well.”
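Singhal’s “earn reputation, then give reputation” idea can be illustrated with a deliberately simplified two-round score. This is a hypothetical sketch, not Google’s method: the scoring rule (each follower contributes 1 plus its own follower count), the function names, and the example accounts are all assumptions, and a fuller version would iterate to convergence the way PageRank does.

```python
from collections import defaultdict


def follower_counts(follows):
    """Round 1: raw popularity, i.e. how many users follow each user."""
    counts = defaultdict(int)
    for followees in follows.values():
        for followee in followees:
            counts[followee] += 1
    return counts


def reputation(follows):
    """Round 2 (hypothetical scoring rule): each follower contributes 1 plus
    its own follower count, so a follow from a widely followed user counts more."""
    counts = follower_counts(follows)
    scores = defaultdict(float)
    for follower, followees in follows.items():
        for followee in followees:
            scores[followee] += 1 + counts[follower]
    return scores


# Hypothetical accounts: five fans follow "celebrity", who follows "newcomer";
# three bots that nobody follows all follow "spammer".
follows = {f"fan{i}": ["celebrity"] for i in range(5)}
follows["celebrity"] = ["newcomer"]
follows.update({f"bot{i}": ["spammer"] for i in range(3)})

counts = follower_counts(follows)
scores = reputation(follows)
print(counts["newcomer"], counts["spammer"])  # 1 3
print(scores["newcomer"], scores["spammer"])  # 6.0 3.0
```

By raw follower count, the spammer wins three to one; once followers’ own reputation is weighed in, the newcomer followed by a single well-followed user comes out ahead, which is the sense in which this is “more than a popularity contest.”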
But Google’s social-ranking tricks are hardly the only method the search giant uses to extract relevance from tweets. Google also developed new ways to choose which (if any) tweets to surface for common terms like “Obama”–and to avoid spam or low-quality tweets–all within seconds.