Back in time: The software tools can produce a visualization showing Twitter topics over time. The lower half of this image shows recommended tweets.
The researchers realized, however, that search engines have been extracting meaning from just a few words, in the form of search queries, for years.
“The essence of the approach is to coerce a tweet to look more like a search query and then get a search engine to tell us more,” Bernstein says. The researchers first clean up a tweet by stripping out common terms, such as the Twitter slang “RT” (short for “retweet”). Once their algorithms have focused on the likely significant terms, they feed those terms into Yahoo’s Build your Own Search Service (BOSS), a Web service that taps directly into Yahoo’s search results.
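To make the cleanup step concrete, here is a minimal Python sketch of turning a tweet into query-like terms. It assumes a hand-picked stoplist and a simple length cutoff as a crude stand-in for the researchers' term selection; the function name `tweet_to_query` and the stoplist contents are hypothetical, and the step of sending the result to Yahoo's service is left out rather than guessing at that API.

```python
import re

# Hypothetical stoplist. The article names only the Twitter slang "RT";
# the other entries are illustrative assumptions.
STOP_TERMS = {"rt", "via", "the", "and", "for", "this", "that", "with", "are"}


def tweet_to_query(tweet: str, max_terms: int = 5) -> str:
    """Coerce a tweet into a short, query-like string of candidate terms."""
    # Remove URLs and @mentions, which add little to a web search.
    text = re.sub(r"https?://\S+|@\w+", " ", tweet)
    # Keep hashtag words but drop the '#' marker.
    text = text.replace("#", " ")
    # Lowercase and split into alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    kept = []
    for tok in tokens:
        # Skip stoplisted terms and very short words (a crude stand-in for
        # whatever significance scoring the researchers actually used).
        if tok in STOP_TERMS or len(tok) < 3:
            continue
        if tok not in kept:  # keep first occurrence only, preserving order
            kept.append(tok)
    return " ".join(kept[:max_terms])


print(tweet_to_query("RT @nasa: Watch the #Shuttle launch live at http://nasa.gov/ntv"))
# prints "watch shuttle launch live"
```

The `max_terms` cap reflects the idea of making a tweet "look more like a search query": typical queries are only a handful of words.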
The Web is the most up-to-date source of data, Bernstein says, and the pages that come up in search results give enough information for the researchers’ algorithms to produce a list of topics related to the original tweet.
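The article does not say exactly how Eddi turns search results into topics. As one plausible illustration, the sketch below swaps in simple document-frequency counting: words that recur across many result snippets (excluding the query terms themselves) become candidate topic labels. The snippets are made up, and `topics_from_results` and its stopword list are hypothetical; query terms are assumed to be lowercase.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption, not from the article).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on",
              "with", "at", "by", "from", "as", "its", "are", "was"}


def topics_from_results(snippets, query_terms, k=3):
    """Rank candidate topic words by how many result snippets contain them."""
    doc_freq = Counter()
    exclude = STOP_WORDS | set(query_terms)
    for snippet in snippets:
        # Count each word at most once per snippet (document frequency).
        words = set(re.findall(r"[a-z]+", snippet.lower())) - exclude
        doc_freq.update(words)
    # Highest document frequency first; ties broken alphabetically.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Made-up snippets standing in for what a web search API might return.
results = [
    "NASA space shuttle program schedules final launch",
    "Space shuttle Discovery launch from Kennedy Space Center",
    "NASA TV coverage of shuttle launch and space station docking",
]
print(topics_from_results(results, ["shuttle", "launch"], k=2))  # ['space', 'nasa']
```

Counting each word once per snippet keeps a single verbose page from dominating the ranking; a fuller system would likely also consider multi-word phrases and the pages' full text, not just snippets.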
A similar approach could be used with any repository of information, Chi notes, pointing out that companies could use the technology on an intranet to classify bits of information related to more specialized topics.
“Boosting the signal of a tweet by piping it through Web search is an application of a well-established information-retrieval technique,” says Daniel Tunkelang, an engineer at Google who is an expert on information retrieval. He compares it to using a thesaurus to set a word in a broader context.
However, Tunkelang says the PARC researchers will have to make sure that the tweet-as-search-query approach doesn’t collide with search engines’ growing efforts to index tweets. If a tweet submitted as a query simply returned itself as a result, the search would add no new context.
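One simple guard against this self-match problem is to filter the results before extracting topics. The sketch below assumes each result carries a URL and a snippet (a hypothetical shape, not any specific search API's format) and drops results that either come from twitter.com or contain the tweet's normalized text verbatim; the helper names are illustrative.

```python
import re


def normalize(text):
    """Lowercase and keep only alphanumeric words, joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def drop_self_matches(tweet, results):
    """Filter out search results that merely echo the original tweet.

    Each result is assumed to be a dict with 'url' and 'snippet' keys; this
    shape is hypothetical, not any particular search API's format.
    """
    target = normalize(tweet)
    kept = []
    for r in results:
        # Assumption: any indexed Twitter page is treated as an echo.
        is_twitter_page = re.search(r"//(www\.)?twitter\.com/", r["url"]) is not None
        # Pad with spaces so the match respects word boundaries.
        echoes_tweet = bool(target) and f" {target} " in f" {normalize(r['snippet'])} "
        if not (is_twitter_page or echoes_tweet):
            kept.append(r)
    return kept


tweet = "Watch the #Shuttle launch live"
results = [
    {"url": "http://twitter.com/someuser/status/123", "snippet": "Watch the Shuttle launch live!"},
    {"url": "http://example.com/mirror", "snippet": "someuser: watch the shuttle launch live at NASA TV"},
    {"url": "http://www.nasa.gov/shuttle", "snippet": "Space shuttle mission schedule and launch coverage"},
]
print([r["url"] for r in drop_self_matches(tweet, results)])  # ['http://www.nasa.gov/shuttle']
```

Discarding every Twitter page is a blunt choice: it also throws away other people's tweets on the same subject, which might be useful context. A finer filter could drop only the original tweet's own status URL and verbatim copies of its text.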
Chi says that his team is working on a platform for managing various kinds of information streams. This summer, they plan to increase the scale of the Eddi Project so it can be placed on the live Web for testing. The longer-term goal, Chi says, is to build tools that can be optimized for enterprise customers.