Researchers at the Palo Alto Research Center (PARC) are developing new ways to deal with the torrent of information flowing from social media sites like Twitter. They have developed a Twitter “topic browser” that extracts meaning from the posts in a user’s timeline. This could help users scan through thousands of tweets quickly, and the underlying technology could also offer novel ways of mining Twitter for information or for creating targeted advertising.
The researchers’ idea was to provide a way for users to deal with a large number of Twitter messages quickly. They found that many users wanted to be able to quickly catch up on what’s been going on, without having to go through every single tweet in their timeline.
Ed Chi, area manager and principal scientist for the Augmented Social Cognition Research Group at PARC, says that the information coming through Twitter resembles a stream: users will dip into it from time to time, but they don’t want to consume it all at once. His group’s work is called the “Eddi Project” in reference to the idea of eddies in a stream.
The researchers developed two main ways of filtering Twitter content. The first, presented recently at the ACM Conference on Human Factors in Computing Systems in Atlanta, is a recommendation system that ranks the posts in a Twitter stream by how interesting a user is likely to find them, based on factors such as the contents of the posts and the user’s interactions with other Twitter users. The second tool, the Twitter topic browser, summarizes the contents of a user’s timeline so that the user can quickly survey what information has come through Twitter without having to read through every post.
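As a rough illustration of the first tool's general idea, the sketch below ranks posts by combining a content signal with a social signal. It is not PARC's system: the scoring formula, the weights, and every name in it (score, rank, interest_terms, interaction_counts) are assumptions made up for this example.

```python
def score(text, author, interest_terms, interaction_counts,
          content_weight=1.0, social_weight=0.5):
    """Hypothetical interest score for one post.

    Content signal: number of the user's interest terms that appear in the post.
    Social signal: how often the user has interacted with the post's author.
    The weights are arbitrary illustrative values.
    """
    words = set(text.lower().split())
    content = len(words & interest_terms)
    social = interaction_counts.get(author, 0)
    return content_weight * content + social_weight * social


def rank(timeline, interest_terms, interaction_counts):
    """Return (author, text) pairs sorted from highest to lowest score."""
    return sorted(
        timeline,
        key=lambda post: score(post[1], post[0], interest_terms, interaction_counts),
        reverse=True,
    )


# Made-up sample data, for illustration only.
timeline = [
    ("alice", "lunch was great"),
    ("bob", "new paper on search interfaces"),
    ("carol", "search ranking talk today"),
]
interest_terms = {"search", "ranking", "paper"}
interaction_counts = {"bob": 4, "carol": 1}

ranked = rank(timeline, interest_terms, interaction_counts)
```

A real recommender would likely learn its weights from user behavior rather than fix them by hand; the fixed values here only keep the example short.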
To create this second tool, the researchers focused on identifying the topic of each tweet. Michael Bernstein, a researcher at the Computer Science and Artificial Intelligence Lab at MIT who is involved with the project, says the group found that Twitter users wanted to filter posts by topic and that existing methods fell short. “Hashtags,” user-generated annotations that categorize tweets, are perhaps the best current option, but most tweets don’t have these tags. Bernstein notes that Twitter, Google, and other companies are developing ways to identify and categorize the most popular topics of discussion on Twitter, such as the Icelandic volcano. For those topics, the sheer volume of tweets gives algorithms a lot of information to use; it’s much harder, he says, to figure out the topic of tweets on less widely discussed subjects.
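To make the hashtag limitation concrete, here is a minimal sketch of hashtag-based grouping. The function name group_by_hashtag and the sample posts are invented for illustration, and the simple pattern used here is an assumption, not Twitter's actual hashtag rules.

```python
import re
from collections import defaultdict

# Simplified pattern: a '#' followed by one or more word characters.
HASHTAG = re.compile(r"#(\w+)")


def group_by_hashtag(tweets):
    """Map each lowercased hashtag to the tweets that carry it.

    Tweets with no hashtag at all are collected under the key None.
    """
    groups = defaultdict(list)
    for text in tweets:
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        if not tags:
            groups[None].append(text)
        for tag in tags:
            groups[tag].append(text)
    return dict(groups)


# Made-up sample posts, for illustration only.
sample = [
    "Flights grounded again #volcano #travel",
    "Ash cloud drifting south #Volcano",
    "Trying a new coffee place this morning",
]
groups = group_by_hashtag(sample)
```

Any post without a tag ends up in the None bucket, where this approach offers no topic information at all. That is the gap a content-based topic browser would need to fill.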
A key challenge of extracting meaning from a tweet is its length: no more than 140 characters. Chi says that most natural language processing technology relies on having a larger sample of text to work with. For example, some methods rely on people writing out associations between terms, which requires a lot of work to maintain, and is not the best way to interpret real-time information.