TR Editors' blog

Twitter Opens Up More of Its Data

A partnership with social media company Gnip made the move possible.

Erica Naone 11/17/2010

Researchers and companies who want to track the conversations taking place online are intensely interested in data from Twitter. It's been hard to get deep access to that information, however. Onstage today at Defrag, a Web conference in Denver, Colorado, Twitter announced that it has formed a partnership to make more of its data available for analysis.

Ryan Sarver, a member of Twitter's platform team, said that the move is aimed at helping people who analyze huge bodies of Twitter posts to perform sentiment analysis, identify trends, and carry out other data-intensive tasks. "We haven't been able to serve that market well in the past," Sarver said.

Twitter already let people pick up portions of its data for free through several partial feeds, such as the Spritzer, which skims a portion of the posts moving through Twitter at any given moment and passes them on. Before today's announcement, however, those wanting more had to make deals with Twitter. Google and Bing, for example, made special agreements to incorporate real-time feeds from Twitter on their search results pages.

That data hasn't been readily available for several reasons. First, it's valuable and makes up some portion of Twitter's business model. Second, Twitter already struggles with overload and wouldn't be able to handle constant requests for its full feed.

Twitter will open up more of its data through a partnership with Gnip, a social data company based in Boulder, Colorado. Gnip will help Twitter distribute the information, minimizing the stress that this places on Twitter's resources. Twitter is also granting Gnip a license to sell the data.

Gnip is starting out by offering three new feeds: the Twitter Halfhose, which gives 50 percent of the full Twitter firehose; the Twitter Decahose, which gives 10 percent of the full Twitter stream; and the Mentionhose, which is a full real-time stream of all tweets mentioning a user, including replies and retweets.

"We will provide more transparent, consistent access to Twitter data than has ever been available before," said Gnip CEO Jud Valeski. He says that all of these new offerings give much more data than was previously available to most people. He expects the Mentionhose to be particularly interesting to companies tracking trends, looking for influential people on Twitter, and monitoring engagement with a product.

Valeski said, "There is insatiable demand for lots of data to understand how conversations online are taking place and transpiring."

What Twitter Learns from All Those Tweets

The company's head of analytics explains how Twitter mines the data users produce.

Erica Naone 09/28/2010

Twitter messages might be limited to 140 characters each, but all those characters can add up. In fact, they add up to 12 terabytes of data every day.

"That would translate to four petabytes a year, if we weren't growing," said Kevin Weil, Twitter's analytics lead, speaking at the Web 2.0 Expo in New York. Weil estimated that users would generate 450 gigabytes during his talk. "You guys generate a lot of data."
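
The figures Weil quotes can be checked with a little arithmetic. The sketch below assumes decimal units (1 terabyte = 1,000 gigabytes, 1 petabyte = 1,000 terabytes) and a constant rate of 12 terabytes a day; the function names are illustrative and do not come from Twitter.

```python
# Back-of-the-envelope check of the data volumes quoted in the talk.
# Assumptions (not from the source): decimal units and a constant daily rate.

TB_PER_DAY = 12        # figure quoted in the article
GB_PER_TB = 1000       # decimal units assumed
TB_PER_PB = 1000       # decimal units assumed
MINUTES_PER_DAY = 24 * 60


def yearly_petabytes(tb_per_day, days=365):
    """Petabytes produced in a year at a constant daily rate."""
    return tb_per_day * days / TB_PER_PB


def minutes_to_generate(gigabytes, tb_per_day):
    """Minutes needed to produce `gigabytes` at a constant daily rate."""
    gb_per_minute = tb_per_day * GB_PER_TB / MINUTES_PER_DAY
    return gigabytes / gb_per_minute


if __name__ == "__main__":
    print(f"Per year: {yearly_petabytes(TB_PER_DAY):.2f} PB")
    print(f"Time to generate 450 GB: {minutes_to_generate(450, TB_PER_DAY):.0f} minutes")
```

Under these assumptions, 12 terabytes a day works out to roughly 4.38 petabytes a year, in line with the "four petabytes" Weil cited, and 450 gigabytes corresponds to a talk of about 54 minutes.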

This wealth of information seems overwhelming, but Twitter believes it contains a lot of insights that could be useful to it as a business. For example, Weil said the company tracks when users shift from posting infrequently to becoming regular participants, and looks for features that might have influenced the change. The company has also determined that users who access the service from mobile devices typically become much more engaged with the site. Weil noted that this supports the push to offer Twitter applications for Android phones, iPhones, BlackBerrys, and iPads. And Weil said Twitter will be watching closely to see if the new design of its website increases engagement as much as the company hopes it will.

This visualization shows the connections between users.
Credit: Phillie Casablanca

Of course, Twitter also tracks simple statistics, such as how many searches are being performed on its site and where users are located, as well as what domains users link to most frequently. But Weil says the company also uses machine-learning techniques to figure out what kinds of tweets resonate most with users (those tweets are reposted automatically through its "TopTweets" account).

Twitter is also asking some more open-ended questions. Weil said the company is interested in what influences retweets (posts from one user that are reposted by another). And Twitter has discovered that it can make good guesses about the topics a user is interested in by looking at the accounts that user follows that don't follow back.
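
The one-way-follow idea can be illustrated with a small sketch. It is not Twitter's method, which the talk did not describe; it assumes a hypothetical follow graph stored as a dictionary of sets and a hypothetical table assigning one topic label to each account, and it simply counts the topics of the accounts a user follows that do not follow back.

```python
from collections import Counter

# Illustrative only: the data structures, account names, and topic labels
# below are hypothetical and are not taken from Twitter.


def one_way_follows(user, follows):
    """Accounts that `user` follows but that do not follow `user` back."""
    return {
        other
        for other in follows.get(user, set())
        if user not in follows.get(other, set())
    }


def guess_topics(user, follows, account_topics, top_n=2):
    """Rank topics by how many one-way-followed accounts carry each label."""
    counts = Counter(
        account_topics[account]
        for account in one_way_follows(user, follows)
        if account in account_topics
    )
    return [topic for topic, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    follows = {
        "alice": {"bob", "nasa", "spacex", "espn"},
        "bob": {"alice"},  # mutual follow, so it is ignored
    }
    account_topics = {
        "nasa": "space",
        "spacex": "space",
        "espn": "sports",
        "bob": "friends",
    }
    print(guess_topics("alice", follows, account_topics))
```

Mutual follows are skipped on the assumption that they tend to reflect personal relationships, while one-way follows are more likely to reflect interest in what the followed account posts about.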

Asking such specific questions of huge quantities of data is a common problem for successful Web companies. Weil explained that Twitter benefits from a variety of open-source software developed by companies such as Google, Yahoo, and Facebook. These tools are designed to deal with storing and processing data that's too voluminous to manage on even the largest single machine.

Even so, Twitter sometimes struggles with not having enough hardware. Weil said the company has run out of space in its data center, and that the 100-machine cluster it currently uses to process data is significantly less powerful than what it really needs. Twitter plans to move to a new data center later this year, and Weil hopes to get three to four times the capacity there.

Weil also said that Twitter is interested in doing more real-time analysis of tweets, but he didn't give details about how the company plans to mine this new trove of data.

Twitter's New Look

The site is finally changing its design. How will it affect other applications and the company's new advertising platform?

Erica Naone 09/15/2010

Twitter is changing its site to make it easier to use and navigate, and to give more context to posts that people see on its website. The changes will add a variety of informative panels to supplement the rapid exchange of information that's always taking place on the site.

For example, Twitter users often post links to pictures and videos. The new site will pull those onto the page so that people don't have to leave to view that content. Twitter has accomplished this in part through partnerships with companies that provide these services.

The new site will also provide contextual information for tweets, giving users related posts, for example. It will also make it easier to see profile information about who has posted a tweet without navigating away from the page.

Though the individual changes may seem small, together they suggest a focus on the aesthetics of the site that Twitter historically hasn't had time for. The site has spent most of its life trying to keep its head above water and prevent crashes from being too frequent.

The changes also seem likely to make people spend more time on the Twitter home page, rather than navigating away from it constantly. This could affect the group of applications that have grown up to supplement Twitter, in some cases undermining their functionality. It's also possible that the changes will make advertisements on Twitter more valuable. They could be coupled more closely to other content, and there might be more opportunities to present them.

A small group of users already has the new Twitter, and the company expects to roll it out to everyone gradually over the next few weeks.
