The Virtual Twitterverse that Can Forecast the Real Thing

Researchers have built a synthetic network that can recreate the behavior of the Twitterverse.

Emerging Technology from the arXiv

February 7, 2011

The Twitter social network has expanded exponentially since its creation in 2006. According to one survey last year, 190 million Tweeters now send an average of 65 million tweets every day.

This, in turn, has generated intense interest from researchers wanting to study this and other social networks as well as from entrepreneurs wanting to exploit them.

But that presents a problem. The data from these kinds of social networks is hard to gather. That’s partly for practical reasons, such as their huge size and the limits that networks like Twitter place on the amount of data that can be gathered, and partly because of legal and privacy rules that prevent crawlers from gathering certain types of information.

Now Vijay Erramilli and buddies at Telefonica Research in Spain have a solution. They’ve created a virtual Twitterverse that has all the characteristics of the real thing without any actual tweeters. They call this virtual world SONG (Social Network Write Generator).

The idea is to use the virtual Twitterverse to create massive datasets that researchers can then use to study how various scenarios might play out in the real world.

Of course, the crucial feature of this tool is that it must accurately represent the real Twitterverse. The team built their model by studying a huge dataset of tweeting activity gathered between 25 November and 4 December 2008. The resulting social graph has over 2 million nodes and more than 38 million edges.

Erramilli and co then extracted various features such as diurnal variations, the distribution of activity between users and so on. They then created a virtual network with synthetic tweeters that reproduces these characteristics.
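To give a flavour of what such a generator involves, here is a minimal Python sketch. It is not the SONG code, which the article does not describe in detail. It assumes, purely for illustration, that the daily cycle is a simple sinusoid and that per-user activity follows a heavy-tailed Pareto distribution; the function names and every parameter value (base_rate, amplitude, peak_hour, alpha) are hypothetical.

```python
import math
import random
from collections import Counter


def hourly_rate(hour, base_rate=1000.0, amplitude=0.5, peak_hour=20):
    """Expected number of writes in a given hour of the day.

    Assumption: a sinusoidal daily cycle that peaks at `peak_hour`.
    """
    phase = 2 * math.pi * ((hour - peak_hour) % 24) / 24
    return base_rate * (1 + amplitude * math.cos(phase))


def user_weights(n_users, alpha, rng):
    """Heavy-tailed activity shares: a few users account for most writes.

    Assumption: raw activity levels are Pareto-distributed with shape `alpha`.
    """
    raw = [rng.paretovariate(alpha) for _ in range(n_users)]
    total = sum(raw)
    return [w / total for w in raw]


def generate_writes(n_users=500, hours=24, alpha=1.2, seed=0):
    """Return a list of (hour, user_id) synthetic write events."""
    rng = random.Random(seed)
    weights = user_weights(n_users, alpha, rng)
    events = []
    for hour in range(hours):
        # Number of writes this hour follows the assumed diurnal curve.
        n_events = round(hourly_rate(hour % 24))
        # Each write is assigned to a user in proportion to that user's share.
        for user_id in rng.choices(range(n_users), weights=weights, k=n_events):
            events.append((hour, user_id))
    return events


if __name__ == "__main__":
    events = generate_writes()
    per_hour = Counter(hour for hour, _ in events)
    per_user = Counter(user for _, user in events)
    print("writes at hour 8 vs hour 20:", per_hour[8], per_hour[20])
    print("most active users:", per_user.most_common(3))
```

A real generator would fit both the daily curve and the activity distribution to measured data rather than pick them by hand, and would also need to respect the structure of the social graph.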

To test it, they ran this virtual Twitterverse on a network of 16 Pentium-class machines connected via a 1-gigabit Ethernet switch and fine-tuned it to have the same properties as the real thing. They then compared the activity it produces with real-world data.

They say it reproduces the characteristics within certain limits. One bottleneck, for example, is the CPU when activity rises above 100 tweets per second, a limit that can presumably be fixed or avoided.

One important question is whether this virtual Twitterverse captures all the important characteristics of the real network. That’s hard to say for certain. But it certainly seems capable of recreating at least some aspects of it, and that may be all that’s needed for many potential users.

Many groups are likely to be interested in using a virtual Twitterverse. Erramilli and co say it can be used to analyse the capacity of parts of a network and to benchmark its performance.

But it’s the ability to forecast tweeting activity and the effect of things like flash mobbing that is likely to generate the most interest.

Many marketing groups would give their right arms to know how they should best design their campaigns to maximise the spread of their message. And that also suggests a way for entrepreneurs to exploit the technology.

They may not have long to wait. Erramilli and co intend to release code for SONG in the near future so that anybody can study these kinds of what-if scenarios.

Of course, if SONG is ever widely used it will begin to influence the dynamics of the Twitterverse itself. In which case, these guys will need a way to model that feedback mechanism too. If that happened, SONG could become an important part of the dynamic of the Twitterverse.

Stranger things have happened. Who would have imagined that an ordinary web search engine could become powerful enough to influence the way people design and build web pages? It’s unlikely that SONG will ever have such influence. But something like it could.

Ref: arxiv.org/abs/1102.0699: Explore What-If Scenarios With SONG: Social Network Write Generator
