How to Use Twitter Friends as Sensors to Detect Disease Outbreaks

Emerging Technology from the arXiv

November 30, 2012

The Twittersphere is a remarkable lens through which to examine the activities of humankind. Indeed, there is intense interest in studying the way information flows through the Twitter network.

Various groups have looked at how they can use this information to measure and even predict public opinion about everything from movies to the stock market.

But all this work shares a common problem: the sheer volume of data that Twitter generates. It’s simply not possible to follow what everybody is doing all the time.

Today, Manuel Garcia-Herranz at the Autonomous University of Madrid in Spain and a few pals say there’s a much more powerful way to track the spread of information on Twitter.

Their approach is to use a small group of highly connected Twitter users as “sensors” to detect the emergence of new ideas. They point out that this works because highly connected individuals are more likely to receive new ideas before ordinary users.

To test this idea, they pick two different groups of Twitter users and monitor when each group begins to use new hashtags about specific topics. The clever thing is how they choose these groups.

The first is simply a subset of Twitter users chosen at random. This is the control group.

They then choose a follower, or “friend,” from each of these Twitter users and put them together to form a second group, which they call the sensor group.

It’s easy to imagine that these two groups would be more or less the same, but that’s not the case. People in the sensor group are significantly better connected than people chosen at random. That’s simply because a highly connected Twitter user is linked to more people than a poorly connected one, and so is more likely to turn up as the friend of any randomly chosen individual.

As the researchers put it: “Your friends have more friends than you do!”
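The effect is easy to reproduce on a toy network. The sketch below is only an illustration, not the researchers' code: it assumes an undirected network grown by preferential attachment (real Twitter follows are directed), and the function names `build_network` and `pick_groups` and all the sizes are made up for the example. It picks a random control group, takes one random friend of each member as the sensor group, and compares how well connected the two groups are.

```python
import random
from statistics import mean


def build_network(n_users, links_per_user, rng):
    """Grow a toy undirected network by preferential attachment (an assumption)."""
    neighbors = {u: set() for u in range(n_users)}
    endpoints = []  # each user appears once per link, so sampling favors hubs
    core = links_per_user + 1
    # Start from a small, fully connected core of users.
    for u in range(core):
        for v in range(u + 1, core):
            neighbors[u].add(v)
            neighbors[v].add(u)
            endpoints += [u, v]
    # Each newcomer links to existing users, preferring well-connected ones.
    for u in range(core, n_users):
        targets = set()
        while len(targets) < links_per_user:
            targets.add(rng.choice(endpoints))
        for v in targets:
            neighbors[u].add(v)
            neighbors[v].add(u)
            endpoints += [u, v]
    return neighbors


def pick_groups(neighbors, sample_size, rng):
    """Control: random users. Sensors: one random friend of each control user."""
    control = rng.sample(sorted(neighbors), sample_size)
    # The same popular user may be picked more than once; that is left as-is.
    sensors = [rng.choice(sorted(neighbors[u])) for u in control]
    return control, sensors


rng = random.Random(42)  # fixed seed so the run is repeatable
network = build_network(5000, 3, rng)
control, sensors = pick_groups(network, 500, rng)

control_degree = mean(len(network[u]) for u in control)
sensor_degree = mean(len(network[u]) for u in sensors)
print(f"average friends, control group: {control_degree:.1f}")
print(f"average friends, sensor group:  {sensor_degree:.1f}")
```

In a network like this, the sensor group should come out clearly better connected than the control group, even though nobody looked at anyone's friend count when choosing it.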

These guys then ask whether the sensor group becomes aware of new hashtags earlier than the control group. They call this the sensor hypothesis.

To find the answer, they crunched six months of Twitter data dating from 2009. This involved 40 million users who together totted up 1.5 billion “follows” and sent nearly half a billion tweets, including 67 million containing hashtags. Garcia-Herranz and co looked, in particular, at 24 hashtags that were “born” shortly after the data set began.

Sure enough, the sensor group detected these new hashtags about seven days earlier than the control group. The lead time varied widely, though, from nothing at all to as much as 20 days.
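One simple way to put a number on such a lead is to compare when each group first uses a hashtag. The sketch below assumes the lead time is the difference between the two groups' average first-use days. That measure is an assumption for illustration and may not be the one used in the paper. The function `lead_time_days`, the user names and the dates are all invented toy values, not data from the study.

```python
from statistics import mean


def lead_time_days(first_use_day, control, sensors):
    """Average first-use day of the control group minus that of the sensor group.

    A positive result means the sensor group used the hashtag earlier.
    Users who never used the hashtag are skipped (an assumed convention).
    """
    def mean_day(group):
        days = [first_use_day[u] for u in group if u in first_use_day]
        return mean(days) if days else None

    control_mean = mean_day(control)
    sensor_mean = mean_day(sensors)
    if control_mean is None or sensor_mean is None:
        return None  # no adopters in one of the groups, so no comparison
    return control_mean - sensor_mean


# Invented toy data: the day on which each user first tweeted the hashtag.
first_use_day = {"ana": 2, "eli": 4, "ben": 9, "cho": 10, "dev": 11}
control = ["ben", "cho", "dev", "fay"]  # "fay" never used the hashtag
sensors = ["ana", "eli", "ben"]

lead = lead_time_days(first_use_day, control, sensors)
print(f"sensor group lead: {lead} days")
```

Repeating this for each hashtag would give a spread of lead times rather than a single figure.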

That’s an important result. Garcia-Herranz and pals say that not only is using a sensor group more efficient than monitoring all Twitter users, it is more effective, too. In other words, there’s no point in crunching these huge data sets. You’re far better off picking a decent sensor group and watching them instead.

It’ll come as no surprise that this approach has numerous applications. For example, using sensor groups to monitor public mood could give important insight into factors that influence economic growth, elections, and even political revolutions.

Public-health monitoring may also be an important application. “Google or other companies that monitor flu-related search terms might be able to get high-quality, real-time information about a real-world epidemic with greater lead time, giving public-health officials even more time to plan a response,” say Garcia-Herranz and co.

In other words, your friends could act as an early warning system, not just for gossip, but for civil unrest and even outbreaks of disease.

There are potential problems, of course. One is that individuals in the sensor group automatically become targets for lobbying. That could distort the measurement process and also raises various ethical questions about undue influence.

A difficult challenge will be to find ways to protect these individuals and keep their identities hidden, perhaps even from themselves.

Nevertheless, this is significant work. Clearly these guys have discovered a powerful tool that is likely to generate plenty of interest. Expect to hear more about it.

Ref: arxiv.org/abs/1211.6512: Using Friends as Sensors to Detect Global-Scale Contagious Outbreaks
