How Twitter Could Better Predict Disease Outbreaks

Social media is particularly useful for anyone who wants to track the present or predict the future.

Christopher Mims

July 14, 2010

A growing body of literature suggests that the data people make public on the Web can be used to track epidemics, predict box office hits and foretell other aspects of the future. Adding to this evidence, Vasileios Lampos, Tijl De Bie and Nello Cristianini of the Intelligent Systems Laboratory at the University of Bristol (UK) have released a paper about the utility of Twitter for tracking flu outbreaks.

The work builds on research pioneered in 2008 by scientists at Google that resulted in Google.org’s Flu Trends. (To see it in action, check out today’s elevated flu incidence in South Africa.)

What sets social media apart from search queries, says senior author Nello Cristianini, is that individual tweets are qualitatively different from search strings, which tend to be quite short.

“There is the potential in Twitter to understand the contents of the text, and isolate specific self-diagnostic statements by the user, for example, ‘i am having a headache,’” says Cristianini.

Over several months, the researchers gathered a database of more than 50 million geo-located tweets, which could then be compared to official data from the U.K.’s national health service on flu incidence by region. By figuring out which keywords in the database of tweets were associated with elevated levels of flu, Lampos et al. were able to create a predictive model that transformed keyword incidence in future tweets into a prediction of the severity of flu for a given area.
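To make the idea concrete, here is a minimal Python sketch of that kind of pipeline: measure how often candidate keywords appear in each week's tweets, keep the keywords whose frequency rises and falls with the official flu rate, and fit a straight line from the combined keyword frequency to that rate. It is an illustration only, not the researchers' actual model, which this article does not describe in detail. The function names, the correlation threshold and the toy tweets and flu rates are all invented for the example.

```python
from collections import Counter


def keyword_rates(tweets, keywords):
    """Fraction of tweets that mention each keyword (whole-word match)."""
    counts = Counter()
    for text in tweets:
        words = set(text.lower().split())
        for kw in keywords:
            if kw in words:
                counts[kw] += 1
    n = max(len(tweets), 1)
    return {kw: counts[kw] / n for kw in keywords}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def fit(weekly_tweets, flu_rates, keywords, min_corr=0.5):
    """Select flu-associated keywords, then fit flu_rate ~ a + b * score."""
    rates = [keyword_rates(t, keywords) for t in weekly_tweets]
    # Keep keywords whose weekly frequency tracks the official flu rate.
    selected = [
        kw for kw in keywords
        if pearson([r[kw] for r in rates], flu_rates) >= min_corr
    ]
    # Score each week by the summed frequency of the selected keywords.
    scores = [sum(r[kw] for kw in selected) for r in rates]
    n = len(scores)
    mx, my = sum(scores) / n, sum(flu_rates) / n
    vx = sum((x - mx) ** 2 for x in scores)
    cov = sum((x - mx) * (y - my) for x, y in zip(scores, flu_rates))
    slope = cov / vx if vx else 0.0
    intercept = my - slope * mx
    return selected, slope, intercept


def predict(tweets, model):
    """Turn keyword incidence in new tweets into an estimated flu rate."""
    selected, slope, intercept = model
    rates = keyword_rates(tweets, selected)
    return intercept + slope * sum(rates.values())


# Invented toy data: three weeks of tweets and made-up official flu rates.
weekly_tweets = [
    ["i have a fever", "nice weather today", "lunch was great", "watching football"],
    ["fever and cough all night", "my cough is back", "lunch plans", "football tonight"],
    ["fever cough headache", "bad cough", "fever again", "lunch"],
]
flu_rates = [10.0, 30.0, 50.0]
candidates = ["fever", "cough", "lunch", "football"]

model = fit(weekly_tweets, flu_rates, candidates)
sick_week = predict(["fever and cough", "cough again", "so much fever"], model)
quiet_week = predict(["lunch time", "football tonight", "nice weather"], model)
print(model[0], sick_week > quiet_week)
```

A real system would need far more data, a more careful way of choosing keywords and validation against weeks the model has never seen; the sketch shows only the overall shape of the approach.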

This flu-predicting signal from Twitter is an independent stream of information that can “complement or improve the signal coming from search engine queries,” says Cristianini.

Cristianini notes, however, that all approaches that track self-reported symptoms suffer from the same bias: the more the media hypes a flu epidemic, the more likely people are to go to their doctors (distorting the “official” numbers) and talk about suspicious symptoms on Twitter or other services.

Future work might involve information from Facebook and other sources of status updates, allowing researchers to become ever more adept at pinpointing outbreaks in their earliest stages.

