How to Predict the Spread of News on Twitter

Computer scientists have discovered the four factors that make news stories popular on Twitter.

Emerging Technology from the arXiv

February 7, 2012

Twitter has revolutionised the way millions of people receive news and the type of news they get. So it’s no surprise that there is huge interest in predicting what kind of stories are likely to spread furthest and fastest.

One way to make this kind of prediction is to study how a story spreads soon after it is released into the wild. Various groups have shown that this early popularity can be a good predictor of a story’s later spread.

A couple of years ago, Bernardo Huberman and pals at HP’s Social Computing Lab in Palo Alto used this approach to predict the eventual box office revenues based on the rate of tweets about a film soon after it was released.

The problem with this method is that the structure of the network can have a profound effect on the way tweets spread and this has little to do with the content and its appeal.

So Huberman is now taking another approach. This time he wants to know whether there is something about the news stories themselves that determines their popularity. In other words, he’s looking for factors that determine how popular a news story will be before it is even published.

To find out, Huberman and his colleagues examined the content of news stories during a single week in August last year as measured by the news feed aggregator Feedzilla. They scored each article based on four criteria: the news source that generates and posts the article; the category of news; the subjectivity of the language; and the people and things named in the article.
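As a rough illustration of what scoring an article on these four criteria might look like, here is a minimal Python sketch. It is not the researchers' code: the lookup tables, the tiny word list used for subjectivity, and the names `Article` and `content_features` are all hypothetical, and the numbers are made up for the example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple

# Hypothetical lookup tables (illustrative values only, not real data).
# Each maps a source, category, or named entity to a past-popularity score.
SOURCE_SCORE = {"techblog": 120.0, "localpaper": 8.0}
CATEGORY_SCORE = {"technology": 90.0, "health": 30.0}
ENTITY_SCORE = {"Twitter": 150.0, "HP": 40.0}

# Hypothetical stand-in for a real subjectivity measure.
SUBJECTIVE_WORDS = {"amazing", "terrible", "shocking", "brilliant"}


@dataclass
class Article:
    source: str
    category: str
    text: str
    entities: List[str]


def content_features(article: Article, default: float = 0.0) -> Tuple[float, float, float, float]:
    """Return (source, category, subjectivity, entity) scores for one article."""
    words = [w.strip(".,!?").lower() for w in article.text.split()]
    # Subjectivity here is just the share of words found in the toy lexicon.
    subjectivity = sum(w in SUBJECTIVE_WORDS for w in words) / len(words) if words else 0.0
    entity_scores = [ENTITY_SCORE.get(e, default) for e in article.entities]
    return (
        SOURCE_SCORE.get(article.source, default),
        CATEGORY_SCORE.get(article.category, default),
        subjectivity,
        mean(entity_scores) if entity_scores else default,
    )


example = Article(
    source="techblog",
    category="technology",
    text="An amazing new phone from HP",
    entities=["HP", "Twitter"],
)
features = content_features(example)
print(features)
```

The point of the sketch is only that every input is available before publication: nothing in it depends on how the story actually spreads.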

They then measured the way these news stories spread across the Twitter network to see which became popular and how quickly. They used this to work out how an article’s score in each criterion is linked to its eventual popularity.

Finally, having worked out what factors make an article successful, they used this to predict how popular other articles would be.

Here’s their conclusion: “Our experiments show that it is possible to estimate ranges of popularity with an overall accuracy of 84% considering only content features.”
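To make "estimating ranges of popularity" and "overall accuracy" concrete, here is a small sketch in Python. It assumes each article has already been reduced to four content scores and labelled with a popularity range. The nearest-centroid classifier, the range labels "low" and "high", and all the numbers are assumptions chosen for brevity; the article does not say which models the researchers used.

```python
from collections import defaultdict
from math import dist


def train_centroids(rows):
    """rows: list of (feature_tuple, range_label). Returns label -> mean feature vector."""
    grouped = defaultdict(list)
    for features, label in rows:
        grouped[label].append(features)
    return {
        label: tuple(sum(col) / len(col) for col in zip(*vectors))
        for label, vectors in grouped.items()
    }


def predict_range(centroids, features):
    """Pick the popularity range whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


def accuracy(centroids, rows):
    """Fraction of articles whose predicted range matches the observed one."""
    hits = sum(predict_range(centroids, f) == label for f, label in rows)
    return hits / len(rows)


# Made-up training data: (source, category, subjectivity, entity) scores and a label.
# A real system would rescale the features first; this toy version does not.
training = [
    ((100.0, 80.0, 0.1, 90.0), "high"),
    ((120.0, 90.0, 0.2, 110.0), "high"),
    ((10.0, 20.0, 0.0, 5.0), "low"),
    ((5.0, 30.0, 0.1, 10.0), "low"),
]
held_out = [
    ((90.0, 70.0, 0.1, 80.0), "high"),
    ((8.0, 25.0, 0.0, 7.0), "low"),
]

centroids = train_centroids(training)
print(predict_range(centroids, (90.0, 70.0, 0.1, 80.0)))
print(accuracy(centroids, held_out))
```

Accuracy in this sense is the share of held-out articles placed in the correct range, which is how a figure such as 84% would be read.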

So before anybody lays eyes on these articles, it’s possible to work out in advance how popular they are likely to become.

That’s pretty impressive and may herald important changes in the way articles are written and edited. It’s not hard to imagine an automated article checker, rather like the grammar checkers in word processing programs, that reads articles and predicts how popular they are likely to be when published.

In a sense, that’s what journalists do now when they choose topics to write about. But this process is entirely intuitive, based as much on gut feel as on a good understanding of the dynamics of the audience. Huberman’s algorithm could automate this process.

That would have profound effects on the generation of news stories. On the one hand, it could lead to the homogenisation of stories as news organisations focus on optimising their stories for this algorithm.

Exactly that process happened in Hollywood a few years ago when storytelling became homogenised in the manner outlined by Robert McKee in his highly successful Story seminars.

On the other hand, automation could lead to a new generation of more tightly written and better focused stories that build on the new algorithm and better it.

Interesting times. One way or another, the way we produce written content is changing. And rapidly.

Ref: arxiv.org/abs/1202.0332: The Pulse of News in Social Media: Forecasting Popularity
