First Measurement of ‘Wordquakes’ Shaking the Blogosphere

Certain words disrupt the blogosphere in the same way that earthquakes shake the planet. And that makes them ripe for an earthquake-like magnitude rating.

Emerging Technology from the arXiv

February 14, 2011

In 2007, when the search engine Technorati stopped counting, over 100 million blogs had appeared on the web. In a little over ten years, blogs have changed the nature of publishing.

So it’s no surprise that blogs have become the focus of intense study by scientists hoping to gain some insight into the nature of the creatures that produce them. (A blog, of course, is a web page with entries listed in reverse chronological order, maintained by one or more writers.)

Today, Peter Klimek at the Medical University of Vienna and a couple of buddies say there is a remarkable analogy between the way topics erupt into the blogosphere and how earthquakes rupture the planet.

These guys studied over 160 political blogs published in the US between 1 July 2008 and 3 May 2010. Each day, they counted the number of occurrences of every possible letter triplet, i.e. aaa, aab, aac…zzz. (There are 26^3 = 17,576 possible triplets, but more than half of these never occur.)
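The counting step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `count_triplets`, the restriction to the letters a-z, and the choice to count triplets inside words rather than across word boundaries are all assumptions.

```python
from collections import Counter
import re


def count_triplets(text):
    """Count every three-letter sequence (a-z only) inside the words of text.

    Assumption: triplets are taken within words and case is ignored;
    the paper may tokenize the text differently.
    """
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        # Slide a three-letter window along the word.
        for i in range(len(word) - 2):
            counts[word[i:i + 3]] += 1
    return counts


# Hypothetical one-day sample of blog text.
day_counts = count_triplets("Palin picks up the nomination. Palin speaks.")
print(day_counts["pal"])  # occurrences of the triplet "pal" on this day
```

Running this once per day of posts would give one count series per triplet, which is the raw material for the peak search described next.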

They then looked for the day on which each triplet was most common and listed the words in which it occurred. Finally, they searched their database for occurrences of these words in the 30 days before and after the peak.
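A rough sketch of the peak search, assuming the daily counts for one word are already available as a plain list with one entry per day. The helper names `peak_day` and `window_around_peak` are hypothetical, and clipping the window at the ends of the series is a choice made here, not something the paper specifies.

```python
def peak_day(daily_counts):
    """Return the index of the day with the highest count (first one on ties)."""
    return max(range(len(daily_counts)), key=lambda d: daily_counts[d])


def window_around_peak(daily_counts, half_width=30):
    """Return the peak index and the counts from half_width days before
    the peak to half_width days after it.

    The window is clipped at the start and end of the series.
    """
    peak = peak_day(daily_counts)
    start = max(0, peak - half_width)
    end = min(len(daily_counts), peak + half_width + 1)
    return peak, daily_counts[start:end]


# Hypothetical daily counts for one word: quiet, a sudden spike, then decay.
series = [1, 0, 2, 1, 40, 22, 12, 7, 4, 3]
peak, window = window_around_peak(series, half_width=3)
print(peak, window)
```

The default `half_width=30` matches the 30-day span mentioned above; the example uses 3 only to keep the toy series short.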

Klimek and co say this clearly shows two types of event. The first is a sudden spike in word frequency triggered by a news event such as the nomination of Sarah Palin as vice presidential candidate. Because these events are triggered from outside the blogosphere, Klimek and co call them exogenous.

The second is a gradual rise, in which discussion within the blogosphere builds to a crescendo and then dies away again. The use of the word inauguration before and after the inauguration of President Obama is an example. Klimek and co call such events endogenous because they arise within the blogosphere.

The main finding is that the distributions of event sizes and of fore- and aftershocks are remarkably similar to those found by seismologists. “The intensity of fore- and aftershocks follows Omori’s law, the distribution of event-sizes is of Gutenberg-Richter type,” say Klimek and co.
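For reference, the two seismological laws named in the quote are usually written as follows. These are the standard textbook forms, not necessarily the exact parameterization used in the paper; K, c, p, a and b are constants fitted to the data.

```latex
% Omori's law (modified form): rate of aftershocks a time t after the main event
n(t) = \frac{K}{(c + t)^{p}}

% Gutenberg-Richter law: number N of events with magnitude at least M
\log_{10} N(M) = a - bM
```

In Omori's original formulation p = 1. Because magnitude is a logarithmic measure of event size, the Gutenberg-Richter relation corresponds to a power-law distribution of sizes.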

During the 670 days they were monitoring these blogs, they found over 1000 events, more than one wordquake per day.

In some ways, that’s not surprising. Word frequencies in most languages are known to follow power law distributions similar to those that govern earthquakes. Most of these studies have been done on snapshots of the language: the corpus of words in Wikipedia, for example, or the complete works of certain authors. What Klimek and co are looking at is the way this measure changes in time.

What’s more interesting is the possibility of grading wordquakes in real time using a magnitude system, just as seismologists do for earthquakes.

“One might also think of a ‘Richter scale’ for media events,” say Klimek and co. They say the largest event in their dataset is the nomination of Sarah Palin as vice presidential candidate.

This would be equivalent to the Big One hitting San Francisco or Tokyo. “Indeed, aftershocks of this event are still trembling and quivering through our society,” say Klimek and co.

Wordquakes given a magnitude might have exotic properties because of feedback effects. The very fact that a media event was labelled a Big One would generate interest that makes the quake even bigger.

This may or may not make wordquakes qualitatively different from earthquakes; we’ll have to see.

And a magnitude rating has further potential. Many human activities are known to follow earthquake-like power laws: epidemics, wars and fashions, to name just a few.

Using the blogosphere, and indeed the Twitterverse, to give a magnitude to these trends might turn out to be a useful and popular way of rating them.

An interesting project for an innovative web start-up.

Ref: arxiv.org/abs/1102.2091: The Blogosphere As An Excitable Social Medium: Richter’s And Omori’s Law In Media Coverage
