“The global human population can be regarded as geographically distributed, multimodal sensors,” say Siqi Zhao at Rice University, Houston, and a few pals.
And if that’s the case, the Twitter firehose presents the “readings” from these sensors.
We’ve known for some time that this sensor system provides real-time updates about major events such as earthquakes, forest fires and celebrity deaths.
But what of more frequent, rapidly changing events? ask Zhao and co.
To find out, these guys collected tweets during the game time of 101 American football matches in the 2010-2011 season. That’s a total of 19 million tweets from 3.5 million tweeters.
Making sense of these tweets is no mean feat. First, Zhao and buddies had to separate the football-related tweets from the rest. That’s tricky given that only 11 per cent of tweets contain hashtags indicating their topic.
Then they had to work out which game each tweet referred to. Again this can be difficult when up to 10 games might be playing simultaneously, although Zhao and co say that 60 per cent of game-related tweets contain team names.
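To make the team-name idea concrete, here is a minimal sketch in Python. It is an illustration only, not the method from the paper: the `GAMES` table, the game identifiers and the `match_game` helper are all hypothetical, and it simply refuses to guess when a tweet mentions no team or teams from more than one game.

```python
# Hypothetical table of simultaneously running games and their team names.
# The identifiers and the choice of teams are assumptions for illustration.
GAMES = {
    "game_a": ("packers", "steelers"),
    "game_b": ("jets", "patriots"),
}


def match_game(text, games=GAMES):
    """Return the id of the single game whose team names appear in the tweet.

    Returns None when no team is mentioned, or when teams from more than
    one game are mentioned and the tweet is therefore ambiguous.
    """
    lowered = text.lower()
    matches = [
        game_id
        for game_id, teams in games.items()
        if any(team in lowered for team in teams)
    ]
    return matches[0] if len(matches) == 1 else None
```

A tweet such as "Go Packers!" would map to `game_a`, while a tweet naming no team falls through to `None`. Going by the 60 per cent figure, that leaves roughly 40 per cent of game-related tweets that a name-only matcher like this one cannot place.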
Next they had to work out when an ‘event’ actually occurred; things like touchdowns, interceptions, fumbles and field goals. This they do with a two-stage process that looks for these keywords and measures the rate at which they appear within a given time window. If the post rate rises above some predetermined threshold, then the system decides that this event has occurred.
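The keyword-and-threshold idea can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the authors’ implementation: the keyword list, the 10-second window, the threshold value and the `detect_events` function are all hypothetical, since the article gives none of those specifics.

```python
from collections import deque

# Hypothetical keyword list built from the event types named in the article.
EVENT_KEYWORDS = ("touchdown", "interception", "fumble", "field goal")


def detect_events(tweets, window_seconds=10.0, threshold=0.5):
    """Flag an event when a keyword's post rate exceeds a threshold.

    tweets: iterable of (timestamp_in_seconds, text), sorted by timestamp.
    threshold: tweets per second (an assumed unit and value).
    Returns a list of (timestamp, keyword) pairs, one per detected burst.
    """
    recent = {kw: deque() for kw in EVENT_KEYWORDS}   # timestamps inside the window
    active = {kw: False for kw in EVENT_KEYWORDS}     # is a burst already flagged?
    events = []
    for ts, text in tweets:
        lowered = text.lower()
        for kw in EVENT_KEYWORDS:
            window = recent[kw]
            # Stage 1: keep only tweets that mention the keyword.
            if kw in lowered:
                window.append(ts)
            # Drop timestamps that have fallen out of the sliding window.
            while window and window[0] <= ts - window_seconds:
                window.popleft()
            # Stage 2: compare the post rate in the window with the threshold.
            rate = len(window) / window_seconds
            if rate > threshold:
                if not active[kw]:
                    events.append((ts, kw))
                    active[kw] = True
            else:
                active[kw] = False
    return events
```

The `active` flag reports each burst once rather than on every tweet that keeps the rate above the threshold. In practice the threshold would presumably need tuning per game, since a quiet fixture and a heavily followed one produce very different baseline rates.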
Finally, the system has to do all this in real time from a firehose of up to 800 tweets per second.
It turns out that with the right kind of filtering, Twitter can provide a remarkably accurate commentary, accurate to within a few seconds. Zhao and co say that on average tweeters take 17 seconds to report a game event.
Curiously, their system worked well on all the football games they monitored, except one: the Super Bowl itself.
That’s because the sheer number of tweets about this game seemed to saturate Twitter’s ability to distribute them. So Zhao and co were unable to see increases in the rate at which keywords appeared.
Other than that, these guys seem to have hit upon a great way to create real-time commentaries. “Most of the techniques can be readily applied to many other sports games,” they say. That said, those games would need a similarly sized fan base. Soccer and baseball are obvious candidates and an automated sports commentary start-up can’t be far behind.
However, the technique has an important limitation. It only works for events in which keywords are known in advance: things like ‘goal’ or ‘home run’. When unanticipated events occur, the system is oblivious.
That suggests an obvious line of future research: to find a way to recognise important but unanticipated events. We’ll look forward to seeing what Zhao and co come up with.
Ref: arxiv.org/abs/1106.4300: Humans as Real-Time Sensors of Social and Physical Events: A Case Study of Twitter and Sports Games