A View from Emerging Technology from the arXiv
Algorithm Writes People’s Life Histories Using Twitter Stream
If you tweet about your life, a new algorithm can identify your most significant events and assemble them into an accurate life history, say the computer scientists who built it
Twitter allows anyone to describe their life in unprecedented detail. Many accounts provide an ongoing commentary of an individual’s interests, activities and opinions.
So it’s not hard to imagine that it’s possible to reconstruct a person’s life history by analysing their Twitter stream.
But doing this automatically is trickier than it sounds. That’s because most Twitter streams contain news of important events mixed up with entirely trivial details about events of little or no significance. The difficulty is in telling these apart.
Today, Jiwei Li at Carnegie Mellon University in Pittsburgh and Claire Cardie at Cornell University in Ithaca say they’ve developed an algorithm that does this. Their new technique can create an accurate life history for any individual by mining their tweets and those of their followers. That allows them to generate an eerily accurate chronology of a person’s life-changing events, without knowing anything about them other than their Twitter handle.
The key behind this work is a technique for separating the wheat from the chaff in any Twitter stream. Li and Cardie do this by classifying every tweet into one of four categories. The most important tweets are those that describe important, time-specific events of a personal nature.
A tweet about starting a new job would be a good example. By contrast, a tweet about a 5 kilometre run that is part of a regular exercise regime would not qualify because it happens regularly. So personal events fall into two categories: time-specific and time-general.
Equally, tweets about non-personal events fall into the same two categories: time-specific and time-general. A tweet about the US election would be an example of the former whereas an opinion about the summer weather would be an example of the latter.
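The four categories described above form a simple two-by-two grid, which can be sketched in a few lines of Python. This is only an illustration of the taxonomy, not the authors' code: the names `Category`, `categorize` and `belongs_in_life_history` are hypothetical, and the two yes/no answers are assumed to come from some classifier that is not shown here.

```python
from enum import Enum


class Category(Enum):
    """The four tweet categories described in the article (names are illustrative)."""
    PERSONAL_TIME_SPECIFIC = "personal, time-specific"   # e.g. starting a new job
    PERSONAL_TIME_GENERAL = "personal, time-general"     # e.g. a regular 5 km run
    PUBLIC_TIME_SPECIFIC = "public, time-specific"       # e.g. the US election
    PUBLIC_TIME_GENERAL = "public, time-general"         # e.g. the summer weather


def categorize(is_personal: bool, is_time_specific: bool) -> Category:
    """Map two yes/no judgments onto one of the four categories."""
    if is_personal:
        return (Category.PERSONAL_TIME_SPECIFIC if is_time_specific
                else Category.PERSONAL_TIME_GENERAL)
    return (Category.PUBLIC_TIME_SPECIFIC if is_time_specific
            else Category.PUBLIC_TIME_GENERAL)


def belongs_in_life_history(category: Category) -> bool:
    """Only personal, time-specific tweets count as life events."""
    return category is Category.PERSONAL_TIME_SPECIFIC


new_job = categorize(is_personal=True, is_time_specific=True)
daily_run = categorize(is_personal=True, is_time_specific=False)
election = categorize(is_personal=False, is_time_specific=True)
weather = categorize(is_personal=False, is_time_specific=False)
```

Of the four example tweets, only the new job passes `belongs_in_life_history`; the run, the election and the weather are all filtered out.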
The problem that Li and Cardie have solved is to find a way of automatically distinguishing tweets in the first category from the others. The solution is based on the discovery that the pattern of tweets, retweets and replies varies for each of the categories they’ve defined.
For example, a tweet about starting a new job has a different pattern of responses from followers than a tweet about running or the US election or the weather. So the trick is to identify the ‘Twitter signature’ of these important personal events and then mine the Twitter stream for other examples. A chronological list of these events is that person’s life history.
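The final step, turning the selected tweets into a chronology, is easy to sketch once a classifier exists. The Python below is a minimal illustration under stated assumptions: `Tweet`, `build_timeline` and `toy_classifier` are hypothetical names, and the toy classifier simply matches keywords as a placeholder. It does not reproduce the response-pattern model that Li and Cardie describe.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Tweet:
    posted: date
    text: str


def build_timeline(tweets: Iterable[Tweet],
                   is_life_event: Callable[[Tweet], bool]) -> List[Tweet]:
    """Keep the tweets the classifier accepts and order them by date."""
    selected = [t for t in tweets if is_life_event(t)]
    return sorted(selected, key=lambda t: t.posted)


def toy_classifier(tweet: Tweet) -> bool:
    """Placeholder only: a keyword match standing in for a real classifier."""
    text = tweet.text.lower()
    return "new job" in text or "moved to" in text


stream = [
    Tweet(date(2012, 9, 3), "Moved to Boston this weekend"),
    Tweet(date(2012, 3, 1), "5k run done, same as every morning"),
    Tweet(date(2012, 1, 15), "Starting my new job on Monday!"),
    Tweet(date(2012, 11, 6), "Watching the US election results"),
]

timeline = build_timeline(stream, toy_classifier)
for t in timeline:
    print(t.posted.isoformat(), t.text)
```

Passing the classifier in as a function keeps the chronology step separate from the much harder question of how important tweets are recognised in the first place.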
At least, that’s the theory. Li and Cardie tested their idea by mining the streams of 20 ordinary Twitter users and 20 celebrities over a 21-month period from 2011 to 2013. They then asked the ordinary users to create their own life history by manually identifying their most important tweets. For the celebrities, Li and Cardie used Wikipedia biographies and other sources of information to create ‘gold standard’ life histories manually.
Finally, they compared these gold standard life histories against the ones generated by their algorithm. The results are not bad. The algorithm accurately picks out many important life events that are also identified in the gold standards. “Experiments on real Twitter data quantitatively demonstrate the effectiveness of our method,” they say.
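One common way to make such a comparison concrete is to measure precision (how many of the algorithm's picks appear in the gold standard) and recall (how many gold standard events the algorithm found). The sketch below assumes those two measures and uses made-up event labels; the article does not say which metrics the authors report, and `precision_recall` is a hypothetical helper.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Compare a generated timeline with a gold standard, event by event."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Made-up labels for illustration only.
generated = {"new job", "moved city", "graduation", "concert"}
gold_standard = {"new job", "moved city", "wedding"}

precision, recall = precision_recall(generated, gold_standard)
```

Here two of the four generated events are in the gold standard, so precision is 0.5, and two of the three gold standard events were found, so recall is about 0.67.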
But it is by no means perfect. For example, the technique only works for users who tweet regularly and who have enough followers to allow the algorithm to spot the unique pattern of responses that identifies important tweets.
Still, that’s a significant number of people and Li and Cardie say their technique can be broadly applied. “It can be extended to any individual, (e.g. friend, competitor or movie star), if only he or she has a twitter account,” they add.
Li and Cardie talk about their future plans in terms of improving the accuracy of their technique. However, they do not talk about making the algorithm more widely available. If it works as well as they imply, there should be no shortage of interested parties wanting to use it.
The ability to mine the Twitter firehose for the life histories of the masses will be valuable. Just who might want to use this technique and how, I’ll leave for the comments section below.
The work raises some interesting questions, not least about privacy. Would individuals think more carefully about placing their life history in the public domain if they knew how easily it could be distilled?
The new technique means that a detailed life history will be available at the touch of a button to friends and family but also to prospective employers, business competitors, the government, the media, law enforcement agencies, stalkers and so on.
What’s clear is that social networks are an important aspect of modern life. What is not yet so clear is just how powerful and revealing they will turn out to be.
Ref: arxiv.org/abs/1309.7313: Timeline Generation: Tracking individuals on Twitter