Christopher Mims

A View from Christopher Mims

Automated Processing of Wikileaks Cables Reveals U.S. Friends, Foes

Natural Language Processing of nearly 4,000 U.S. diplomatic cables reveals fraying relations with traditional allies, and a few other surprises

April 11, 2011

Software capable of determining the positive or negative sentiment of sentences written by humans has been unleashed on 3,891 U.S. diplomatic cables released by WikiLeaks, and the results are a systematic, if preliminary, analysis of which countries are our besties and which are in the doghouse.

The analysis was part of a class project (pdf) by a pair of computer science undergraduates at Stanford, Xuwen Cao and Beyang Li. By looking at how often a country was mentioned, as well as whether it was cast in a positive or negative light, Cao and Li identified four clusters to which countries could belong. Three of them hold countries that aren’t mentioned very often: countries we don’t like (red), countries we sort-of don’t like (teal), and countries spoken of positively (blue).

Since these cables were supposed to be classified, we can assume they are candid. No country was mentioned frequently in an especially negative or especially positive light. The fourth cluster (green) holds countries that were groused about fairly frequently.
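To make the two measurements concrete, here is a minimal sketch of how mention frequency and average sentiment per location might be computed. It is not Cao and Li's method: the word lists, the location list, and the function names are hypothetical stand-ins chosen for illustration, and a real system would use a trained sentiment classifier and a proper clustering step.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon and location list, for illustration only.
POSITIVE = {"welcomed", "praised", "cooperation", "progress", "support"}
NEGATIVE = {"concern", "criticized", "tensions", "corruption", "failed"}
LOCATIONS = {"norway", "zimbabwe", "iran"}

def sentence_score(sentence):
    """Positive-word count minus negative-word count for one sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def summarize(sentences):
    """Return {location: (mention_count, mean_sentiment)}."""
    scores = defaultdict(list)
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        score = sentence_score(sentence)
        # Credit the sentence's score to every single-word location it names.
        for loc in LOCATIONS & words:
            scores[loc].append(score)
    return {loc: (len(s), sum(s) / len(s)) for loc, s in scores.items()}

sentences = [
    "Norway welcomed the proposal and praised the cooperation.",
    "Officials expressed concern about corruption in Zimbabwe.",
    "Iran criticized the sanctions.",
    "Talks with Iran made progress.",
    "Iran failed to respond.",
]
summary = summarize(sentences)
for loc, (count, mean) in sorted(summary.items()):
    print(f"{loc}: mentions={count}, mean sentiment={mean:+.2f}")
```

Each location ends up as a point with two coordinates, frequency and sentiment, which is the kind of data a clustering algorithm could then group into colored sets. This sketch only matches single-word names, so an entry like ‘south africa’ would need extra handling.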

Here’s a further breakdown of what each cluster represents:

Green locations (cast in a somewhat negative light, and frequently):

[‘london’, ‘paris’, ‘cuba’, ‘africa’, ‘brasilia’, ‘cairo’, ‘eu’, ‘brazil’,
‘afghanistan’, ‘egypt’, ‘europe’, ‘iran’, ‘china’, ‘iraq’, ‘libya’, ‘syria’,
‘pakistan’, ‘washington’, ‘turkey’, ‘israel’, ‘moscow’, ‘spain’, ‘uk’, ‘russia’,
‘madrid’, ‘india’, ‘tripoli’, ‘kabul’, ‘iceland’, ‘france’]

Red locations (countries talked about infrequently, and in the most negative context):

[‘djibouti’, ‘taiwan’, ‘tajikistan’, ‘islam’, ‘mumbai’, ‘zimbabwe’, ‘dubai’, ‘goa’,
‘tibet’, ‘armenia’, ‘yar’, ‘ecuador’, ‘benghazi’, ‘algiers’, ‘yemen’, ‘paraguay’,
‘caracas’, ‘south africa’, ‘ouagadougou’, ‘xxxxxxxxxxxx’, ‘guinea’]

(It’s worth noting that due to the nature of natural language processing, a country like Taiwan could be mentioned alongside negative sentiment that is really about its circumstances, such as the cross-strait tensions with mainland China, rather than about the country itself.)
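A small sketch shows how this misattribution can arise, assuming a simplified scorer that credits a sentence's sentiment to every location it names. The word list, location list, and the `attribute` function are hypothetical and are not taken from the students' project.

```python
import re

NEGATIVE = {"tensions", "threat", "crisis"}   # hypothetical toy lexicon
LOCATIONS = {"taiwan", "china"}               # hypothetical location list

def attribute(sentence):
    """Give every location named in the sentence the sentence-level score."""
    words = re.findall(r"[a-z]+", sentence.lower())
    score = -sum(w in NEGATIVE for w in words)
    return {w: score for w in words if w in LOCATIONS}

result = attribute("Cross-strait tensions with China remain a threat to Taiwan.")
print(result)
```

Both names receive the same negative score, even though the sentence describes a situation and not a complaint about either place.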

Teal locations (mentioned in a somewhat negative context, but relatively infrequently):

[‘kosovo’, ‘north korea’, ‘damascus’, ‘argentina’, ‘latin america’, ‘netherlands’,
‘uruzgan’, ‘switzerland’, ‘reykjavik’, ‘lebanon’, ‘qatar’, ‘sudan’, ‘somalia’,
‘venezuela’, ‘guantanamo’, ‘colombia’, ‘sao paulo’, ‘saudi arabia’, ‘america’,
‘peru’, ‘gaza’, ‘bolivia’, ‘ukraine’, ‘geneva’, ‘jordan’, ‘tehran’, ‘georgia’,
‘sweden’, ‘portugal’, ‘mexico’, ‘lula’, ‘kenya’, ‘italy’, ‘ethiopia’, ‘canada’,
‘germany’, ‘havana’, ‘algeria’]

Blue locations (mentioned in the most positive context, but not very often):

[‘azerbaijan’, ‘japan’, ‘chechnya’, ‘norway’, ‘australia’, ‘ankara’, ‘baghdad’,
‘poland’, ‘haiti’, ‘kazakhstan’, ‘honduras’, ‘belgrade’, ‘copenhagen’, ‘kuwait’,
‘karzai’, ‘amazon’, ‘burma’, ‘tunisia’, ‘west bank’, ‘doha’, ‘west’, ‘new york’,
‘nigeria’, ‘serbia’, ‘darfur’, ‘chile’, ‘morocco’, ‘vatican’, ‘uae’, ‘new delhi’,
‘middle east’, ‘brussels’]

Here’s what the authors say about the seeming outliers in the blue group:

The blue cluster has the highest sentiment score, which means that US is relatively happy with this group. As one may notice, there are a few notable anomalies such as ‘burma’ and ‘sudan’. In the case of ‘burma’, the positive sentiment is mainly caused by Aung San Suu Kyi’s release from house arrest from multiple cables. In the case of ‘sudan’, it’s also a special case because the darfur cables discuss mostly the international help darfur received, instead of its dire situation.

And here are the findings the authors found most interesting:

Given our model, we made a few interesting discoveries:
1. In general, the US diplomats are critical of other countries, as we observe the majority of the data points is in the negative
2. Surprisingly, US’s most important ally is spain (seen lower right quadrant)
3. US is most friendly with Norway (right-most point), although it’s relatively unimportant
4. Iran appeared most frequently, with a small negative sentiment (which means the attitude is not always hostile)
5. US is least happy with Zimbabwe and Paraguay, although it doesn’t care too much about them either
6. US doesn’t actually have good relations with its traditional allies such as France, UK and Germany. Canada, Italy and Germany even scored lower than China.

Number six is a zinger. It’s a stretch to say that cables that talk about our traditional allies in a negative light indicate that we have poor relations with them – maybe we have good relations, and that means we’re more willing to be critical, the way siblings are wont to fight. It’s also important to note that these results, which aren’t peer reviewed, are just a first approximation of what a full-fledged Natural Language Processing analysis of these cables would look like.

As much as this study says something about the nature of diplomacy, it’s possible it says something more about the nature of gossip: good news is never as important as news of what’s going wrong.

