Emerging Technology from the arXiv

How Your Tweets Reveal Your Home Location

IBM researchers have developed an algorithm that predicts your home location using your last 200 tweets.

  • March 21, 2014

One of the optional extras that Twitter allows is for each tweet to be tagged with the user’s location data. That’s useful if you want people to know where you are or so that you can later remember where certain events took place. It also gives researchers a valuable tool for studying the geographical distribution of tweets in various ways.

But it also raises privacy issues, particularly when users are unaware, or forget, that their tweets are geotagged. Various celebrities are thought to have given away their home locations in this way. And in 2007, four Apache helicopters belonging to the US Army were destroyed by mortars in Iraq when insurgents worked out their location using geotagged images published by American soldiers.

Perhaps these kinds of concerns are the reason why so few tweets are geotagged. Several studies have shown that less than one per cent of tweets contain location metadata.

But the absence of geotagging data does not mean your location is secret. Today, Jalal Mahmud and a couple of pals at IBM Research in Almaden, California, say they’ve developed an algorithm that can analyse anybody’s last 200 tweets and determine their home city location with an accuracy of almost 70 per cent.

That could be useful for researchers, journalists, marketers and so on wanting to identify where tweets originate. But it also raises privacy issues for those who would rather their home location remained private.

Mahmud and co’s method is relatively straightforward. Between July and August 2011, they filtered the Twitter firehose for tweets that were geotagged with any of the biggest 100 cities in the US until they had found 100 different users in each location.

They then downloaded the last 200 tweets posted by each user, rejecting those that posted privately. That left them with over 1.5 million geotagged tweets from almost 10,000 people.

They then divided this data set in two, using 90 per cent of the tweets to train their algorithm and the remaining 10 per cent to test it against.
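
A 90/10 split of this kind can be sketched in a few lines of Python. This is only an illustration: the function name, the fixed random seed and the choice to shuffle before splitting are assumptions, not details reported by the researchers.

```python
import random

def split_train_test(items, test_fraction=0.1, seed=0):
    """Shuffle a copy of the items and hold out a fraction for testing."""
    shuffled = list(items)
    # A fixed seed (an arbitrary choice here) makes the split repeatable.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # First n_test shuffled items are held out; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in data: 1,000 integers in place of real tweets.
train, test = split_train_test(range(1000))
print(len(train), len(test))
```

Holding the test portion back until the end is what makes the accuracy figures meaningful: the algorithm is scored only on data it has never seen.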

The basic idea behind their algorithm is that tweets contain important information about the probable location of the user. For example, over 100,000 tweets in the dataset were generated by the location-based social networking site Foursquare and so contained a link that gave the exact location. And almost 300,000 tweets contained the names of cities listed in the US Geological Survey gazetteer.
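
Spotting city names in a tweet might look something like the sketch below. The four-city set is a hypothetical stand-in for a real gazetteer, and the whole-word matching rule is an assumption made for illustration rather than the researchers' own procedure.

```python
import re

# Hypothetical stand-in for a gazetteer: a handful of lower-cased city names.
GAZETTEER = {"austin", "boston", "new york", "san jose"}

def find_city_mentions(tweet, gazetteer=GAZETTEER):
    """Return the gazetteer names that appear as whole words or phrases."""
    text = tweet.lower()
    found = []
    for city in sorted(gazetteer):
        # \b word boundaries stop "boston" from matching inside "bostonian".
        if re.search(r"\b" + re.escape(city) + r"\b", text):
            found.append(city)
    return found

mentions = find_city_mentions("Flying from Boston to San Jose tonight")
print(mentions)  # ['boston', 'san jose']
```

A mention is only a clue, of course. As the example shows, somebody who names two cities in one tweet may live in neither.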

Other tweets contained clues to their location, such as the phrase “Let’s Go Red Sox”, a reference to the Boston-based baseball team. And Mahmud and co say that distribution of tweets throughout the day is roughly constant across the US, shifted by time zone. So a user’s pattern of tweets throughout the day can give a good indication of which time zone they’re in.
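
One way to picture the time zone clue is to slide a user's hourly tweet counts, recorded in UTC, against a typical daily profile in local time and keep the shift that lines up best. The sketch below does exactly that. The profile values, the sample hours and the dot-product scoring are all invented for illustration and should not be read as the researchers' method.

```python
def hourly_counts(hours):
    """Count how many tweets fall in each hour of the day (0 to 23)."""
    counts = [0] * 24
    for h in hours:
        counts[h % 24] += 1
    return counts

def best_utc_offset(utc_counts, local_profile):
    """Return the offset (local = UTC + offset) that best aligns the user's
    UTC activity with the typical local-time profile."""
    best_offset, best_score = 0, float("-inf")
    for offset in range(-12, 12):
        # Score an offset by the overlap between shifted activity and profile.
        score = sum(utc_counts[h] * local_profile[(h + offset) % 24]
                    for h in range(24))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset

# Made-up typical profile: flat, with an evening peak around 9pm local time.
profile = [1] * 24
profile[20], profile[21], profile[22] = 5, 8, 5

# Made-up user who tweets mostly around 02:00 UTC.
user_counts = hourly_counts([1, 2, 2, 2, 3, 2, 1, 3])
offset = best_utc_offset(user_counts, profile)
print(offset)  # -5
```

Here the invented user's activity peaks at 02:00 UTC, which matches a 9pm local peak only if local time runs five hours behind UTC, as on the US east coast in winter.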

So the question these guys set out to answer was whether it was possible to use this information to predict a user’s home location, a result they could test by matching it against the user’s geotagged metadata.

Mahmud and co used an algorithm known as a multinomial naive Bayes classifier to do the number crunching. They trained it by feeding it the training dataset along with the geolocation data.
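
For readers curious about what such a classifier does, here is a toy version written from scratch. It treats each user's tweets as a bag of words and picks the city under which those words are most probable. The class name, the add-one smoothing and the four training examples are assumptions made for illustration; the researchers' actual features and settings are described in their paper.

```python
import math
from collections import Counter, defaultdict

class TinyMultinomialNB:
    """A toy multinomial naive Bayes classifier over lists of words."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (add-one by default)

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per city
        self.word_counts = defaultdict(Counter)  # word counts per city
        self.vocab = set()
        for words, label in zip(docs, labels):
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, words):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_logp = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Start from the log prior, then add a log likelihood per word.
            logp = math.log(n_docs / total_docs)
            for w in words:
                if w in self.vocab:  # words never seen in training are skipped
                    logp += math.log((counts[w] + self.alpha)
                                     / (total_words + self.alpha * vocab_size))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

# Invented training examples: word lists paired with a home city.
docs = [
    "go red sox fenway".split(),
    "red sox win tonight".split(),
    "giants game at the park".split(),
    "fog over the bay giants".split(),
]
labels = ["Boston", "Boston", "San Francisco", "San Francisco"]

model = TinyMultinomialNB().fit(docs, labels)
guess = model.predict("lets go red sox".split())
print(guess)  # Boston
```

The "naive" part is the assumption that each word counts as independent evidence, which is rarely true of real language but keeps the arithmetic cheap.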

They then tested the algorithm on the remaining 10 per cent of the data to see whether it could predict the geolocation.

The results are interesting. They say that when they exclude people who are obviously travelling, their algorithm correctly predicts people’s home cities 68 per cent of the time, their home state 70 per cent of the time and their time zone 80 per cent of the time. And they say their algorithm takes less than a second to do this for any individual.

That could be a useful tool. Journalists, for example, could use it to determine which tweets were coming from a region involved in a crisis, such as an earthquake, and which were just commenting from afar. Marketers might use it to work out the popularity of their products in certain cities.

And it also suggests ways that people can improve their privacy: by not mentioning their home location, of course.

Mahmud and co say their algorithm could do better in future. For example, they think they can get more fine-grained detail by searching tweets for mentions of local landmarks that can be pinpointed more accurately. Whether that turns out to be possible, we’ll have to wait and see.

An interesting corollary to all this is that our notion of privacy is more fragile than most of us realise. Just how we can strengthen and protect it should be the subject of considerable public debate.

Ref: arxiv.org/abs/1403.2345 : Home Location Identification of Twitter Users
