One way for social networks to make money is by sharing information about users with advertisers and others who are interested in understanding consumer behavior and exploiting online trends.
Social networks typically promise to remove “personally identifying information” before sharing this data, to protect users’ privacy. But researchers from the University of Texas at Austin have found that, combined with readily available data from other online sources, this anonymized data can still reveal sensitive information about users.
In tests involving the photo-sharing site Flickr and the microblogging service Twitter, the Texas researchers were able to identify a third of the users with accounts on both sites simply by searching for recognizable patterns in anonymized network data. Both Twitter and Flickr display user information publicly, so the researchers anonymized much of the data in order to test their algorithms.
The researchers wanted to see if they could extract sensitive information about individuals using just the connections between users, even if almost all of the names, addresses, and other forms of personally identifying information had been removed. They found that they could, provided they could compare these patterns with those from another social-network graph where some user information was accessible.
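To make the idea concrete, here is a minimal sketch of how identities might spread from a few known users to the rest of an anonymized graph. It is a simplified illustration, not the researchers' actual algorithm: the function name `propagate_matches`, the toy graphs, the seed pairs, and the acceptance rule (a clear winner supported by at least two already-matched friends) are all assumptions made for this example.

```python
def propagate_matches(anon, aux, seeds):
    """Greedily extend a seed mapping from anonymized nodes to named nodes.

    anon, aux: undirected graphs as {node: set_of_neighbors}.
    seeds: a few known pairs {anon_node: aux_node} (assumed to be given).
    """
    mapping = dict(seeds)
    changed = True
    while changed:
        changed = False
        used = set(mapping.values())
        for node in sorted(anon):
            if node in mapping:
                continue
            # For each unused named candidate, count how many of this node's
            # already-matched friends are also friends of that candidate.
            scores = {}
            for friend in anon[node]:
                if friend in mapping:
                    for cand in aux[mapping[friend]]:
                        if cand not in used:
                            scores[cand] = scores.get(cand, 0) + 1
            if not scores:
                continue
            ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
            best, best_score = ranked[0]
            runner_up = ranked[1][1] if len(ranked) > 1 else 0
            # Hypothetical acceptance rule: require a clear winner that is
            # backed by at least two matched friends.
            if best_score >= 2 and best_score > runner_up:
                mapping[node] = best
                used.add(best)
                changed = True
    return mapping


# Toy "public" graph with names, and an anonymized copy with numeric IDs.
aux = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol", "erin"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"alice", "carol", "erin"},
    "erin": {"bob", "dave"},
}
anon = {
    1: {2, 3, 4},
    2: {1, 3, 5},
    3: {1, 2, 4},
    4: {1, 3, 5},
    5: {2, 4},
}
seeds = {1: "alice", 2: "bob"}  # two users assumed to be already identified

recovered = propagate_matches(anon, aux, seeds)
```

In this toy case the two graphs have identical structure, so both seeds are enough to name every remaining node. Real networks overlap only partially, which is why a sketch like this refuses to guess when the evidence is thin or ambiguous.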
Data from social networks, particularly the pattern of friendship between users, can be valuable to advertisers, says Vitaly Shmatikov, a professor of computer science at the University of Texas at Austin, who was involved in the research. Most social networks plan to make money by sharing this information, and advertisers hope to use it to, for example, find a particularly influential user and target her with advertising in order to reach her network of friends. But Shmatikov says that this same information also makes the networks vulnerable. “When you release this data, you have to preserve the structure of the social network,” he says. “If you don’t, then probably it’s useless for the purpose for which you are releasing it.”
The researchers say that it is fairly easy to find nonanonymous social-network data: the connections between friends in many networks, such as Twitter, are made public by default. Meanwhile, efforts to create a universal “social graph,” such as OpenSocial, provide even more resources. The researchers’ algorithms worked with only a 12 percent error rate even when the patterns of social connections in the two networks were significantly different: only 14 percent of users’ relationships overlapped between Twitter and Flickr. The results are described in a paper to be presented later this month at the IEEE Symposium on Security and Privacy.
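The two figures quoted above can be thought of as simple ratios. The sketch below shows one plausible way to compute them on toy data. The article does not say how the researchers measured overlap or error, so the definitions here (overlap as shared edges divided by all edges seen in either graph, and error rate as wrong identifications divided by all identifications made) are assumptions, as are the function names and the sample data.

```python
def edge_set(graph):
    """Return the undirected edges of a {node: neighbors} graph."""
    return {frozenset((u, v)) for u, nbrs in graph.items() for v in nbrs}


def edge_overlap(graph_a, graph_b):
    """Assumed definition: shared edges divided by edges in either graph."""
    a, b = edge_set(graph_a), edge_set(graph_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def error_rate(proposed, truth):
    """Assumed definition: fraction of proposed identifications that are wrong."""
    if not proposed:
        return 0.0
    wrong = sum(1 for node, guess in proposed.items() if truth.get(node) != guess)
    return wrong / len(proposed)


# Hypothetical friendships for the same four people on two different sites.
site_a = {
    "ann": {"bo", "cy"},
    "bo": {"ann", "cy"},
    "cy": {"ann", "bo", "di"},
    "di": {"cy"},
}
site_b = {
    "ann": {"bo"},
    "bo": {"ann", "di"},
    "cy": {"di"},
    "di": {"bo", "cy"},
}

overlap = edge_overlap(site_a, site_b)  # 2 shared edges out of 5 distinct edges

# Hypothetical output of a matching algorithm, checked against known answers.
proposed = {1: "ann", 2: "bo", 3: "di", 4: "cy"}
truth = {1: "ann", 2: "bo", 3: "cy", 4: "di"}
errors = error_rate(proposed, truth)  # 2 wrong guesses out of 4
```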
“The structure of the network around you is so rich, and there are so many different possibilities, that even though you have millions of people participating in the network, we all end up with different networks around us,” says Shmatikov. “Once you deal with sufficiently sophisticated human behavior, whether you’re talking about purchases people make or movies they view or, in this case, friends they make and how they behave socially, people tend to be fairly unique. Every person does a few quirky, individual things which end up being strongly identifying.”