Can Social Networks Be Generated Automatically?

When Google launched Buzz, a microblogging social network, several months ago, the company boasted that the network had been generated automatically, by algorithms that could connect users to each other based on communications revealed through Gmail and other services.

**Linked in:** Researchers from Yahoo examined e-mails from a university (top) and from Enron (bottom); the shape of each network changed a great deal depending on how connections were defined.

However, many users balked at having what they perceived as mischaracterized social connections, forcing the company to frantically backpedal and make the Buzz service less automated and more under users’ control.

This incident notwithstanding, many companies are increasingly interested in automatically determining users’ social ties through e-mail and social network communications. For example, IBM’s Lotus division offers a product called Atlas that constructs social data from corporate communications, and Microsoft has investigated using such data to automatically prioritize the e-mails that workers receive. But researchers say there are a lot of unsolved problems with generating and analyzing social networks based on patterns of communication.

In a paper presented recently at the WWW2010 conference in Raleigh, NC, a group of researchers from Yahoo pointed out that before it’s possible to construct an accurate picture of a social network, researchers have to do a better job of defining what it takes for two people to be connected. Should two people be considered friends if they’ve exchanged e-mails once? Or should it take 10 exchanges before their connection counts?

“You don’t get to directly observe relationships, you get to observe communication events,” says Jake Hofman, a researcher in Yahoo Research’s social dynamics group, who was involved with the work. Algorithms will infer dramatically different social network structures depending on how these communication events are interpreted, and different networks may suit different purposes. For example, a network based on relatively infrequent communications might turn out to work well for sharing tagged news items. More frequent communications might work better for networks designed for sharing more intimate information.
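To make the threshold question concrete, here is a minimal Python sketch of how the same set of communication events can yield different networks. It is only an illustration and not the method from the Yahoo paper: the function name `build_network`, the `min_exchanges` parameter, and the sample names are all hypothetical.

```python
from collections import Counter

def build_network(events, min_exchanges):
    """Return the undirected ties backed by at least `min_exchanges` messages.

    `events` is a list of (sender, recipient) pairs. Counting messages in
    both directions toward one tie is an assumption made for this sketch.
    """
    counts = Counter(frozenset((s, r)) for s, r in events if s != r)
    return {pair for pair, n in counts.items() if n >= min_exchanges}

# Made-up communication events, for illustration only.
events = [
    ("ann", "bob"), ("bob", "ann"), ("ann", "bob"),
    ("ann", "cat"),
    ("bob", "dan"), ("dan", "bob"),
]

loose = build_network(events, min_exchanges=1)   # any contact counts
strict = build_network(events, min_exchanges=3)  # only repeated contact counts
print(len(loose), len(strict))  # prints: 3 1
```

With a threshold of one message, three ties appear; with a threshold of three, only one survives. Neither network is the "right" one, which is the point the researchers make.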

“For the most part, the thresholds we set [for automatically generating social networks] are arbitrary,” says Lada Adamic, an assistant professor in the School of Information and the Center for the Study of Complex Systems at the University of Michigan. Adamic notes that there are questions beyond those raised by the Yahoo paper. For instance, she says, most algorithms define networks simplistically–people are either connected or not, without a way to indicate the gray areas common in real life.

She says it’s possible to keep refining the algorithms, but there will always be errors because the data available won’t capture the whole pattern. For example, two people might not e-mail each other, but they may talk regularly over the phone or in person.

**Degrees of separation:** A variety of different social networks can be generated by altering how connections between users are defined.

Incomplete information can throw off attempts to characterize social networks automatically, says Eric Gilbert, who will be an assistant professor of interactive computing at Georgia Tech starting this fall. Algorithms can miss the most intimate connections because these are likely to involve face-to-face rather than digital communication–what Gilbert calls the “spouse problem” or “the roommate problem.”

Gilbert has found that studying the structure of a network in greater detail can compensate for this to a degree. For example, a married couple is likely to share a large number of friends. But he acknowledges that this doesn’t solve the problem altogether.
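One simple way to use network structure in this spirit is to measure how much two people's friend lists overlap. The sketch below is an assumption-laden illustration, not Gilbert's actual technique: it uses the Jaccard overlap of two friend sets, and the function name `shared_friend_score` and the sample data are hypothetical.

```python
def shared_friend_score(friends, a, b):
    """Jaccard overlap of the friend sets of `a` and `b`, from 0.0 to 1.0.

    `friends` maps each person to a set of contacts. The two people being
    compared are removed from each other's sets so that a direct tie between
    them does not affect the score.
    """
    fa = friends.get(a, set()) - {b}
    fb = friends.get(b, set()) - {a}
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0

# Made-up data: pat and riley never e-mail each other but share most friends.
friends = {
    "pat": {"lee", "kim", "sam"},
    "riley": {"lee", "kim", "sam", "joe"},
    "joe": {"riley"},
}

print(shared_friend_score(friends, "pat", "riley"))  # prints: 0.75
print(shared_friend_score(friends, "pat", "joe"))    # prints: 0.0
```

A high score between two people with no recorded messages hints at a close tie that the e-mail data misses, though, as Gilbert acknowledges, structure alone does not solve the problem.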

On the flip side of the spouse problem is “the ex problem,” which was highlighted during the launch of Buzz. This occurs when algorithms connect two people who may have communicated frequently at one point but no longer do, and no longer wish to–such as estranged romantic partners. Gilbert explains that it’s hard to automatically discover an event such as a breakup because of the complex variables that surround it. Two people may stop communicating because one is busy, or on vacation. Algorithms would have to examine and compare complex behavior over time and in the context of other connections to understand this.
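A common first step toward handling lapsed relationships is to weight recent messages more heavily than old ones. The following sketch assumes an exponential decay with a hypothetical 90-day half-life; the function name and parameters are illustrative and do not come from any of the researchers quoted here.

```python
import math

def recency_weighted_strength(timestamps, now, half_life_days=90.0):
    """Sum of message weights, where each weight halves every `half_life_days`.

    `timestamps` and `now` are measured in days on the same clock.
    Messages dated after `now` are ignored.
    """
    decay = math.log(2) / half_life_days
    return sum(math.exp(-decay * (now - t)) for t in timestamps if t <= now)

# Made-up histories: one pair wrote daily a year ago, another wrote twice this week.
former_partners = list(range(0, 30))   # 30 messages, days 0 to 29
new_colleagues = [395, 398]            # 2 messages, days 395 and 398
now = 400

print(recency_weighted_strength(former_partners, now) <
      recency_weighted_strength(new_colleagues, now))  # prints: True
```

Such a score fades for an estranged couple, but it fades in exactly the same way for a friend who is merely busy or on vacation, which is why Gilbert says behavior has to be compared over time and against other connections.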

Munmun De Choudhury, who was involved with the Yahoo research and now works at Microsoft Research, says that more research can be done to help algorithms better understand the nature of social connections. Frequent e-mails could indicate either a very positive or very negative relationship, for example, and additional analysis might help algorithms identify the difference between the two.

Ultimately, Adamic says, it is a question of how much error can be tolerated when generating a network automatically. In some cases, algorithms that mine e-mail and other communications work quite well, and can be used to save time by providing an overview of connections or filtering information.

Automatically inferring the nature of social connections may be useful for prioritizing messages or establishing privacy settings that a user could then approve. However, “you don’t want to overinfer or get so fine-grained that it’s creepy,” Gilbert cautions.

All the researchers agree that allowing users to clean up any errors introduced by the algorithms is crucial to progress. “You always have the option of bringing in the human element,” says Adamic. “You could always take a step where the algorithm is 95 percent accurate and you let individuals handle the last 5 percent.”
