Software Identifies Social Cliques You Didn’t Even Know You Had

Led by engineers at Microsoft Research, a team of computer scientists has come up with what may be the most accurate method yet for identifying social cliques within social networks.

Their software, described in a research paper, uses a novel approach built on game theory, in which every member of a social network is treated as a rational actor trying to maximize their own utility given the benefits and costs of belonging to social cliques. (For those of you who are really into this stuff, they found a Nash equilibrium: a state in which no individual can improve their own cost/benefit balance by changing the cliques they belong to.)

It’s well known that humans derive many benefits from belonging to a community, but maintaining those ties costs time or other resources, such as the membership fee of a professional society.
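The cost/benefit reasoning above can be illustrated with a toy model. The sketch below is a simplified, hypothetical game, not the researchers' actual formulation: the class name `OverlapGame`, the modularity-style gain function and the flat per-community `cost` are all assumptions made for illustration. Each node repeatedly joins or leaves whichever single community most improves its own utility. Because the pairwise benefit is symmetric, every strict improvement also raises a shared potential function, so the process must stop at a state where no node can gain by joining or leaving a community.

```python
class OverlapGame:
    """Simplified overlapping-community game (hypothetical model, not the paper's).

    Each node v picks a set of community labels. Its utility is
        sum over its communities c of [ gain(v, c) - cost ],
    where gain(v, c) sums, over the other members w of c,
        A[v][w] - deg(v) * deg(w) / (2m)   (a modularity-style term).
    The pairwise term is symmetric, so this is an exact potential game and
    best-response dynamics with strict improvements must terminate.
    """

    def __init__(self, edges, cost=0.1):
        self.cost = cost  # assumed flat price of holding one membership
        self.adj = {}
        for u, v in edges:
            self.adj.setdefault(u, set()).add(v)
            self.adj.setdefault(v, set()).add(u)
        self.two_m = 2 * len(edges)
        self.deg = {v: len(nbrs) for v, nbrs in self.adj.items()}
        # Start with every node alone in a community named after itself.
        self.members = {v: {v} for v in self.adj}
        self.labels = {v: {v} for v in self.adj}

    def gain(self, v, c):
        """Modularity-style benefit v gets from the other members of c."""
        return sum(
            (1.0 if w in self.adj[v] else 0.0) - self.deg[v] * self.deg[w] / self.two_m
            for w in self.members[c]
            if w != v
        )

    def best_move(self, v):
        """Return v's best strictly improving single move, or None."""
        best, best_delta = None, 1e-9  # ignore float-noise "improvements"
        for c in sorted(self.labels[v]):  # option: leave community c
            delta = self.cost - self.gain(v, c)
            if delta > best_delta:
                best, best_delta = ("leave", c), delta
        # Option: join a community that one of v's neighbours belongs to.
        joinable = {c for u in self.adj[v] for c in self.labels[u]} - self.labels[v]
        for c in sorted(joinable):
            delta = self.gain(v, c) - self.cost
            if delta > best_delta:
                best, best_delta = ("join", c), delta
        return best

    def run(self, max_rounds=1000):
        """Let nodes take turns best-responding until nobody wants to move."""
        for _ in range(max_rounds):
            changed = False
            for v in sorted(self.adj):
                move = self.best_move(v)
                if move is None:
                    continue
                action, c = move
                if action == "join":
                    self.members[c].add(v)
                    self.labels[v].add(c)
                else:
                    self.members[c].discard(v)
                    self.labels[v].discard(c)
                changed = True
            if not changed:
                break
        # Different labels can end up with identical member sets; report each once.
        distinct = {frozenset(s) for s in self.members.values() if s}
        return sorted(tuple(sorted(s)) for s in distinct)


# Two triangles joined by a single "bridge" edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
game = OverlapGame(edges, cost=0.1)
print(game.run())  # [(0, 1, 2), (3, 4, 5)]
```

In this toy graph the single bridge edge is too weak to pull the two triangles together, so the dynamics settle on one community per triangle. Raising `cost` makes each membership more expensive, which tends to leave nodes with fewer overlapping memberships.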

This work could aid studies of urban development, criminal networks, marketing and many other areas of research enabled by the data readily available from online social networks such as Facebook and Twitter.

Intriguingly, two of the data sets the researchers tested their work on, which are apparently standard for this kind of research, were data gathered by anthropologists about a karate club and data gathered by marine biologists about a pod of 64 dolphins. Applying their game-theoretic approach to both networks, they were able to resolve cliques that other approaches missed entirely.

In a world where the boundaries of cliques can be so fuzzy that even the individuals within them might not recognize they belong to a group, this work could some day help us make explicit the social landscapes in which we participate. Anyone who has discovered that two of their friends were, unbeknownst to them, also friends with each other has experienced a real-world version of what this clique-resolving algorithm does.

With help from Zhenming Liu of Harvard and Xiaorui Sun of Shanghai Jiao Tong University, Wei Chen and Yajun Wang of Microsoft also successfully applied the algorithm to a common problem in academic citations: figuring out who is who among the many Chinese researchers whose names are spelled the same when romanized. For example, there are more than 20 people named Wei Chen in the DBLP computer science bibliography. (Not coincidentally, one of them is an author of this paper.)

Using a network of 20,000 nodes drawn from the bibliography, in which each node is a person, they identified all the communities to which authors with that name belonged. Because it is very unlikely that any two Wei Chens belong to exactly the same set of cliques, they were able to tell the various Wei Chens in DBLP apart.
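The article does not say exactly how community memberships were turned into separate identities. One simple heuristic, sketched here purely as an illustration, is to assign each paper published under the ambiguous name to the detected community that shares the most coauthors with it, and to treat each community as a different person. The function name `split_by_community`, its input format and the overlap rule are all hypothetical.

```python
def split_by_community(papers, communities):
    """Group one ambiguous name's papers by best-matching community (hypothetical heuristic).

    papers: dict mapping paper_id -> set of coauthor names
    communities: list of sets of author names, one per detected community
                 that the ambiguous name belongs to
    Returns a dict mapping community index -> sorted list of paper_ids.
    Papers whose coauthors overlap no community are grouped under None.
    """
    groups = {}
    for pid, coauthors in papers.items():
        overlaps = [len(coauthors & comm) for comm in communities]
        # Pick the community sharing the most coauthors (first one wins ties).
        best = max(range(len(communities)), key=lambda i: overlaps[i], default=None)
        if best is not None and overlaps[best] == 0:
            best = None  # no shared coauthors: leave the paper unassigned
        groups.setdefault(best, []).append(pid)
    return {k: sorted(v) for k, v in groups.items()}


papers = {"p1": {"A", "B"}, "p2": {"B", "C"}, "p3": {"X", "Y"}, "p4": {"Q"}}
communities = [{"A", "B", "C"}, {"X", "Y", "Z"}]
print(split_by_community(papers, communities))
```

Here papers p1 and p2 would be attributed to one person, p3 to another, and p4 would stay unresolved because none of its coauthors appears in either community.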

Follow Mims on Twitter or contact him via email.
