Twitter needs a checkup. After years of failing to stamp out trolls, hostility, hoaxes, and other ills, the company is asking for proposals to measure the “health” of conversations people have on its platform. The goal is to create metrics for the quality of discourse between Twitter users—and to give engineers data on how to make Twitter a nicer place to hang out.
What defines a “healthy” conversation? To get people thinking, the company pointed to four principles of public-sphere health developed by Cortico, a research nonprofit working with MIT’s Media Lab. These principles include things like whether the people discussing an issue are using the same facts and how open they are to listening to others’ opinions. (Deb Roy, Cortico’s cofounder and an associate professor at MIT, says the organization doesn’t have a formal agreement with Twitter but may propose metrics of its own.)
One immediate issue is that Cortico’s principles are based on research into US-based Twitter users. Yet while the US has more Twitter users than any other country, they make up only about one-fifth of the world’s total. So if Twitter is serious about taking the pulse of its global network, it may need much more than Cortico’s four metrics.
“It’s probably the case that there are some fundamentals that are universal, but then there may be other aspects that are not, and whether the right unit of segmentation is a country or not is unclear,” Roy says.
Then there’s the problem of homonyms, words with completely different meanings in different contexts. That’s likely to be an especially acute problem for Twitter, since there’s not much room for context in a tweet. For the same reason, it’s hard to detect sarcasm or irony.
And even when there’s consensus about a word’s definition, its emotional impact can vary among different speakers of a language. Jennifer Golbeck, who runs the University of Maryland’s Social Intelligence Lab, points to the word “cunt” as an example: when she was researching online harassment, she realized that it was less offensive in the UK than in the US. “You can imagine how difficult it’s going to be when you get really different cultures,” she says.
So where should Twitter—or anyone hoping to help—look for guidance on how to measure conversational health? While no one has a tried-and-true diagnostic test, some people I spoke with had a few ideas.
Anatoliy Gruzd, director of the Social Media Lab at Ryerson University in Ontario, has been studying the news-sharing site Reddit. He suggests Twitter take a look at its communities, known as subreddits, where human moderators manage the conversations and the community develops guidelines for what’s appropriate within the group. Once you establish these norms, he says, any metrics you come up with are more useful because you know their context.
Karen Kovacs North, a professor of social media at the University of Southern California, thinks that whatever measures Twitter adopts will need to be tweaked and customized for particular communities, simply because any single standard will fit some cultures and be out of sync with others.
And Golbeck suspects that certain criteria, like how balanced or emotionally charged a conversation is, will carry over from one culture to another. But she believes there will need to be different standards of measurement for different cultures and for cross-cultural conversations.
It’s going to be hard, she says, but making Twitter a healthier place for users should help its bottom line, too, since it depends on advertisers for revenue. “If it’s full of vitriol and racism and Nazis, it’s less likely that you’re going to advertise there,” she says.