
The future of social networks might be audio

The latest apps bring back the intimacy of the spoken word—but there are serious moderation issues to be addressed.
January 25, 2021

Every morning, as Nandita Mohan sifts through her emails, her college pals are in her ear—recounting their day, reminiscing, reflecting on what it’s like to have graduated in the throes of a pandemic.

Mohan, a 23-year-old software programmer in the Bay Area, isn’t on the phone, nor is she listening to an especially personal podcast; she’s using Cappuccino, an app that takes voice recordings from a closed group of friends or family and delivers them as downloadable audio.

“Just hearing all of us makes me value our friendship, and hearing their voices is a game-changer,” she says. 

Audio messaging has been available for years; voice memos on WhatsApp are especially big in India, and WeChat audio messages are popular in China. And during the pandemic, these features have become an easy way for people to stay in touch while bypassing Zoom fatigue. But now a new wave of hip apps is baking the immediacy and rawness of audio into the core experience, making voice the way people connect again. From phone calls to messaging and back to audio, the way we use our phones may be coming full circle.

The newcomers

The best-known audio-focused network is Clubhouse, the buzzy, invite-only app that debuted last spring to glowing reviews for its talk-show-like twist on the chat rooms of the early internet. Using it is akin to dropping in on an (online) party conversation.

But Clubhouse’s promise was shattered by its lack of moderation and the unfettered chatter of misogynistic venture capitalists. New York Times reporter Taylor Lorenz, once a fan of the app, was subject to harassment in Clubhouse sessions for calling out one VC’s behavior.

“I don’t plan on opening the app again,” Lorenz told Wired. “I don’t want to support any network that doesn’t take user safety seriously.” Her experience wasn’t a one-off, and since then darker, racist elements have appeared. It seems the behavior that mars every other social platform also lurks beneath Clubhouse’s exclusive, cool veneer.

Gaming chat app Discord, meanwhile, has exploded in popularity. The service uses voice-over-IP software to let people talk to each other in real time (an idea that came from video gamers who found typing while playing impossible). In June, to tap into people’s need for connection during the pandemic, Discord announced a new slogan, “Your place to talk,” and began trying to make the service appear less gamer-centric. The marketing push seems to have worked: by October, Discord estimated 6.7 million users, up from 1.4 million in February, just before the pandemic hit.

But while Discord’s communities, or “servers,” can be as small and innocent as kids organizing remote sleepovers, they have also included far-right extremists who used the service to organize the Charlottesville white supremacist rally and the recent insurrection at the US Capitol.

In both Discord and Clubhouse, the in-group culture (nerdy gamers in Discord’s case, overconfident venture capitalists for Clubhouse) has led to instances of groupthink that can be off-putting at best and bigoted at worst. Yet there’s undeniably an appeal. Isn’t it cool to talk and literally be heard? After all, that’s the foundational promise of social media: democratization of voice.

Speak and you shall be heard

The intimacy of voice makes audio social media that much more appealing in the age of social distancing and isolation. Jimi Tele, the CEO of Chekmate, a “text-free” dating app that connects users through voice and video, says he wanted to launch an app that would be “catfish-proof,” referring to the practice of deceiving others online with fake profiles.

“We wanted to break away from the anonymity and gamification that texting allows and instead create a community rooted in authenticity, where users are encouraged to be themselves without judgment,” Tele says. Users start out trading voice memos that average five seconds, and the memos get progressively longer from there. And while Chekmate has a video option, Tele says the app’s several thousand users overwhelmingly favor using just their voices. “They are perceived as less intimidating [than video messages],” he says.

This immediacy and authenticity are what led Gilles Poupardin to create Cappuccino. He wondered why no existing product gathered voice memos together into a single downloadable file. “Everyone has a group chat with friends,” he says. “But what if you could hear your friends? That’s really powerful.”

Mohan agrees. She says her group of friends started out in a Facebook Messenger group chat and tried Zoom calls early in the pandemic before switching to Cappuccino. But those calls would inevitably turn into a highlight reel of big events. “There was no time for details,” she laments. The daily Cappuccino “beans,” as the stitched-together recordings are called, let Mohan’s friend circle keep up to date in a very intimate way. “My one friend is moving to a new apartment in a new city, and she was just talking about how she goes to get coffee in her kitchen,” she says. “That’s something I would never know in a Zoom call, because it’s so small.”

Even legacy social media firms are getting in on the act. In the summer of 2020, Twitter launched voice tweets, allowing users to embed their voice right onto their timeline. And in December, it began beta testing Spaces, a feature for live, host-moderated audio conversations between two or more people.

“We were interested in whether audio could add an additional layer of connection to the public conversation,” says Rémy Bourgoin, a senior software engineer on Twitter’s voice tweets and Spaces team.

Bourgoin says that the vision is for Spaces to be “as intimate and comfortable as attending a well-hosted dinner party.” He adds, “You don’t need to know everyone there to have a good time, but you should feel comfortable sitting at the table.”

You may have snorted in disbelief reading that Twitter wants to create a space that is “comfortable” and “intimate.” After all, Twitter doesn’t exactly have a stellar track record in creating an online environment that is welcoming and protects vulnerable users from abuse. 

Bourgoin says the team is deliberately moving slowly before releasing Spaces beyond its small beta group, and it has even included captioning, a rare accessibility feature in audio networks. “Right now, Spaces can be reported by anyone who is in the space,” Bourgoin says. “Reports will be reviewed by our team, who will evaluate for violations of the Twitter rules.”

The ugliness

Ah, moderation. Content moderation is far more difficult in audio than in text. On text platforms, searchable posts and automated moderators have been used with some success, but human moderators seem to be the most effective way to block people who don’t abide by community rules, which puts those human moderators at risk. For platforms where people can jump in at any time and chat, the very democratization that makes audio attractive creates a moderation nightmare. “That’s definitely a huge challenge with any user-generated platform,” says Austin Petersmith, who launched Capiche.fm in beta last year. The site, which grew out of a software community, is a bit like a call-in radio show: hosts call each other to start the show and invite listeners to chime in while they’re “on-air.”

As users of Clubhouse have learned, voice-only spaces can quickly get ugly, just like anywhere else on the internet. People who already suffer from abuse online—those who are marginalized, female or nonbinary, nonwhite, and/or younger—are unlikely to want to make the leap to a place where they can be abused in a different, harder-to-police format. 

There’s also reason to believe these newer, less regulated platforms will be attractive to disaffected, far-right conspiracy-minded extremists and QAnon believers, who are now creating their own podcast networks.

Still, audio social networks seem to offer something that traditional social media cannot. One of the format’s main benefits is the way it gives users the immediate connection of a voice or video call, but on their own terms. Phone calls—and Zoom calls, for that matter—require some planning. But audio social media content can be created and digested at your own convenience in a way that news alerts, notifications, and doomscrolling don’t allow. As Mohan, who listens to her friends every morning, says of Cappuccino: “It engages me and forces me to listen more carefully as each person is talking. I even take notes of things I want to respond to and say.”

For Mohan, playing the recordings from her circle of five friends has become a beloved ritual, allowing her to catch up with them at her own pace. “Every day, in the middle of my work day, I’ll record my Cappuccino,” she says, referring to the recording she makes on the app. “It feels really personal. I’m hearing all their voices and I feel on top of what they [her friends] are doing in their day to day.”

Editor’s note: A previous version of this story equated voice tweets with Twitter Spaces and said they launched at the same time. We regret the error.