Scroll through the livestreaming videos at 4 a.m. on Taobao, China’s most popular e-commerce platform, and you’ll find it weirdly busy. While most people are fast asleep, there are still many diligent streamers presenting products to the cameras and offering discounts in the wee hours.
But if you take a closer look, you may notice that many of these livestream influencers seem slightly robotic. The movement of their lips largely matches what they are saying, but there are always moments when it looks unnatural.
These streamers are not real: they are AI-generated clones of the real streamers. As technologies that create realistic avatars, voices, and movements get more sophisticated and affordable, the popularity of these deepfakes has exploded across China’s e-commerce streaming platforms.
Today, livestreaming is the dominant marketing channel for traditional and digital brands in China. Influencers on Taobao, Douyin, Kuaishou, or other platforms can broker massive deals in a few hours. The top names can sell more than a billion dollars’ worth of goods in one night and gain royalty status just like big movie stars. But at the same time, training livestream hosts, retaining them, and figuring out the technical details of broadcasting come with a significant cost for smaller brands. It’s much cheaper to automate the job.
Since 2022, a swarm of Chinese startups and major tech companies have been offering the service of creating deepfake avatars for e-commerce livestreaming. With just a few minutes of sample video and $1,000 in costs, brands can clone a human streamer to work 24/7.
From deepfake to e-commerce
Synthetic media have been making headlines since the late 2010s, particularly when a Reddit user named “deepfakes” swapped faces into pornography. Since then, the technology has evolved, but the idea is the same: with some technical tools, faces can be generated or manipulated to look like specific real humans and do things that the actual human has never done.
The technology has mostly been known for its problematic use in revenge porn, identity scams, and political misinformation. While there have been attempts to commercialize it in more innocuous ways, it has always remained a novelty. But now, Chinese AI companies have found a new use case that seems to be going quite well.
Founded in 2017, Nanjing-based startup Silicon Intelligence specializes in natural-language processing, particularly text-to-speech technologies like robocall tools. But Sima Huapeng, its founder and CEO, says his company first started to see AI’s potential as a livestreaming tool in 2020.
Back then, Silicon Intelligence needed 30 minutes of training videos to generate a digital clone that could speak and act like a human. The next year, it was 10 minutes, then three, and now only one minute of video is needed.
And as the tech has improved, the service has gotten cheaper too. Generating a basic AI clone now costs a customer about 8,000 RMB ($1,100). If the client wants to create a more complicated and capable streamer, the price can go up to several thousand dollars. In addition to the generation itself, that fee covers a year of maintenance.
Once the avatar is generated, its mouth and body move in time with the scripted audio. While the scripts were once pre-written by humans, companies are now using large language models to generate them too.
Now, all the human workers have to do is input basic information such as the name and price of the product being sold, proofread the generated script, and watch the digital influencer go live. A more advanced version of the technology can spot live comments and find matching answers in its database to answer in real time, so it looks as if the AI streamer is actively communicating with the audience. It can even adjust its marketing strategy based on the number of viewers, Sima says.
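The comment-matching step described above can be illustrated with a minimal Python sketch. The article does not say how Silicon Intelligence actually matches comments to answers, so this assumes the simplest possible approach, keyword overlap against a small table of canned replies. The table `FAQ`, the function names, and the replies (including the price) are all hypothetical placeholders.

```python
import re

# Hypothetical answer database: each entry pairs a set of trigger
# keywords with a canned reply. The replies are placeholder text.
FAQ = [
    ({"price", "cost", "much"}, "It is 199 RMB today with the livestream discount."),
    ({"shipping", "deliver", "delivery"}, "We ship within 48 hours."),
    ({"size", "sizes"}, "It comes in S, M, and L."),
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_answer(comment, faq=FAQ, min_overlap=1):
    """Return the reply whose keywords overlap the comment most, or None."""
    words = tokenize(comment)
    best_reply, best_score = None, 0
    for keywords, reply in faq:
        score = len(words & keywords)
        if score > best_score:
            best_reply, best_score = reply, score
    return best_reply if best_score >= min_overlap else None


if __name__ == "__main__":
    for comment in ["How much does it cost?", "Hello everyone"]:
        print(comment, "->", match_answer(comment))
```

A comment that shares no keyword with any entry returns `None`, which a real system could treat as a cue to stay on script rather than answer.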
These livestream AI clones are trained on the common scripts and gestures seen in e-commerce videos, says Huang Wei, the director of virtual influencer livestreaming business at the Chinese AI company Xiaoice. The company has a database of nearly a hundred pre-designed movements.
“For example, [when human streamers say] ‘Welcome to my livestream channel. Move your fingers and hit the follow button,’ they are definitely pointing their finger upward, because that’s where the ‘Follow’ button is on the screen of most mobile livestream apps,” says Huang. Similarly, when streamers introduce a new product, they point down—to the shopping cart, where viewers can find all products. Xiaoice’s AI streamers replicate all these common tricks. “We want to make sure the spoken language and the body language are matching. You don’t want it to be talking about the Follow button while it’s clapping its hands. That would look weird,” she says.
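The pairing of spoken lines with gestures that Huang describes can be sketched as a simple lookup. This is only an illustration: the article says Xiaoice has a database of nearly a hundred pre-designed movements but does not describe how they are selected, so the cue table `GESTURE_CUES`, the gesture names, and the substring matching here are all assumptions.

```python
# Hypothetical cue table: a phrase in the script mapped to a gesture name.
# Only the two pairings mentioned in the article are modeled.
GESTURE_CUES = [
    ("follow", "point_up"),         # the Follow button sits near the top of the screen
    ("new product", "point_down"),  # the shopping cart sits near the bottom
    ("shopping cart", "point_down"),
]
DEFAULT_GESTURE = "idle"


def pick_gesture(sentence):
    """Return the gesture for the first cue found in the sentence."""
    lowered = sentence.lower()
    for cue, gesture in GESTURE_CUES:
        if cue in lowered:
            return gesture
    return DEFAULT_GESTURE


def annotate(script):
    """Pair every sentence in a script with a gesture."""
    return [(sentence, pick_gesture(sentence)) for sentence in script]


if __name__ == "__main__":
    script = [
        "Welcome to my livestream channel.",
        "Move your fingers and hit the follow button.",
        "Let me introduce our new product.",
    ]
    for sentence, gesture in annotate(script):
        print(f"{gesture:>10}  {sentence}")
```

Falling back to a neutral gesture when no cue matches avoids the mismatch Huang warns about, such as clapping while talking about the Follow button.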
Spun off from Microsoft Software Technology Center Asia in 2020, Xiaoice has always been focused on creating more human-like AI, particularly avatars that are capable of showing emotions. “Traditional e-commerce sites just feel like a shelf of goods to most customers. It’s cold. In livestreaming, there is more emotional connection between the host and the viewers, and they can introduce the products better,” Huang says.
After piloting with a few clients last year, Xiaoice officially launched its service of generating under-$1,000 digital clones this year; like Silicon Intelligence, Xiaoice only needs human streamers to provide a one-minute video of themselves.
And as with its competitors, Xiaoice’s clients can spend more to fine-tune the details. For example, Liu Jianhong, a Chinese sports announcer, made an exquisite clone of himself during the 2022 FIFA World Cup to read out the match results and other relevant news on Douyin.
A cheap replacement for human streamers
These generated streamers won’t be able to beat the star e-commerce influencers, Huang says, but they are good enough to replace mid-tier ones. Human creators, including those who used their videos to train their AI clones, are already feeling the squeeze from their digital rivals to some extent. It’s harder to get a job as an e-commerce livestream host this year, and the average salary for livestream hosts in China went down 20% compared to 2022, according to the analytics firm iiMedia Research.
But companies can also use AI streamers to complement human work, keeping the livestream going during the hours when fewer people are watching and the cost of hiring real streamers is hard to justify.
That’s already happening. In the post-midnight hours, many of the streaming channels on popular e-commerce platforms like Taobao and JD feature these AI-generated streamers.
Previous examples have shown that deepfake technologies don’t need to be perfect to deceive viewers. In 2020, a scammer posed as a famous Chinese actor with the aid of crude face-swapping tools and still managed to get thousands of dollars from unsuspecting women who fell in love with his videos.
“If a company hires 10 livestream hosts, their skill levels are going to vary. Maybe two or three streamers at the top would contribute to 70% to 80% of the total sales,” says Chen Dan, the CEO of Quantum Planet AI, a company that packages technologies like Xiaoice’s and sells them to corporate clients. “A virtual livestream host can replace the rest—six or seven streamers that contribute less and have lower ROI [return on investment] rates. And the costs would come down significantly.”
Chen says he has witnessed a lot more interest from brands in AI streamers this year, partly because everyone is looking to “降本增效”—lower costs and improve efficiency, the new buzzword among Chinese tech companies as the domestic economy slows down.
Chen has over 100 clients using Xiaoice’s service now, and these virtual streamers have brokered millions of dollars in sales. One Xiaoice streamer brought in over 10,000 RMB ($1,370) in revenue in just one hour.
There are still drawbacks, he says. For example, many of his clients are furniture brands, and although the AI is clever enough to speak and use gestures, it can’t really sit on a sofa or lie in a bed, so the streams lack the appeal of real users testing the products.
Besides smaller startups like Silicon Intelligence and Xiaoice, major tech players are testing out AI-generated livestreams. Alibaba, Tencent, Baidu, and JD all launched some variation of the same service this year, allowing brands on their platforms to generate their own AI streamers.
Marketing companies that employ large numbers of human streamers have also noticed the trend. Foshan Yowant Technology, one of the top livestream marketing agencies, has announced a strategic collaboration with Xiaoice; Silicon Intelligence has also set up a joint venture with the company behind Viya, China’s former “livestream queen.”
The rising popularity of AI-generated livestreams has also caught the attention of video platforms like Douyin, the Chinese version of TikTok, though it’s taking a different approach than other tech giants. It’s seemingly more concerned with transparency: it said in a May document that all videos generated by AI should be labeled clearly as such on the platform, and that virtual influencers need to be operated by real humans. The platform has always banned the use of recorded videos as livestreams. AI-generated livestreaming, with no recorded footage but also little real-time human input, straddles the line on that rule.
The Chinese government has passed several laws in the past two years on synthetic media and generative AI that would apply to their use in e-commerce streaming. But the effects of government and platform regulations remain to be seen, because the technology is still too new to have met serious enforcement.
For Silicon Intelligence, the next step is to add “emotional intelligence” to the AI streamers, Sima says: “If there are abusive comments, it will be sad; if the products are selling well, it will be happy.” The company is also working on making AI streamers interact with and learn from each other.
The company has had a fascinating and sort of terrifying goal since its beginning: it wants to create “100,000,000 silicon-based laborers” by 2025. For now, Sima says, the company has generated 400,000 virtual streamers. There’s still a long way to go.
Correction: The story has been updated with the correct name of the analytics firm iiMedia Research.