Undercover Researchers Expose Chinese Internet Water Army

An undercover team of computer scientists reveals the practices of people who are paid to post on websites.
In China, paid posters are known as the Internet Water Army because they are ready and willing to ‘flood’ the internet for whoever is willing to pay. The flood can consist of comments, gossip and information (or disinformation) and there seems to be plenty of demand for this army’s services.
This is an insidious tide. Positive recommendations can make a huge difference to a product’s sales but can equally drive a competitor out of the market. When companies spend millions launching new goods and services, it’s easy to understand why they might want to use every tool at their disposal to achieve success.
The loser in all this is the consumer who is conned into making a purchase decision based on false premises. And for the moment, consumers have little legal redress or even ways to spot the practice.
Today, Cheng Chen at the University of Victoria in Canada and a few pals describe how Cheng worked undercover as a paid poster on Chinese websites to understand how the Internet Water Army works. He and his friends then used what he learnt to create software that can spot paid posters automatically.
Paid posting is a well-managed activity involving thousands of individuals and tens of thousands of different online IDs. The posters are usually given a task to register on a website and then to start generating content in the form of posts, articles, links to websites and videos, even carrying out Q&A sessions.
Often, this content is pre-prepared or the posters receive detailed instructions on the type of things they can say. And there is even a quality control team who check that the posts meet a certain ‘quality’ threshold. A post is not validated if it is deleted by the host or is composed of garbled words, for example.
Having worked undercover to find out how the system worked, Cheng and co then studied the pattern of posts that appeared on a couple of big Chinese websites: Sina and Sohu. In particular, they studied the comments on several news stories about two companies that they suspected of paying posters and who were involved in a public spat over each other’s services.
The Sina dataset consisted of over 500 users making more than 20,000 comments; the Sohu dataset involved over 200 users and more than 1000 comments.
Cheng and co went through all the posts manually, identifying those they believed were from paid posters, and then set about looking for patterns in their behaviour that could differentiate them from legitimate users. (Just how accurate their initial impressions were is a potential problem, they admit, but the same one that spam filters also have to deal with.)
They discovered that paid posters tend to post more new comments than replies to other comments. They also post more often, with 50 per cent of them posting every 2.5 minutes on average. And they move on from a discussion more quickly than legitimate users, discarding their IDs and never using them again.
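To make those three signals concrete, here is a minimal sketch of how they might be computed per user from a list of comments. It is an illustration only, not the researchers' code: the `Comment` record, its field names and the `behaviour_features` function are assumptions introduced for this example.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Comment:
    # Hypothetical record layout; the article does not describe the real schema.
    user_id: str
    timestamp: float  # seconds since some fixed reference point
    is_reply: bool    # True if the comment answers another comment
    text: str


def behaviour_features(comments):
    """Return per-user reply ratio, median gap between posts and active span."""
    by_user = {}
    for c in comments:
        by_user.setdefault(c.user_id, []).append(c)

    features = {}
    for user, items in by_user.items():
        items.sort(key=lambda c: c.timestamp)
        times = [c.timestamp for c in items]
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        features[user] = {
            # Low values mean mostly new comments rather than replies.
            "reply_ratio": sum(c.is_reply for c in items) / len(items),
            # Small values mean rapid-fire posting; None if only one post.
            "median_gap_s": median(gaps) if gaps else None,
            # Short spans mean the ID was used briefly and then abandoned.
            "active_span_s": times[-1] - times[0],
        }
    return features
```

A user with a low reply ratio, a short gap between posts and a short active span would match the profile described above. Where to draw the line on each measure is left open here, since the article gives no thresholds beyond the 2.5-minute figure.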
What’s more, the content they post is measurably different. These workers are paid by the volume and so often take shortcuts, cutting and pasting the same content many times. This would normally invalidate their posts but only if it is spotted by the quality control team.
So Cheng and co built some software to look for repetitions and similarities in messages as well as the other behaviours they’d identified. They then tested it on the dataset they’d downloaded from Sina and Sohu and found it to be remarkably good, with an accuracy of 88 per cent in spotting paid posters. “Our test results with real-world datasets show a very promising performance,” they say.
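The article does not say how the software measures similarity between messages, so the following is only a sketch of one common approach: Jaccard similarity over character n-grams, which suits Chinese text because it is not split into words by spaces. The function names, the n-gram size of 3 and the 0.8 threshold are all assumptions chosen for illustration.

```python
from itertools import combinations


def char_ngrams(text, n=3):
    """Return the set of character n-grams in text, ignoring whitespace."""
    cleaned = "".join(text.split())
    if len(cleaned) < n:
        return {cleaned} if cleaned else set()
    return {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def duplicate_ratio(posts, threshold=0.8):
    """Fraction of one user's posts that closely match another of their posts."""
    if len(posts) < 2:
        return 0.0
    grams = [char_ngrams(p) for p in posts]
    flagged = set()
    # Compare every pair of posts; fine for small per-user lists.
    for i, j in combinations(range(len(posts)), 2):
        if jaccard(grams[i], grams[j]) >= threshold:
            flagged.update((i, j))
    return len(flagged) / len(posts)
```

A high `duplicate_ratio` for a user would point to the cut-and-paste shortcut described above. In a real detector this score would be one input among several, combined with the behavioural measures rather than used on its own.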
That’s an impressive piece of work and a good first step towards combating this problem, although they’ll need to test it on a much wider range of datasets. Nevertheless, these guys have the basis of a software package that will weed out a significant fraction of paid posters, provided these people conform to the stereotype that Cheng and co have measured.
And therein lies the rub. As soon as the first version of the software hits the market, paid posters will learn to modify their behaviour in a way that games the system. What Cheng and co have started is a cat and mouse game just like those that plague the antivirus and spam filtering industries.
And that means the battle ahead with the Internet Water Army will be long and hard.
Ref: Battling the Internet Water Army: Detection of Hidden Paid Posters
