Hidden Industry Dupes Social Media Users

A trawl of Chinese crowdsourcing websites—where people can earn a few pennies for small jobs such as labeling images—has uncovered a multimillion-dollar industry that pays hundreds of thousands of people to distort interactions in social networks and to post spam.

The report’s authors, at the University of California, Santa Barbara, also found evidence that crowdsourcing sites in the U.S. are similarly dominated by ethically questionable jobs. They conclude that the rapid growth of this way of making money will make paid shills a serious security problem for websites and those who use them around the world. A paper describing their results is available on the arXiv preprint server.

Ben Zhao, an associate professor of computer science at UCSB (and a TR35 winner in 2006), started looking into the largely uncharted crowdsourcing industry in China after working closely with RenRen, a social network that is sometimes called the “Facebook of China,” to track malicious activity on the site. Zhao was intrigued to see a lot of relatively sophisticated attempts to send spam and promote brands by users who appeared to be working with specific agendas.

When he and colleagues investigated the source of that activity, the team was surprised by what it found, says Zhao: “Evil crowdsourcing on a very large scale.” Influencing public opinion with fake “grassroots” activity is known as astroturfing, leading Zhao to coin the term “crowdturfing,” since it is done via large crowdsourcing sites.

The researchers discovered that a large amount of the suspect activity in China originated from two crowdsourcing sites: Zhubajie, the largest in China, and Sandaha. There, people are openly offered the equivalent of tens of cents to do things like create accounts on particular sites, post biased answers about specific products on Q&A sites, and create and spread positive messages about products on social networks.

“The websites are very public, and you can see who offered past jobs and what they paid,” says Zhao. His team used software to show that Zhubajie and Sandaha are, respectively, 88 and 92 percent crowdturfing. They also found that Zhubajie currently processes over a million dollars every month for crowdturfing tasks; the figure for the younger Sandaha is tens of thousands of dollars. “This industry is millions of dollars per year already and [shows] roughly exponential growth,” says Zhao. “I think we’re still in the early stages of this phenomenon.”

Spooked by the scale of activity on the Chinese sites, and the potential for them to be used to compromise U.S. sites, the UCSB team examined U.S.-based crowdsourcing sites. Amazon’s Mechanical Turk may be the best known, but others have also sprung up.

“Most of those other sites have a lot of crowdturfing,” says Zhao, and the sites don’t actively shut down such tasks, as Amazon tries to do. ShortTask, the second-largest U.S. crowdsourcing site studied, was found to be 95 percent crowdturfing tasks, and helped workers get paid for over half a million astroturfing tasks in the last year. Despite Amazon’s efforts, Mechanical Turk was found to be 12 percent crowdturfing, a lower estimate than the 40 percent alleged by a study from New York University late last year.

Zhao says these sites will likely become a source of significant trouble for social networks like Twitter and Facebook, just as they have for their Chinese equivalents.

“People are willing to do this for such small amounts, and we have seen that the results are very good,” he says. Zhao thinks that favorable economics will lead to crowdsourcing sites in China and other developing countries troubling U.S. services. ShortTask and other U.S. crowdsourcing sites with a high proportion of crowdturfing have many workers from developing countries.

“The worst thing is that this is so difficult to detect,” says Zhao. “All our security methods assume that there is a program at play, and that imposes constraints that you can detect.” Zhao’s group has previously worked to uncover spam inside Facebook, mostly a result of software bots gaining control of genuine user accounts. Facebook and other Web companies today rely on tools like Captchas or relatively simple rules able to easily spot automated accounts. “If you have a real human involved who is determined, then what you can do is really only limited by the price they are paid,” says Zhao.

Filippo Menczer, director of the Center for Complex Networks and Systems Research at Indiana University, is working to develop systems to detect political astroturfing on Twitter. “It’s already a hard thing to do, and probably it will get more difficult,” he says, especially as crowdsourcing services become easier to use.

Menczer’s group first built a system to detect political astroturfing in the run-up to the most recent midterm elections. It first identifies threads of political discussion circulating on Twitter, using hashtags, links, names, and sentences. Software trained to recognize both legitimate and astroturfing tweets then sifts fraudulent messages from that soup of political discussion, and even tracks their success in influencing real users.
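The two stages described above, grouping tweets into threads and then scoring each thread, can be illustrated with a rough sketch. This is not Menczer's system: the sample tweets are invented, the function names are hypothetical, and a simple concentration rule stands in for the trained software the article mentions. It assumes only that a thread can be keyed by a shared hashtag or link.

```python
import re
from collections import Counter, defaultdict

# Invented sample data: (account, text) pairs, not real tweets.
TWEETS = [
    ("acct_a", "Vote now! #election http://example.com/x"),
    ("acct_a", "Vote today! #election http://example.com/x"),
    ("acct_a", "Go vote!! #election http://example.com/x"),
    ("acct_b", "Long lines at my polling place #election"),
    ("acct_c", "Debate recap is up #debate http://example.org/recap"),
    ("acct_d", "Watching the #debate tonight"),
]

# Matches hashtags and links, two of the thread markers the article names.
TOKEN_RE = re.compile(r"#\w+|https?://\S+")


def group_into_threads(tweets):
    """Map each hashtag or link to the accounts that posted it."""
    threads = defaultdict(list)
    for account, text in tweets:
        for token in TOKEN_RE.findall(text):
            # Lowercasing is a simplification; real URLs can be case-sensitive.
            threads[token.lower()].append(account)
    return threads


def thread_features(accounts):
    """Summarize how concentrated a thread is among its posters."""
    counts = Counter(accounts)
    total = len(accounts)
    return {
        "posts": total,
        "accounts": len(counts),
        "top_share": max(counts.values()) / total,
    }


def flag_suspicious(threads, min_posts=3, max_top_share=0.7):
    """Stand-in for a trained classifier (thresholds are assumptions):
    flag threads where one account produces most of the posts."""
    flagged = []
    for token, accounts in threads.items():
        features = thread_features(accounts)
        if features["posts"] >= min_posts and features["top_share"] > max_top_share:
            flagged.append(token)
    return sorted(flagged)


if __name__ == "__main__":
    print(flag_suspicious(group_into_threads(TWEETS)))
```

A rule like this only catches repetitive, concentrated posting. It would likely miss paid human posters spread across many accounts, which is the harder case the researchers describe.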

That system was able to find automated accounts that sent carefully varied messages promoting certain political sites. But Menczer has always suspected that his group was missing an unknown number of subtler astroturfing campaigns. Looking at the origin of crowdsourced astroturfing provides another perspective, he says.

“The fact that there are websites almost dedicated to making it easy to hire people to do this is further evidence that this is happening,” says Menczer, who is working to upgrade his astroturfing detection system to analyze discussion around next year’s presidential elections.

One possible way to tackle such networks would be to follow the money, says Zhao. That would likely uncover a less distributed target. A study earlier this year found that 95 percent of the income from spam e-mail passes through just three banks, a much easier target than the millions of compromised computers sending out the unsolicited messages or the shadowy criminals coordinating them.
