Computer Scientists Measure the Speed of Censorship On China’s Twitter

Censorship on Weibo, China’s version of Twitter, is near real-time and relies on a workforce of over 4,000 censors who stop work during the evening news, according to the first detailed analysis of censorship patterns.

Emerging Technology from the arXiv

March 6, 2013

The Chinese version of Twitter is a microblogging service called Weibo, which launched in 2010. It allows users to post 140-character messages with @usernames and #hashtags, just like Twitter, although 140 characters in Chinese carry significantly more information than 140 characters in English.

In just three years, Weibo has picked up some 300 million users who between them send 100 million messages each day at the rate of 70,000 per minute. That makes the inevitable process of censorship a tricky task for the Chinese authorities. So an interesting question is how they do it.

Today, Dan Wallach at Rice University in Houston, Texas, and a few pals reveal the results of a detailed study of censorship on Weibo. Their method has allowed them to reconstruct the censorship techniques used by the government, to calculate the number of workers who must be involved and even to discover their daily work schedules.

The work is possible because at least some of the content on Weibo is not censored prior to publication, only afterwards. Their approach was to collect posts from a set of users once every minute. They then tracked these posts to see which ones later became unavailable.
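The polling idea can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the team's actual crawler: the `PostTracker` class, its method names and the post ids are all hypothetical, and a real collector would also have to check that a missing post was really removed rather than simply pushed off a timeline page.

```python
from typing import Dict, Set


class PostTracker:
    """Remember when each post was first seen and report disappearances."""

    def __init__(self) -> None:
        self.first_seen: Dict[str, int] = {}  # post id -> minute first observed
        self.missing: Dict[str, int] = {}     # post id -> minute first found missing

    def poll(self, minute: int, visible_ids: Set[str]) -> Set[str]:
        """Record one snapshot; return ids that vanished since the last poll."""
        newly_missing = {
            pid for pid in self.first_seen
            if pid not in visible_ids and pid not in self.missing
        }
        for pid in newly_missing:
            self.missing[pid] = minute
        for pid in visible_ids:
            self.first_seen.setdefault(pid, minute)
        return newly_missing

    def lifetime(self, post_id: str) -> int:
        """Minutes between first sighting and first noticed absence."""
        return self.missing[post_id] - self.first_seen[post_id]


# Hypothetical snapshots taken one minute apart.
tracker = PostTracker()
tracker.poll(0, {"a", "b"})
tracker.poll(1, {"a", "b", "c"})
gone = tracker.poll(2, {"a", "c"})  # post "b" is no longer visible
```

Polling once a minute means a post's measured lifetime is only accurate to about a minute, which is why the sampling interval matters for any claim about how fast deletions happen.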

Of course, it’s not feasible to track everyone on Weibo so Wallach and co spent some time looking for users who seemed to have posts deleted more often than others, assuming that these users would be more likely to be censored in the future. Using this manual technique, they ended up observing some 3500 users over a period of 15 days last year who between them experienced around 4500 deletions per day, or about 12 per cent of the total.

Not all deletions are the result of censorship, however, since a user can delete his or her own posts. Wallach and co say that through their own trial and error they observed two types of deletion which return different messages. When users delete their own messages, a query for the post returns a “post does not exist” error message.

However, when a post is deleted by the censors, Weibo returns a different message saying: “permission denied”. It is this second type of deletion that Wallach and co concentrated on.
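In code, the distinction amounts to a simple lookup on the error text. The sketch below is an assumption-laden illustration: the `classify_deletion` helper is hypothetical, and the two phrases are taken from the description above, whereas the real responses may be worded or encoded differently.

```python
from enum import Enum
from typing import Optional


class Deletion(Enum):
    BY_USER = "deleted by user"
    BY_CENSOR = "deleted by censor"


def classify_deletion(error_message: Optional[str]) -> Optional[Deletion]:
    """Map the error text returned for a post query to a deletion type.

    Hypothetical helper: the matched phrases are assumptions based on
    the two messages quoted in the article.
    """
    if error_message is None:
        return None  # no error, so the post is still visible
    text = error_message.lower()
    if "permission denied" in text:
        return Deletion.BY_CENSOR
    if "post does not exist" in text:
        return Deletion.BY_USER
    return None  # unrecognised response, leave unclassified


result = classify_deletion("Permission denied")  # Deletion.BY_CENSOR
```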

The results of their study are fascinating. They say that in their data set about 5 per cent of the deletions occur within 8 minutes of posting and around 30 per cent within 30 minutes. In total, 90 per cent of deletions occur within a day, although at times deletions can occur several days later.

Those are impressive numbers given the popularity of the microblogging service. How does Weibo manage this task?

Wallach and co say their data point to a number of hypotheses about what’s going on. Since the highest volume of deletions occurs within 5-10 minutes of posting, Weibo must be censoring them in near real time. If an average censor can scan around 50 posts a minute, that would require some 1400 censors at any instant to handle the 70,000 posts pouring in each minute. And if they work 8-hour shifts, that’s a total of 4200 censors on the payroll each day.
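The back-of-the-envelope estimate is easy to reproduce. The short Python sketch below uses only the figures quoted above; the function name is made up for illustration, and the scan rate and shift length are the assumptions stated in the text, not measured values.

```python
import math

POSTS_PER_MINUTE = 70_000      # posting rate quoted in the article
POSTS_PER_CENSOR_MINUTE = 50   # assumed scanning speed of one censor
SHIFT_HOURS = 8                # assumed shift length


def censors_needed(posts_per_minute, posts_per_censor_minute, shift_hours):
    """Return (censors on duty at any instant, censors needed per day)."""
    on_duty = math.ceil(posts_per_minute / posts_per_censor_minute)
    shifts_per_day = 24 / shift_hours
    return on_duty, math.ceil(on_duty * shifts_per_day)


on_duty, per_day = censors_needed(
    POSTS_PER_MINUTE, POSTS_PER_CENSOR_MINUTE, SHIFT_HOURS
)
print(on_duty, per_day)  # 1400 4200
```

The estimate is sensitive to the assumed scan rate: halving it to 25 posts a minute would double both numbers.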

Even then, this work force must have some technological help. Wallach and co say the data suggests Weibo has a number of techniques in operation. The first is keyword alerting. When a keyword appears, the post is immediately flagged for censors.

However, this is no mean feat: Chinese is notoriously hard to filter in this way because of the complexity of its writing system and because of the neologisms and shortened forms that are used on Weibo.
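At its simplest, keyword alerting is just substring matching. The Python sketch below is a naive illustration, not a description of Weibo's system: the `flag_for_review` function and the watch list are hypothetical, with the example phrases borrowed from topics mentioned in the study.

```python
from typing import Iterable, List


def flag_for_review(post: str, keywords: Iterable[str]) -> List[str]:
    """Return the watched keywords that appear in the post (empty if none)."""
    text = post.lower()
    return [kw for kw in keywords if kw.lower() in text]


# Hypothetical watch list and post.
watch_list = ["sex scandal", "policeman"]
hits = flag_for_review("Rumours of a sex scandal are spreading", watch_list)
print(hits)  # ['sex scandal']
```

A filter like this needs no word segmentation, which helps with Chinese text, but it misses any post that swaps a watched phrase for a pun, abbreviation or newly coined term. That is exactly the difficulty described above.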

Wallach and co say that the authorities also target users who have a history of deletions, presumably assuming that they are more likely to post forbidden content in future (just as Wallach and co did).

It turns out that these users tend to be censored more quickly than others on the network. “Users with larger deletion frequencies tend to observe faster censorship of their work,” they say.

Wallach and co have also examined the rate of deletions throughout the 24-hour cycle, finding that the censors are less active at night, when presumably fewer are working. They also face a backlog each morning. “They catch up by late morning or early afternoon,” conclude Wallach and co.

There is even a slight dip in the censorship rate at 7pm when the national evening news is on television.
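A daily pattern like this shows up when deletion times are binned by hour of the day. The Python sketch below illustrates the idea under stated assumptions: the `deletions_by_hour` function is hypothetical, the timestamps are invented purely to show the call, and they are assumed to be in local (Beijing) time already.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List


def deletions_by_hour(timestamps: Iterable[datetime]) -> List[int]:
    """Count deletions in each hour of the day (index 0 = midnight to 1am)."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


# Invented timestamps, for illustration only.
sample = [
    datetime(2012, 8, 1, 9, 15),
    datetime(2012, 8, 1, 9, 40),
    datetime(2012, 8, 1, 19, 5),
    datetime(2012, 8, 2, 9, 2),
]
histogram = deletions_by_hour(sample)
print(histogram[9], histogram[19])  # 3 1
```

To compare censor activity fairly, the counts would need to be divided by the number of posts made in each hour, since fewer posts at night also means fewer deletions.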

Wallach and co are also interested in the type of posts that are censored and have examined the content of these for clues. They say that topics commonly deleted include phrases such as “support Syrian rebels”, “Lying of gov. (Jixiang)”, “One-Child policy abuse” and “group sex”.

The topics that trigger mass removal the fastest are those that combine events that are hot topics in Weibo as a whole, such as “sex scandal”, with themes common to sensitive posts, such as government or policeman.

That’s a fascinating study that provides a rare but illuminating insight into the nature of Chinese censorship.

One question that this study does not address is why the authorities allow uncensored Weibo posts to appear in public at all. Given the formidable censorship machine in operation, why not block publication of all posts for 30 minutes or so, until the censorship is largely complete?

Wallach and co seem to suggest that this is possible. They say that on 1 August 2012, they tried to post a message including the phrase “Secretary of the Political and Legislative Committee.” “When we submit a post with this character string in it, a warning message says ‘Sorry, since this content violates “Sina Weibo regulation rules” or a related regulation or policy, this operation cannot be processed. If you need help, please contact customer service.’”

So clearly some posts are blocked before they even reach public view.

Whatever the reason, clearly more work is needed. Wallach and co say they have several goals for the future, such as attempting to find out more about the way Weibo prioritises content for deletion. All that will depend on the team’s access to data and on the assumption that the authorities won’t be able to track down and stop the team’s accounts and the Tor network links they use to send the data out of the country. Brave work!

Ref: arxiv.org/abs/1303.0597: The Velocity of Censorship: High-Fidelity Detection of Microblog Post Deletions
