Study Reveals Techie Terms Censored Online in China

Researchers reverse-engineered a list of keywords blacklisted on various messaging platforms.

Aviva Hope Rutkin

July 29, 2013

China’s surveillance of its citizens’ digital activities is common knowledge. However, questions remain concerning what content is targeted by government censors and how these blacklists change in response to current events.

**Party line:** A screenshot of Sina Weibo shows messages posted by a member of the Jiu San Society, a Chinese political party.

A new study published this month in the journal First Monday uncovers more than 4,000 unique keywords censored over the last year and a half on Chinese messaging platforms. Focusing on Skype and the microblogging service Sina Weibo, the researchers compiled their keyword list using reverse-engineering techniques such as packet sniffing, which captures and analyzes packets of data as they pass through a network.

More than 20% of the terms targeted on Sina Weibo, it turns out, related to technology, including specific URLs, spyware, and technical terms. Some of these keywords, such as “Chinese language Wikipedia” and “Google Blogger,” referred to popular websites dedicated to the open dissemination of information. (Censorship of Chinese Wikipedia was first spotted nearly ten years ago on the anniversary of the Tiananmen Square protests.)

Generic terms like “system,” “administrator,” and “system notification” also appeared on the keyword list. The researchers hypothesize that these more general keywords might be used to catch anyone attempting to impersonate a Sina Weibo administrator and thus wield power over other users’ accounts.

Other common words targeted by censors include “internet,” “chat,” “world wide web,” and “Chinese person.” Additionally, the researchers found that the censors’ keyword lists fluctuated in response to major events. For example, after the Arab Spring began in late 2010 and some began calling for similar protests (or Jasmine Rallies) in China, dozens of related keywords were added to the censorship lists. Sixty-nine of these keywords were then abruptly removed for several weeks in May 2011, which the researchers interpreted as a possible attempt to monitor protester mobilization.
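
Additions and removals like these become visible when dated copies of a keyword list are compared. The Python sketch below illustrates the idea only and is not the researchers' actual method: the function names, the capture dates, and the placeholder terms (`term_a`, `term_b`, `term_c`) are all hypothetical and are not drawn from the study's data.

```python
from __future__ import annotations

from datetime import date


def diff_snapshots(earlier: set[str], later: set[str]) -> tuple[set[str], set[str]]:
    """Return (added, removed) keywords between two list snapshots."""
    added = later - earlier      # present now, absent before
    removed = earlier - later    # present before, absent now
    return added, removed


# Hypothetical snapshots keyed by capture date. The terms are placeholders,
# not entries from the real censorship lists.
snapshots = {
    date(2011, 2, 1): {"term_a", "term_b"},
    date(2011, 3, 1): {"term_a", "term_b", "term_c"},
    date(2011, 5, 15): {"term_a", "term_b"},
}


def changes_over_time(snapshots: dict[date, set[str]]) -> list[tuple[date, list[str], list[str]]]:
    """Compare each snapshot with the one before it, in date order."""
    dates = sorted(snapshots)
    result = []
    for prev, curr in zip(dates, dates[1:]):
        added, removed = diff_snapshots(snapshots[prev], snapshots[curr])
        result.append((curr, sorted(added), sorted(removed)))
    return result


for when, added, removed in changes_over_time(snapshots):
    print(when.isoformat(), "added:", added, "removed:", removed)
```

In this toy data, a term that appears in one snapshot and vanishes in a later one shows up first as an addition and then as a removal, the same pattern the researchers describe for the keywords dropped in May 2011.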
