Search Engines’ Chinese Self-Censorship

U.S.-based search engines are choosing what they censor in China and could be blocking more than they have to.
June 30, 2008

To operate in China, search engine companies based in the United States have built products that cooperate with China’s policies of Internet censorship. That much has long been recognized. But a new analysis suggests that search companies, including Google, Microsoft, and Yahoo, are independently deciding what to censor and could be censoring more information than Chinese laws demand.

A report released last week by the Citizen Lab at the Munk Centre for International Studies at the University of Toronto found that different search engines are blocking fairly different content. “The low overlap means that companies are choosing the exact content to censor or, alternatively, to not censor,” says Nart Villeneuve, a senior research fellow at the Citizen Lab and the author of the report. “That doesn’t mean that they’re not getting guidance from the Chinese government in other ways,” he notes. But Villeneuve says that if search engines are interpreting Chinese policies to decide what to censor, that introduces the possibility that they may block more content than is strictly necessary.

The U.S. search engine companies say that they cooperate with Chinese censorship policies because some presence in the country is better than none at all. “We started Google.cn because Google.com was strangled in China,” says Robert Boorstin, director of policy communications for Google. Though Boorstin says he can’t comment on the specifics of Google’s Chinese product, he says, “We were faced with a very clear choice: we can either start a new public library where people could see 98 percent of the stuff in the stacks, or we would have no library at all, and nobody would get a library card.”

Indeed, according to Villeneuve’s report, even with the self-censoring, foreign search engines provide about 20 percent more content on average than Internet users in China would otherwise be able to access through domestic search engines. But, he says, “The bigger issue is just that we don’t know exactly what they’re doing, and the search engine companies haven’t been open publicly about what they’re doing.”

Villeneuve tested search engines made by Google, Microsoft, Yahoo, and Beijing-based Baidu. To make the testing fair, Villeneuve compared how the search engines handled specific keywords on specific sites: for example, “Tiananmen Square” at news.bbc.co.uk. He also had to distinguish the effects of China’s “Great Firewall,” which blocks some information passing into or out of China, from the censorship of specific search engines. Since Yahoo and Baidu both host their Chinese products within China, their crawlers can index Chinese content without the firewall’s interference. Users outside China, however, will see their results filtered. With Google and Microsoft, the situation is reversed. So Villeneuve tested Yahoo and Baidu from within China and Google and Microsoft from outside, avoiding the firewall.

Villeneuve found 313 websites that were censored by at least one of the search engines during at least one of the tests he conducted. However, only 76 were censored at least once by all four, and, of those, only 8 were censored by all four each time he tested. Google censored the smallest share of sites on average, at 15.2 percent of those tested. Microsoft censored 15.7 percent of sites, followed by Yahoo at 20.8 percent and Baidu at 26.4 percent.
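
To put those counts in proportion: 76 of 313 sites is roughly 24 percent, and 8 of 313 is under 3 percent. The three tallies can be sketched with ordinary set operations. What follows is a minimal illustration, not the report's actual method or data: the function name, the engine labels, and the site identifiers are all hypothetical, and it assumes each engine was tested at least once.

```python
from typing import Dict, List, Set


def overlap_summary(runs: Dict[str, List[Set[str]]]) -> Dict[str, int]:
    """Summarize overlap in censored sites across engines.

    `runs` maps an engine name to a list of test runs; each run is the set
    of sites that engine censored in that run. Hypothetical structure,
    assumed here for illustration only.
    """
    # Sites each engine censored in at least one run.
    ever = {engine: set().union(*tests) for engine, tests in runs.items()}
    # Sites each engine censored in every run.
    always = {engine: set.intersection(*tests) for engine, tests in runs.items()}
    return {
        "censored_by_any": len(set().union(*ever.values())),
        "censored_by_all_at_least_once": len(set.intersection(*ever.values())),
        "censored_by_all_every_time": len(set.intersection(*always.values())),
    }


# Toy data, invented for this example; not taken from the report.
example_runs = {
    "engine_a": [{"s1", "s2", "s3"}, {"s1", "s2"}],
    "engine_b": [{"s1", "s4"}, {"s1", "s2"}],
}
summary = overlap_summary(example_runs)
```

A small gap between the first figure and the other two is what "low overlap" means here: many sites are censored by someone, but few by everyone, and fewer still consistently.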

Another characteristic Villeneuve tested was transparency, meaning how clearly a search engine notifies a user that a result has been censored. He found that Google maintained the highest transparency, while Microsoft and Yahoo both had slightly less transparency than they did in 2006. Though Microsoft said in a statement that it is committed to providing notification, Villeneuve found that such notices occurred only with general keyword searches, not searches targeted to specific sites. He notes that Yahoo’s notifications, which accompany any search, whether its results have been censored or not, make it difficult to determine which sites have been censored and which were simply not indexed.

“The question in China is made very complex in that there is a combination of tacit government guidance in what one should block, and there’s also a lot of guessing by the companies,” says Derek Bambauer, an assistant professor of law at Wayne State University. “These factors come together to lead to overblocking.” Bambauer says the Citizen Lab’s report is important because it provides rigorous methods that can be used to compare the companies’ censorship practices.

John Palfrey, executive director of the Berkman Center for Internet and Society, says, “Search companies are plainly conservative about making decisions of what to block because the law is so unclear.” He adds that the “biggest thing companies can do is to work together on a common front,” a situation that could potentially reduce overblocking.

Villeneuve says that he thinks it’s important to independently monitor the search companies so that there’s pressure for them to remain accountable for their commitments to minimizing censorship. Going forward, he says, he would like to see companies post clearer notices about what is being blocked and why, perhaps by citing which laws specifically are making content unavailable.