Probing the Dark Side of Google’s Ad-Targeting System

Researchers say Google’s ad-targeting system sometimes makes troubling decisions based on data about gender and other personal characteristics.

Tom Simonite

July 6, 2015

That Google and other companies track our movements around the Web to target us with ads is well known. How exactly that information gets used is not—but a research paper presented last week suggests that some of the algorithmic judgments that emerge from Google’s ad system could strike many people as unsavory.

Researchers from Carnegie Mellon University and the International Computer Science Institute built a tool called AdFisher to probe the targeting of ads served up by Google on third-party websites. They found that fake Web users believed by Google to be male job seekers were much more likely than equivalent female job seekers to be shown a pair of ads for high-paying executive jobs when they later visited a news website.

AdFisher also showed that a Google transparency tool called “ads settings,” which lets you view and edit the “interests” the company has inferred for you, does not always reflect potentially sensitive information being used to target you. Browsing sites aimed at people with substance abuse problems, for example, triggered a rash of ads for rehab programs, but there was no change to Google’s transparency page.

What exactly caused those specific patterns is unclear, because Google’s ad-serving system is very complex. Google uses its data to target ads, but ad buyers can make some decisions about demographics of interest and can also use their own data sources on people’s online activity to do additional targeting for certain kinds of ads. Nor do the examples breach any specific privacy rules—although Google policy forbids targeting on the basis of “health conditions.” Still, says Anupam Datta, an associate professor at Carnegie Mellon University who helped develop AdFisher, they show the need for tools that uncover how online ad companies differentiate between people.

“I think our findings suggest that there are parts of the ad ecosystem where kinds of discrimination are beginning to emerge and there is a lack of transparency,” says Datta. “This is concerning from a societal standpoint.” Ad systems like Google’s influence the information people are exposed to and potentially even the decisions they make, so understanding how those systems use data about us is important, he says.

Even companies that run online ad networks don’t have a good idea of what inferences their systems draw about people and how those inferences are used, says Datta. His group has begun collaborating with Microsoft to develop a version of AdFisher for use inside the company, to look for potentially worrying patterns in the ad targeting on the Bing search engine. A paper by Datta and two colleagues—Michael Tschantz, of the International Computer Science Institute, and Amit Datta, also at Carnegie Mellon—was presented at the Privacy Enhancing Technologies Symposium in Philadelphia last Thursday.

Google did not officially respond when the researchers contacted the company about their findings late last year, they say. However, this June the team noticed that Google had added a disclaimer to its ad settings page. The interest categories shown are now said to control only “some of the Google ads that you see,” and not those where third parties have made use of their own data. Datta says that greatly limits the usefulness of Google’s transparency tool, which could probably be made to reveal such information if the company chose. “They are serving these ads, and if they wanted to they could reflect these interests,” he says.

“Advertisers can choose to target the audience they want to reach, and we have policies that guide the type of interest-based ads that are allowed,” said Andrea Faville, a Google spokeswoman, in an e-mail. “We provide transparency to users with ‘Why This Ad’ notices and Ads Settings, as well as the ability to opt out of interest-based ads.” Google is looking at the methodology of the study to try to understand its findings.

The AdFisher tool works by sending out hundreds or thousands of automated Web browsers on carefully chosen trails across the Web in such a way that an ad-targeting network will infer certain interests or activities. The software then records which ads are shown when each automated browser visits a news website that uses Google’s ad network, as well as any changes to the ad settings page. In some experiments that page is edited to look for differences between the ways ads are targeted to, say, males and females. AdFisher automatically flags any statistically significant differences in how ads are targeted using the particular interest categories or demographics it is investigating.
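To give a rough sense of what flagging a "statistically significant difference" can involve, here is a minimal sketch in Python. It is not AdFisher's code, and the article does not say which statistical test the tool uses; the sketch assumes a simple permutation test on the mean number of times an ad was shown to each simulated browser. The function name `permutation_test` and the impression counts are hypothetical, made up purely for illustration.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Assumption: each value is how many times one simulated browser saw
    a particular ad. This is an illustrative test, not AdFisher's own.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Reassign browsers to the two groups at random and measure
        # how large a difference arises by chance alone.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimate from being exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical impression counts for one ad, one value per browser.
male_counts = [4, 5, 3, 6, 4, 5, 7, 4]
female_counts = [0, 1, 0, 0, 1, 0, 2, 0]

p_value = permutation_test(male_counts, female_counts)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value here would suggest the gap between the two groups is unlikely to be chance, which is the kind of pattern a tool like AdFisher would flag. A permutation test is used in the sketch because it makes no assumption about how ad impressions are distributed.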

Roxana Geambasu, an assistant professor at Columbia University, says there’s considerable value in the way AdFisher can statistically extract patterns from the complexity of targeted ads. A tool called XRay, which her own research group released last year, can reverse-engineer the connection between ads shown to Gmail users and keywords in their messages. For example, ads for low-requirement car loans might be targeted to those using words associated with financial difficulties.

However, Geambasu says that the results from both XRay and AdFisher are still only suggestive. “You can’t draw big conclusions, because we haven’t studied this very much and these examples could be rare exceptions,” she says. “What we need now is infrastructure and tools to study these systems at much larger scale.” Being able to watch how algorithms target and track people to do things like serve ads or tweak the price of insurance and other products is likely to be vital if civil rights groups and regulators are to keep pace with developments in how companies use data, she says.

A White House report on the impact of “big data” last year came to similar conclusions. “Data analytics have the potential to eclipse longstanding civil rights protections in how personal information is used in housing, credit, employment, health, education, and the marketplace,” it said.
