Artificial intelligence

How to Fix Silicon Valley’s Sexist Algorithms

Computers are inheriting gender bias implanted in language data sets—and not everyone thinks we should correct it.
November 23, 2016

The presidential campaign made clear that chauvinist attitudes toward women remain stubbornly fixed in some parts of society. It turns out we’re inadvertently teaching artificial-intelligence systems to be sexist, too.

New research shows that subtle gender bias is entrenched in the data sets used to teach language skills to AI programs. As these systems become more capable and widespread, their sexist point of view could have negative consequences—in job searches, for instance.

The problem results from the way machines are being taught to read and talk. Computer scientists are feeding them huge quantities of written or spoken language, and letting them draw connections between words and phrases.

The resulting data sets, known as word embeddings, are widely used to train AI systems that handle language—including chatbots, translation systems, image-captioning programs, and recommendation algorithms. Word embeddings represent the relationships between words as mathematical values. This makes it possible for a machine to perceive semantic connections between, say, “king” and “queen” and understand that the relationship between the two words is similar to that between “man” and “woman.” But researchers from Boston University and Microsoft Research New England also found that the data sets considered the word “programmer” closer to the word “man” than “woman,” and that the most similar word for “woman” is “homemaker.”
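The "king is to queen as man is to woman" relationship can be sketched with a few invented toy vectors (real embeddings have hundreds of dimensions and are learned from billions of words; these values are made up purely for illustration):

```python
import math

# Toy 3-dimensional "embeddings" -- invented values for illustration only.
vectors = {
    "man":        [1.0, 0.0, 0.2],
    "woman":      [1.0, 1.0, 0.2],
    "king":       [0.3, 0.0, 0.9],
    "queen":      [0.3, 1.0, 0.9],
    "programmer": [0.5, 0.1, 0.8],
}

def cosine(a, b):
    """Cosine similarity: how closely two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Find the word whose vector is closest to vec(a) - vec(b) + vec(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "man", "woman"))  # prints "queen"
```

Bias shows up in exactly this arithmetic: in the real embeddings the researchers studied, the same kind of query completed "man is to programmer as woman is to ..." with "homemaker."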

James Zou, an assistant professor at Stanford University who conducted the research while at Microsoft, says this could have a range of unintended consequences. “We are still trying to understand the full impact that comes from the many AI systems using these biased embeddings,” Zou says.

Zou and colleagues have conducted some simple experiments that show how this gender bias might manifest itself. When they wrote a program designed to read Web pages and rank their relevance, they found the system would rank information about female programmers as less relevant than that about their male counterparts.

The researchers also developed a way to remove gender bias from embeddings by adjusting the mathematical relationship between gender-neutral words like “programmer” and gendered words such as “man” and “woman.”
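A minimal sketch of the projection step behind this kind of debiasing, again using invented toy vectors (the published method estimates a gender subspace from many gendered word pairs and distinguishes words that should stay gendered from those that shouldn't):

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scale(a, s):
    return [x * s for x in a]

# Invented toy vectors for illustration only.
man        = [1.0, 0.0, 0.2]
woman      = [1.0, 1.0, 0.2]
programmer = [0.5, 0.1, 0.8]  # a gender-neutral word with a slight gender lean

# 1. Estimate a "gender direction" from a gendered word pair.
g = sub(woman, man)
g = scale(g, 1.0 / math.sqrt(dot(g, g)))  # normalize to unit length

# 2. Remove the component of the neutral word's vector along that direction.
debiased = sub(programmer, scale(g, dot(programmer, g)))

print(dot(debiased, g))  # ~0.0: no gender component remains
```

After this step, "programmer" sits equidistant from "man" and "woman" along the gender direction, while deliberately gendered words like "king" and "queen" would be left untouched.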

But not everyone believes gender bias should be eliminated from the data sets. Arvind Narayanan, an assistant professor of computer science at Princeton, has also analyzed word embeddings and found gender, racial, and other prejudices. But Narayanan cautions against removing bias automatically, arguing that it could skew a computer’s representation of the real world and make it less adept at making predictions or analyzing data.

“We should think of these not as a bug but a feature,” Narayanan says. “It really depends on the application. What constitutes a terrible bias or prejudice in one application might actually end up being exactly the meaning you want to get out of the data in another application.”

Several word embedding data sets exist, including Word2Vec, created by researchers at Google, and GloVe, developed at Stanford University. Google declined to comment on research showing gender bias in Word2Vec, but the company is clearly conscious of the challenge. A recent blog post describes a technical approach to removing bias from decision-making AI systems without affecting their usefulness.

Biased AI systems could exacerbate the unfairness that already exists, says Barbara Grosz, a professor at Harvard University. “When you are in a society that is evolving in certain ways, then you are actually trying to change the future to be not like the past,” says Grosz, who cowrote a report called AI 100, a project from Stanford University aimed at understanding the potential dangers of AI (see “AI Wants to Be Your Bro, Not Your Foe”). “And to the extent that we rely on algorithms that do that kind of predicting,” Grosz says, “there’s an ethical question about whether we’re inhibiting the very evolution that we want.” 

Grosz concedes that there may be situations when it doesn’t make sense to remove bias from a data set. “It’s not that you can avoid all these kinds of bias, but we need to be mindful in our design, and we need to be mindful about what we claim about our programs and their results,” Grosz adds. “For many of these ethical questions, there isn’t a single right answer.”
