
Computational linguistics reveals pervasive gender bias in modern English novels

Researchers data-mined novels shortlisted for the Man Booker Prize and found that men were described as brilliant and brutal, while the women were young and lovely.

Gender bias is an insidious problem throughout society. It arises most obviously through deliberate discrimination but also exists through widespread unconscious bias. This permeates our culture, our workplaces, and even our language, often in ways we are unaware of.

The first step in changing this is uncovering bias where it exists. And that’s where the emerging science of computational linguistics is turning out to be useful.

This relatively new discipline uses data mining and machine learning to study text. And it has begun to reveal biases in everything from Wikipedia articles to the language itself.

Adjectives associated with male and female terms in novels shortlisted for the Man Booker Prize.

Today, Nishtha Madaan at IBM Research India and colleagues go further. They say they have used the same technique to reveal a significant gender bias in books nominated for the Man Booker Prize, one of the world’s top literary prizes, awarded each year to the best original novel written in English.

Their approach is relatively straightforward. Madaan and colleagues consider all books shortlisted for the prize between 1969 and 2017, some 275 novels in total. Instead of analyzing the text of the novels themselves, the team studied the descriptions of the books posted to Goodreads, a social catalogue owned by Amazon that offers free access to descriptions, reviews, and ratings of more than 400 million books.

They then asked how men and women were portrayed in these descriptions. The answers make for uncomfortable reading.  “This reveals the pervasiveness of gender bias and stereotype in the books on different features like occupation, introductions, and actions associated to the characters in the book,” say Madaan and co.

For a start, women are mentioned far less than men in these books—on average around 15 times versus 30 for men.

They are also described very differently. To show how, Madaan and co extracted adjectives associated with male and female terms in the text. They then created word clouds to show which terms appear more often for each sex.
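The extraction step can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration of the general idea, not the authors' actual pipeline: real systems use part-of-speech tagging and dependency parsing to find which adjectives modify which mentions, whereas this sketch matches tokens against a hand-picked adjective list and credits an adjective to whichever gendered terms share its sentence. All word lists here are invented for illustration.

```python
import re
from collections import Counter

# Illustrative word lists only -- a real pipeline would use a POS tagger
# rather than a fixed adjective vocabulary.
MALE_TERMS = {"he", "him", "his", "man", "men", "husband", "father", "son"}
FEMALE_TERMS = {"she", "her", "hers", "woman", "women", "wife", "mother", "daughter"}
ADJECTIVES = {"brilliant", "brutal", "young", "lovely", "beautiful", "wise"}

def gendered_adjectives(text):
    """Count adjectives co-occurring in a sentence with male/female terms."""
    male, female = Counter(), Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"[a-z']+", sentence)
        adjs = [t for t in tokens if t in ADJECTIVES]
        if any(t in MALE_TERMS for t in tokens):
            male.update(adjs)
        if any(t in FEMALE_TERMS for t in tokens):
            female.update(adjs)
    return male, female

male, female = gendered_adjectives(
    "He was brilliant and brutal. She was young and lovely."
)
print(male)    # adjectives co-occurring with male terms
print(female)  # adjectives co-occurring with female terms
```

Aggregated over a whole corpus, counts like these are what feed the word clouds; sentence-level co-occurrence is a crude proxy, which is why published analyses rely on proper syntactic parsing instead.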

These word clouds are shown here in the accompanying graphic—no prizes for guessing which is which.

The team also studies stereotypes by extracting the occupations of characters and then creating male and female word clouds. The top occupations for men are: doctor, physician, surgeon, psychologist, professor, scientist, business, director, and so on.

By contrast the top occupations for women are: teacher, lecturer, nurse, whore, hooker, child wife, child bride, and so on.

“We observed that while analyzing occupations for males and females, higher level roles are designated to males while lower level roles are designated to females,” say Madaan and co.

There are some positive signs of change, however. The team says that in recent years, shortlisted books have begun to appear in which women play a central role in the text. These include Do Not Say We Have Nothing by Madeleine Thien, How to be Both by Ali Smith, We Are All Completely Beside Ourselves by Karen Joy Fowler, and others.

That’s interesting work, but it suffers from some shortcomings. Most significant is that the team does not clearly describe the data it has gathered, the size of this database, when it was written, or by whom. That makes the work hard to assess.

For example, it may be that the descriptions of the books are not written by the authors themselves but by a correspondent on Goodreads. So any bias may come from this correspondent rather than reflect the book. The authors do not appear to have explored this possibility.

And of course, the authors of the books might argue that their novels deliberately explore bias and its impact on society; to do so, the novels must reflect that bias in the text. The authors might say it was never their intention to produce a gender-neutral novel, for example.

Nevertheless, this paper shows the potential to explore bias in culturally significant work. Indeed, the authors have already used this technique to explore bias in Bollywood movie scripts and have found significant gender stereotyping, particularly with respect to occupations.

The team is also developing a mechanism for removing bias. Just how useful this would be for novels shortlisted for the Man Booker Prize isn’t clear. But it certainly serves to highlight a problem that undoubtedly needs more attention.

Ref: Judging a Book by its Description: Analyzing Gender Stereotypes in the Man Bookers Prize Winning Fiction
