Computational linguistics reveals pervasive gender bias in modern English novels

Researchers data-mined descriptions of novels shortlisted for the Man Booker Prize and found that men were described as brilliant and brutal, while women were young and lovely.

Emerging Technology from the arXiv

August 7, 2018

Gender bias is an insidious problem throughout society. It arises most obviously through deliberate discrimination but also exists through widespread unconscious bias. This permeates our culture, our workplaces, and even our language, often in ways we are unaware of.

The first step in changing this is uncovering bias where it exists. And that’s where the emerging science of computational linguistics is turning out to be useful.

This relatively new discipline uses data mining and machine learning to study text. And it has begun to reveal biases in everything from Wikipedia articles to the language itself.

Adjectives associated with male and female terms in novels shortlisted for the Man Booker Prize.

Today, Nishtha Madaan at IBM Research India and colleagues go further. They say they have used the same technique to reveal a significant gender bias in books nominated for the Man Booker Prize, one of the world’s top literary prizes, awarded each year to the best original novel written in English.

Their approach is relatively straightforward. Madaan and colleagues considered all books shortlisted for the prize between 1969 and 2017, some 275 novels in total. Instead of analyzing the text of the novels themselves, the team studied the descriptions of the books posted to Goodreads, a social catalogue owned by Amazon that offers free access to descriptions, reviews, and ratings of more than 400 million books.

They then asked how men and women were portrayed in these descriptions. The answers make for uncomfortable reading. “This reveals the pervasiveness of gender bias and stereotype in the books on different features like occupation, introductions, and actions associated to the characters in the book,” say Madaan and co.

For a start, women are mentioned far less often than men in these books: on average around 15 times, versus 30 for men.
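Counting mentions like this can be sketched in a few lines. The sketch assumes each Goodreads description is available as a plain string and uses small, hypothetical lists of male and female terms; the paper's actual term lists and tokenization are not described in this article, so any numbers it produces are illustrative only.

```python
import re
from statistics import fmean

# Hypothetical term lists for illustration; the paper's own lists are not given.
MALE_TERMS = {"he", "him", "his", "himself", "man", "men", "boy", "father",
              "son", "husband", "brother"}
FEMALE_TERMS = {"she", "her", "hers", "herself", "woman", "women", "girl",
                "mother", "daughter", "wife", "sister"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def count_mentions(text: str) -> tuple[int, int]:
    """Return (male_mentions, female_mentions) for one description."""
    tokens = tokenize(text)
    male = sum(tok in MALE_TERMS for tok in tokens)
    female = sum(tok in FEMALE_TERMS for tok in tokens)
    return male, female


def average_mentions(descriptions: list[str]) -> tuple[float, float]:
    """Average male and female mentions per description across a corpus."""
    counts = [count_mentions(d) for d in descriptions]
    return fmean([m for m, _ in counts]), fmean([f for _, f in counts])


descriptions = [
    "He met his brother and her sister.",
    "She left him.",
]
print(average_mentions(descriptions))
```

A fixed word list is crude: it misses named characters entirely and treats every "her" the same way, which is one reason such counts should be read as approximate.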

They are also described very differently. To show how, Madaan and co extracted adjectives associated with male and female terms in the text. They then created word clouds to show which terms appear more often for each sex.
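As a rough sketch of the adjective step, the code below collects adjectives that appear within a few tokens of a gendered word and tallies them per gender; counts like these are what a word cloud is drawn from. This is a swapped-in window heuristic, not the researchers' method: the adjective and gendered-term lists are hypothetical, and a real pipeline would more likely identify adjectives with a part-of-speech tagger or dependency parser.

```python
import re
from collections import Counter

# Hypothetical lexicons for illustration only. A real pipeline would tag
# adjectives with a part-of-speech tagger rather than use a fixed list.
MALE_TERMS = {"he", "him", "his", "man", "boy", "father", "son", "husband"}
FEMALE_TERMS = {"she", "her", "woman", "girl", "mother", "daughter", "wife"}
ADJECTIVES = {"brilliant", "brutal", "young", "lovely", "beautiful",
              "rich", "old", "clever"}


def adjectives_by_gender(texts: list[str], window: int = 3) -> tuple[Counter, Counter]:
    """Count adjectives within `window` tokens of a male or female term."""
    male, female = Counter(), Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, tok in enumerate(tokens):
            if tok in MALE_TERMS:
                target = male
            elif tok in FEMALE_TERMS:
                target = female
            else:
                continue
            # Symmetric window around the gendered term, clipped to the text.
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for neighbour in tokens[start:end]:
                if neighbour in ADJECTIVES:
                    target[neighbour] += 1
    return male, female


male, female = adjectives_by_gender([
    "A brilliant, brutal man rules the estate.",
    "A young and lovely woman waits.",
])
print(male.most_common(), female.most_common())
```

Because the window is symmetric, an adjective sitting between a male and a female term is counted for both, one of several ways this heuristic can blur the picture.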

These word clouds are shown here in the accompanying graphic—no prizes for guessing which is which.

The team also studied stereotypes by extracting the occupations of characters and then creating male and female word clouds. The top occupations for men are: doctor, physician, surgeon, psychologist, professor, scientist, business, director, and so on.

By contrast, the top occupations for women are: teacher, lecturer, nurse, whore, hooker, child wife, child bride, and so on.
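The occupation comparison can be sketched in two steps: pull occupations out of simple "he/she is a ..." phrases, then rank each occupation by how strongly it leans male or female. Both pieces are assumptions for illustration. The occupation lexicon and the copular-phrase pattern are hypothetical, and the smoothed log ratio is a common way to compare word frequencies between two groups, not necessarily what the researchers did.

```python
import math
import re
from collections import Counter

# Hypothetical occupation lexicon; the researchers' extraction method is not
# described in the article.
OCCUPATIONS = {"doctor", "surgeon", "professor", "scientist",
               "teacher", "lecturer", "nurse", "director"}


def occupations_by_gender(texts: list[str], lookahead: int = 2) -> tuple[Counter, Counter]:
    """Find phrases like 'she is a nurse' or 'he was an eminent surgeon'."""
    male, female = Counter(), Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        for i in range(len(tokens) - 3):
            pronoun, verb, article = tokens[i:i + 3]
            if (pronoun not in ("he", "she") or verb not in ("is", "was")
                    or article not in ("a", "an")):
                continue
            # With the default lookahead of 2, one modifier word
            # (e.g. "eminent") may precede the occupation.
            for word in tokens[i + 3:i + 3 + lookahead]:
                if word in OCCUPATIONS:
                    (male if pronoun == "he" else female)[word] += 1
                    break
    return male, female


def log_ratio(male: Counter, female: Counter, alpha: float = 1.0) -> dict[str, float]:
    """Smoothed log ratio: positive values lean male, negative lean female."""
    vocab = set(male) | set(female)
    m_total = sum(male.values()) + alpha * len(vocab)
    f_total = sum(female.values()) + alpha * len(vocab)
    return {
        w: math.log((male[w] + alpha) / m_total) - math.log((female[w] + alpha) / f_total)
        for w in vocab
    }


male, female = occupations_by_gender([
    "He was an eminent surgeon.",
    "She is a nurse and he is a professor.",
    "She was a teacher.",
])
scores = log_ratio(male, female)
for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:10s} {score:+.3f}")
```

The smoothing constant `alpha` keeps an occupation seen for only one gender from producing an infinite score; larger values pull rare occupations toward zero.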

“We observed that while analyzing occupations for males and females, higher level roles are designated to males while lower level roles are designated to females,” say Madaan and co.

There are some positive signs of change, however. The team says that in recent years, shortlisted books have begun to appear in which women play a central role in the text. These include Do Not Say We Have Nothing by Madeleine Thien, How to be Both by Ali Smith, We Are All Completely Beside Ourselves by Karen Joy Fowler, and others.

That’s interesting work, but it suffers from some shortcomings. The most significant is that the team does not clearly describe the data it gathered: how large the database is, when the descriptions were written, or by whom. That makes the work hard to assess.

For example, the book descriptions may not have been written by the novelists themselves but by a Goodreads contributor. In that case, any bias could come from this contributor rather than from the book. The researchers do not appear to have explored this possibility.

And of course, the novelists might argue that their books explore bias and its impact on society, and that to do so they must depict that bias. They might say it was never their intention to write a gender-neutral novel, for example.

Nevertheless, this paper shows the potential to explore bias in culturally significant work. Indeed, the researchers have already used this technique to explore bias in Bollywood movie scripts and found significant gender stereotyping, particularly with respect to occupations.

The team is also developing a mechanism for removing bias. Just how useful this would be for novels shortlisted for the Man Booker Prize isn’t clear. But it certainly serves to highlight a problem that undoubtedly needs more attention.

Ref: arxiv.org/abs/1807.10615: Judging a Book by its Description: Analyzing Gender Stereotypes in the Man Bookers Prize Winning Fiction
