Machine learning algorithms have found that men are represented four times more than women in literature, with transgender and non-binary people largely missing
Authors have been translating the human experience into writing for hundreds of years. From Rabindranath Tagore, a Bengali polymath credited with countless works of transformative writing, to the contemporary work of Malorie Blackman, whose work on Black identity shaped empathy in classrooms across the UK, literature expresses what it means to be alive.
But which stories are seen as beautiful, crucial and interesting enough to be published? With historical prejudice built into systems of publishing and consumption, literature has long suffered from an implicit bias problem – especially in its representation of women.
Researchers look at 3,000 books to understand bias
To examine this gender disparity, researchers at the Viterbi School of Engineering at the University of Southern California ran 3,000 English-language books through a machine-learning algorithm.
The books spanned genres from adventure and science fiction to mystery and romance, and took varied forms, including novels, short stories and poetry.
Mayank Kejriwal, a research lead at USC’s Information Sciences Institute (ISI), was inspired by existing work on implicit gender biases and his own expertise in natural language processing (NLP).
Akarsh Nagaraj, co-author of the study, helped to quantify the 4:1 male-to-female imbalance in the literature.
Women in literature are four times less prevalent than men
“Gender bias is very real, and when we see females four times less in literature, it has a subliminal impact on people consuming the culture,” said Kejriwal, a research assistant professor in the Daniel J Epstein Department of Industrial and Systems Engineering.
“We quantitatively revealed, in an indirect way, how bias persists in culture.”
The AI also looked for adjectives associated with gender-specific characters, deepening the researchers’ understanding of bias in society. The approach allowed them to side-step the self-report bias that can arise when data is collected via surveys.
Looking at this connection, the team found that words associated with women included adjectives like ‘weak’, ‘amiable’, ‘pretty’ and sometimes ‘stupid’. For male characters, the AI found that associated words included ‘leadership’, ‘power’, ‘strength’ and ‘politics’.
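The article does not detail the study’s actual NLP pipeline, but the idea of measuring which words co-occur with gendered characters can be illustrated with a toy sketch. The word lists, window size and sample text below are all hypothetical; a real system would use part-of-speech tagging and coreference resolution rather than fixed lists.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration only.
FEMALE_WORDS = {"she", "her", "woman", "girl"}
MALE_WORDS = {"he", "him", "his", "man", "boy"}
ADJECTIVES = {"weak", "amiable", "pretty", "strong", "powerful", "brave"}

def adjective_associations(text, window=5):
    """Count adjectives occurring within `window` tokens of a gendered word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = {"female": Counter(), "male": Counter()}
    for i, tok in enumerate(tokens):
        if tok in FEMALE_WORDS:
            gender = "female"
        elif tok in MALE_WORDS:
            gender = "male"
        else:
            continue
        # Tally adjectives in a small window around the gendered token.
        for neighbour in tokens[max(0, i - window):i + window + 1]:
            if neighbour in ADJECTIVES:
                counts[gender][neighbour] += 1
    return counts

sample = "She was pretty and amiable. He was strong, powerful and brave."
result = adjective_associations(sample)
```

Aggregated over thousands of books, counts like these can reveal which descriptors cluster around characters of each gender, which is the kind of pattern the researchers report.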
The researchers found that when the author was female, the discrepancy between male and female characters decreased. The authors say that women represent themselves “much more” than a male writer would.
Data lacked information on non-binary and transgender characters
Nagaraj, also a machine learning engineer at Meta, said: “Books are a window to the past, and the writing of these authors gives us a glimpse into how people perceive the world, and how it has changed.”
However, the data had limitations when it came to non-binary and transgender prevalence in literature.
“When we published the dataset paper, reviewers had this criticism that we were ignoring non-dichotomous genders,” said Kejriwal.
“But we agreed with them, in a way. We think it’s completely suppressed, and we won’t be able to find many [transgender individuals or non-dichotomous individuals].”