Montreal AI Ethics Institute

Democratizing AI ethics literacy


How Prevalent is Gender Bias in ChatGPT? – Exploring German and English ChatGPT Responses

February 1, 2024

🔬 Research Summary by Stefanie Urchs, a Computer Science Ph.D. student at the Hochschule München University of Applied Sciences, deeply interested in interdisciplinary approaches to natural language processing.

[Original paper by Stefanie Urchs, Veronika Thurner, Matthias Aßenmacher, Christian Heumann, and Stephanie Thiemichen]


Overview: ChatGPT opened the world of large language models to non-IT users, who tend to treat the system as an all-knowing chatbot without regard for the pitfalls of the technology. In this paper, the authors examine how gender-biased ChatGPT's responses are and what other pitfalls await an unprepared non-IT user.


Introduction

By introducing ChatGPT with its intuitive user interface (UI), OpenAI opened the world of state-of-the-art natural language processing to non-IT users. Users do not need a computer science background to interact with the system; instead, they hold a natural-language conversation in the UI. Many users rely on the system in their daily work: writing texts, checking grammar and spelling, and even fact-checking. However, non-IT users tend to see the system as a “magical box” that knows all the answers and to believe that, because machines do not make mistakes, neither does ChatGPT. This lack of critical usage is problematic in everyday use.

We prompt ChatGPT in German and English from a neutral, female, and male perspective to examine the differences in responses. We inspect three prompts in depth after broadly prompting the system to define the problem space. ChatGPT is a good tool to use for drafting texts. However, it still has problems with gender-neutral language and tends to overcorrect if a prompt contains gender. In the end, we still need humans to check the work of machines.

Key Insights

Gender Bias in Large Language Models

What do we mean when discussing bias?

To detect biases in text, it is important to define the term bias properly. In machine learning, specifically in classification tasks, bias denotes a model's preference for a certain class. In natural language, however, bias has a different definition: “Biases are preconceived notions based on beliefs, attitudes, and/or stereotypes about people pertaining to certain social categories that can be implicit or explicit” (Mateo et al., 2020). When writing text, humans tend to incorporate these notions, and our attitudes toward them change over time. For example, our understanding of the role of women in society or of the LGBTQIA+ community differs today from thirty years ago.

Where’s the problem?

One of the premises of machine learning is that more data is better. Therefore, large language models (LLMs) are trained on as much textual data as possible. OpenAI has not disclosed what data was used to train the model underlying ChatGPT, but the training data likely includes a large amount of text from the web. This leads to two problems. First, the web is predominantly male, white, and US American. Second, as mentioned above, our values have evolved, yet we train modern language models on texts written decades ago. This matters because these models learn to choose the statistically most likely next word, conditioned on the words they have already seen (the prompt) or generated (the response up to that point). If the model is trained on a lot of text in which women and housework appear in proximity, it will learn that these concepts belong together. The model therefore reproduces all biases contained in the training data.
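The mechanism above can be illustrated with a minimal sketch: a toy bigram model that always picks the most frequent continuation. The tiny corpus is invented purely for illustration; real LLMs use neural networks over billions of tokens, but the principle that skewed co-occurrence counts in the training data resurface at generation time is the same.

```python
from collections import Counter

# Hypothetical toy corpus with a skewed co-occurrence pattern.
corpus = [
    "the woman did the housework",
    "the woman cleaned the kitchen",
    "the woman did the housework",
    "the man wrote the code",
]

# Count bigrams: how often each word follows another.
bigrams = Counter()
for sentence in corpus:
    tokens = sentence.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        bigrams[(prev, nxt)] += 1

def most_likely_next(word):
    """Return the statistically most frequent continuation of `word`,
    mimicking how a language model favors high-probability next tokens."""
    candidates = {nxt: c for (prev, nxt), c in bigrams.items() if prev == word}
    return max(candidates, key=candidates.get)

# The skew in the corpus reappears in generation:
print(most_likely_next("woman"))  # -> "did" (as in "did the housework")
print(most_likely_next("man"))    # -> "wrote" (as in "wrote the code")
```

A real model samples from a learned probability distribution rather than taking a hard argmax, but associations that dominate the training counts still dominate the outputs.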

Detecting Gender Bias in ChatGPT

Our Approach

We analyze ChatGPT responses from the point of view of a non-IT user working in university communications by prompting the system in German and English from a neutral, female, and male perspective. First, we probe the system with open-ended, neutrally formulated prompts to surface potentially problematic responses. Even a single occurrence of controversial behavior can be harmful if a user does not check the response thoroughly before publishing, and because the system is used very frequently by many users, generating a tremendous number of responses, rare problematic behavior will recur. Subsequently, we chose two prompts to investigate further. In contrast to the first probing, we repeat each prompt ten times to examine whether problems arise at scale. We analyze these responses for the words used, the number of female- and male-coded words, and text length.
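The per-response analysis described above could be sketched as follows. Note this is an illustrative reconstruction, not the authors' code: the coded-word lists and sample responses here are hypothetical placeholders, whereas the paper uses established gender-coded word inventories.

```python
# Hypothetical coded-word lists for illustration only; the paper relies on
# established female-/male-coded word inventories, not these few examples.
FEMALE_CODED = {"collaborative", "supportive", "community"}
MALE_CODED = {"competitive", "ambitious", "leader"}

def analyze(response: str) -> dict:
    """Compute the three measures used in the study design:
    text length, female-coded word count, and male-coded word count."""
    tokens = [t.strip(".,!?").lower() for t in response.split()]
    return {
        "length": len(tokens),
        "female_coded": sum(t in FEMALE_CODED for t in tokens),
        "male_coded": sum(t in MALE_CODED for t in tokens),
    }

# Repeating the same prompt several times and aggregating, as in the study:
responses = [
    "She is a supportive and collaborative researcher.",
    "He is an ambitious leader in his field.",
]
stats = [analyze(r) for r in responses]
```

Aggregating such statistics over ten repetitions per prompt and perspective makes it possible to compare, for instance, whether female-perspective prompts systematically yield shorter or more coded responses than neutral ones.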

Findings

While first probing the system, we found that ChatGPT excels in English but defaults to US American English. German responses sometimes contain grammatical errors, which are not obvious when skimming the response. Furthermore, ChatGPT struggles with the grammar of German gender-neutral language, although it can use the gender-neutral “they” in English. When gender is explicitly added to a prompt, ChatGPT tends to bring up fairness and equality, topics that do not appear in responses to neutral prompts. Unfortunately, ChatGPT has no real concept of male and female and applies the same reasoning to both: in its responses, women should become professors to elevate other women, and men should become professors so that young men know it is possible to excel in STEM (science, technology, engineering, and mathematics) fields.

While prompting the system in more depth, we found that ChatGPT hallucinates information into generic prompts. Generating exclusively female professors (in both languages) for neutral prompts makes it look biased toward female content. The system also displays a bias toward STEM-related research fields, while the responses overall use relatively few gender-coded words and do not reinforce common language biases. German and English responses stressed similar content, which makes the system suitable for bilingual text generation.

ChatGPT is useful for helping non-IT users draft texts for their daily work. However, it is crucial to thoroughly check the system’s responses for biases and syntactic and grammatical mistakes.

Between the lines

Because we analyze ChatGPT from the perspective of a non-IT user working in university communications, the scope of possible prompts was limited, which led to only subtle differences between the perspectives. To explore the differences between gendered responses, more general prompts should be examined. Furthermore, changing the context from formal university communications to a more informal one might yield more biased results.

Our research, set in a formal context, yields an unexpectedly positive outcome regarding gender bias in ChatGPT; a less formal setting might reveal more biased responses.


