
How Prevalent is Gender Bias in ChatGPT? – Exploring German and English ChatGPT Responses

February 1, 2024

🔬 Research Summary by Stefanie Urchs, a Computer Science Ph.D. student at the Hochschule München University of Applied Sciences, deeply interested in interdisciplinary approaches to natural language processing.

[Original paper by Stefanie Urchs, Veronika Thurner, Matthias Aßenmacher, Christian Heumann, and Stephanie Thiemichen]


Overview: ChatGPT opened the world of large language models to non-IT users, who tend to use the system as an all-knowing chatbot without regard to the pitfalls of the technology. In this paper, the authors examine the extent to which ChatGPT's responses are gender-biased and what other pitfalls await an unprepared non-IT user.


Introduction

By introducing ChatGPT with its intuitive user interface (UI), OpenAI opened the world of state-of-the-art natural language processing to non-IT users. Users do not need a computer science background to interact with the system; instead, they have a natural language conversation in the UI. Many users rely on the system to help with their daily work: writing texts, checking grammar and spelling, and even fact-checking. However, non-IT users tend to see the system as a "magical box" that knows all the answers and believe that, because machines do not make mistakes, neither does ChatGPT. This lack of critical usage is problematic in everyday use.

We prompt ChatGPT in German and English from a neutral, female, and male perspective to examine the differences in responses. After broadly prompting the system to define the problem space, we inspect three prompts in depth. ChatGPT is a useful tool for drafting texts. However, it still has problems with gender-neutral language and tends to overcorrect if a prompt contains gender. In the end, we still need humans to check the work of machines.

Key Insights

Gender Bias in Large Language Models

What do we mean when discussing bias?

It is important to define the term bias properly in order to detect biases in text. In machine learning, specifically in classification tasks, bias is defined as the preference of a model towards a certain class. In natural language, however, bias has a different definition: "Biases are preconceived notions based on beliefs, attitudes, and/or stereotypes about people pertaining to certain social categories that can be implicit or explicit." (Mateo et al., 2020). When writing text, humans tend to incorporate these notions, and our attitudes towards different biases change over time. For example, our understanding of the role of women in society or of the LGBTQIA+ community differs today from thirty years ago.

Where’s the problem?

One of the premises of machine learning is that more data is better. Therefore, large language models (LLMs) are trained on as much textual data as possible. OpenAI has not disclosed what data was used to train the model underlying ChatGPT. However, the training data likely includes a large amount of text from the web. This leads to two problems: first, the web is predominantly male, white, and US American. Second, as mentioned above, our values have evolved, but we train our modern-day language models on texts that are, in part, several decades old. This is problematic because these models learn to choose the statistically most likely next word, given the words they have already seen (the prompt) or generated (the response up to the word being generated at that moment). If the model is trained on a lot of text in which women and housework appear in proximity, it will learn that these concepts belong together. The model therefore reproduces all biases contained in the training data.
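To make this next-word mechanism concrete, here is a minimal, purely illustrative Python sketch; the toy corpus and word choices are invented for this summary and are not from the paper or any real model. It shows how next-word probabilities estimated from a skewed corpus reproduce that skew:

```python
from collections import Counter, defaultdict

# Toy corpus in which "woman" co-occurs mostly with "housework".
# Purely illustrative data, not from the paper or any real model.
corpus = [
    "the woman did the housework",
    "the woman finished the housework",
    "the man read the newspaper",
    "the woman did the housework again",
]

# Count bigrams (adjacent word pairs) across the corpus.
bigrams = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1

def next_word_probs(prev: str) -> dict:
    """Relative frequency of each word that follows `prev`."""
    counts = bigrams[prev]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# The most likely continuation after "woman" mirrors the skew in the
# training data, not any fact about the world.
print(next_word_probs("woman"))  # {'did': 0.67, 'finished': 0.33} (rounded)
print(next_word_probs("man"))    # {'read': 1.0}
```

Real LLMs condition on far longer contexts than a single preceding word, but the underlying principle is the same: the statistics of the training data determine what the model considers likely.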

Detecting Gender Bias in ChatGPT

Our Approach

We analyze ChatGPT responses from the point of view of a non-IT user working in university communications by prompting the system in German and English from a neutral, female, and male perspective. First, we probe the system with open-ended, neutrally formulated prompts for possibly problematic responses. Even a single occurrence of controversial behavior can be problematic for a user who does not check the response thoroughly before publishing. Moreover, because the system is used very frequently by many users, it generates a tremendous number of responses, so problematic behavior will recur. Subsequently, we chose two prompts to investigate further. In contrast to the first probing, we now repeat each prompt ten times to examine whether problems arise from scaling. We analyze these responses for the words used, the frequency of female- and male-coded words, and text length.
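As a rough illustration of this kind of per-response analysis, the following Python sketch counts gender-coded words and text length. The mini word lists are hypothetical placeholders, not the lexicons used in the paper:

```python
import re

# Hypothetical mini lexicons for illustration only; the paper's actual
# female-/male-coded word lists (for German and English) would go here.
FEMALE_CODED = {"support", "understand", "collaborate", "nurture"}
MALE_CODED = {"lead", "analyze", "compete", "decide"}

def analyze_response(response: str) -> dict:
    """Count female-/male-coded words and token length of one response."""
    tokens = re.findall(r"[a-zäöüß]+", response.lower())
    return {
        "female_coded": sum(t in FEMALE_CODED for t in tokens),
        "male_coded": sum(t in MALE_CODED for t in tokens),
        "length": len(tokens),
    }

# Example: aggregate over repetitions of the same prompt.
responses = [
    "A professor should lead research and support students.",
    "She will analyze data and collaborate across fields.",
]
for r in responses:
    print(analyze_response(r))
```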

Findings

During the first probing of the system, we found that ChatGPT excels in English but defaults to US American English. German responses sometimes lack grammatical correctness, though these mistakes are not obvious when skimming a response. Furthermore, ChatGPT struggles with correct grammar in German gender-neutral language, while in English it can use the gender-neutral "they." When gender is explicitly added to a prompt, ChatGPT tends to bring up topics of fairness and equality that are absent from responses to neutral prompts. Unfortunately, ChatGPT has no real concept of male and female and applies the same reasoning to both: it suggests that women should become professors to elevate other women, and that men should become professors so young men see that it is possible to excel in STEM (science, technology, engineering, and mathematics) fields.

In the more in-depth prompting, we found that ChatGPT hallucinates information into generic prompts. For neutral prompts, it generated exclusively female professors (in both languages), which makes it look biased toward female content. The system also displays a bias toward STEM-related research fields, while the responses overall use relatively few gender-coded words and do not reinforce common language biases. German and English responses stressed similar content, which makes the system suitable for bilingual text generation.

ChatGPT is useful for helping non-IT users draft texts for their daily work. However, it is crucial to thoroughly check the system’s responses for biases and syntactic and grammatical mistakes.

Between the lines

Because we set out to analyze ChatGPT from the perspective of a non-IT user working in university communications, the scope of possible prompts was limited, which led to only subtle differences between the perspectives. To explore the differences between gendered responses further, more general prompts should be examined. Furthermore, changing the context from formal university communications to a more informal one might lead to more biased results.

Within this formal setting, our research yields an unexpectedly positive outcome regarding gender bias in ChatGPT.
