Montreal AI Ethics Institute

Democratizing AI ethics literacy


Public Perceptions of Gender Bias in Large Language Models: Cases of ChatGPT and Ernie

December 14, 2023

🔬 Research Summary by Kyrie Zhixuan Zhou and Madelyn Rose Sanfilippo.

Kyrie Zhixuan Zhou is a PhD student at the University of Illinois at Urbana-Champaign, aiming to understand, design, and govern ICT/AI experience for vulnerable populations.

Madelyn Rose Sanfilippo is an Assistant Professor at the University of Illinois at Urbana-Champaign, studying the governance of sociotechnical systems, as well as the outcomes, inequality, and consequences within these systems.

[Original paper by Kyrie Zhixuan Zhou and Madelyn Rose Sanfilippo]


Overview: Two of the world’s most prominent large language models (LLMs), ChatGPT (US) and Ernie (China), reflect gender bias. In this paper, we qualitatively analyze and compare social media discussions of the two models and identify different narrative patterns: users more often complain about “implicit” bias in ChatGPT, whereas Ernie is said to contain “explicit” and concerning biases. We propose governance recommendations to regulate gender bias in LLMs.


Introduction

LLMs are quickly gaining momentum yet demonstrate gender bias in their responses. We conducted a content analysis of social media discussions to gauge public perceptions of gender bias in two LLMs trained in different cultural contexts: ChatGPT, a US-based LLM, and Ernie, a China-based LLM. People shared both observations of gender bias from their personal use and scientific findings about gender bias in LLMs. A difference was identified in perceptions of the two LLMs: ChatGPT was more often found to carry implicit gender bias, e.g., associating men and women with different professional titles, while explicit gender bias was found in Ernie’s responses, e.g., overly promoting women’s pursuit of marriage over career. Based on these findings, we reflect on the impact of culture on gender bias and propose governance recommendations to address gender bias in LLMs.

Key Insights

Motivation and Methodology

Gender bias has long been studied in computer systems and other domains. Gender bias in LLMs such as ChatGPT could harm half the global population if these models are widely adopted. Understanding public perceptions of gender bias in LLMs is crucial to ensuring that policies and regulations are relevant and effective in meeting people’s needs. At the same time, LLMs are trained on data collected from search engines, online forums, websites, and so on; thus, they can reflect and even amplify existing biases in human language. Social biases exist in varying forms in different cultures, so LLMs trained on data collected from different cultures may also demonstrate different types and levels of bias. To understand how gender bias manifests in LLMs rooted in different cultures, we examined ChatGPT by OpenAI, based in the US, and Ernie by Baidu, based in China, in a comparative case study aimed at informing contextual and concrete governance mechanisms, including new regulations and adaptations of existing laws.

We approached public perceptions of gender bias in ChatGPT and Ernie by comparing social media discussions around the two LLMs. Discussions about ChatGPT were obtained from Twitter via the search query “gender bias in chatgpt”; similarly, we searched Sina Weibo with the Chinese-language query “gender bias in Ernie.” Our data collection covered nine months of discussions about ChatGPT and six months about Ernie. We collected and analyzed the data simultaneously and stopped searching for more discussions after reaching theoretical saturation in the analysis, i.e., when no new themes emerged, though we continued to monitor recent discussions in case new perceptions or concerns arose. A thematic analysis approach was adopted to analyze the online discussions.

Findings and Impact

The main findings from the discussions about ChatGPT included (1) observations of implicit gender bias (e.g., associating genders with different professional titles), (2) political correctness (e.g., refusing to tell jokes about women), (3) dissemination of scientific findings regarding LLM bias, and (4) calls to action (e.g., an educator promoting fairness in her lessons).

The main findings from the discussions about Ernie included (1) observations of explicit gender bias (e.g., “women had better get married at a young age”), (2) criticism of the culture underlying the training data, and (3) a lack of discussion of gender bias in Ernie, possibly due to censorship on the social media platform where the discussion took place.

Our comparative analysis of gender bias in ChatGPT and Ernie sheds light on the intertwined relationship between culture and gender bias in LLMs and AI. Further, we propose governance recommendations to regulate gender bias in LLMs (more details in the paper), including (1) creating concrete and contextual policies, (2) applying various existing legal precedents to protect at-risk populations, and (3) forming new norms to mitigate emergent bias in, or discrimination by, AI.

Between the lines

An important takeaway from our research is that cultural factors shape biases in LLMs, since these models are trained on data that reflect their cultural contexts. Regulation should thus be contextual and concrete rather than “one-size-fits-all.” Further, we want to inspire the formation of norms around building non-discriminatory AI. Industry practices that build on social norms to protect user privacy, for instance, can partly be attributed to privacy legislation such as the GDPR and to enforcement actions against non-compliant actors such as Facebook (Meta). Norms to mitigate discrimination in AI systems might be formed similarly, through legislative efforts and by engaging and educating users. The paper discusses several concrete ideas for future research, especially regarding regulation and policy-making. Ultimately, we hope our findings will inspire more interdisciplinary research in AI ethics and computational social science.

Want quick summaries of the latest research & reporting in AI ethics delivered to your inbox? Subscribe to the AI Ethics Brief. We publish bi-weekly.


© 2025 Montreal AI Ethics Institute. This work is licensed under a Creative Commons Attribution 4.0 International License.
