
Bias and Fairness in Large Language Models: A Survey

September 27, 2023

🔬 Research Summary by Isabel O. Gallegos, a Ph.D. student in Computer Science at Stanford University, researching algorithmic fairness to interrogate the role of artificial intelligence in equitable decision-making.

[Original paper by Isabel O. Gallegos, Ryan A. Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, and Nesreen K. Ahmed]


Overview: Social biases in large language models (LLMs) have been well-documented, but how can we address them? This paper presents a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We consolidate, formalize, and expand notions of social bias and fairness in natural language processing, unify the literature with three intuitive taxonomies, and identify open problems and challenges for future work.


Introduction

Rapid advancements in large language models (LLMs) have enabled the understanding and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. This paper presents a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing. We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation, namely metrics and datasets, and one for mitigation. Our first taxonomy of metrics for bias evaluation organizes metrics by the different levels at which they operate in a model: embeddings, probabilities, and generated text. Our second taxonomy of datasets for bias evaluation categorizes datasets by their structure; we also release a consolidation of publicly available datasets for improved access. Our third taxonomy of techniques for bias mitigation classifies methods by their intervention during pre-processing, in-training, intra-processing, and post-processing. Finally, we identify open problems and challenges for future work. Synthesizing a wide range of recent research, we aim to provide a clear guide to the existing literature that empowers researchers and practitioners to better understand and prevent bias propagation in LLMs.

Key Insights

The Challenge of Bias in Large Language Models

The rise and rapid advancement of large language models (LLMs) has fundamentally changed language technologies. With the ability to generate human-like text and adapt to a wide array of natural language processing (NLP) tasks, the impressive capabilities of these models have initiated a paradigm shift in the development of language models. Instead of training task-specific models on relatively small task-specific datasets, researchers and practitioners can use LLMs as foundation models that can be fine-tuned for particular functions. Even without fine-tuning, foundation models increasingly enable few- or zero-shot capabilities for various scenarios like classification, question-answering, logical reasoning, fact retrieval, information extraction, and more.

Lying behind these successes, however, is the potential to perpetuate harm. Typically trained on an enormous scale of uncurated Internet-based data, LLMs inherit stereotypes, misrepresentations, derogatory and exclusionary language, and other denigrating behaviors that disproportionately affect already vulnerable and marginalized communities. These harms constitute “social bias,” a subjective and normative term we use broadly to refer to disparate treatment or outcomes between social groups arising from historical and structural power asymmetries. Though LLMs often reflect existing biases, they can also amplify them; in either case, the automated reproduction of injustice can reinforce systems of inequity.

Defining Bias and Fairness for NLP

Despite the growing emphasis on addressing these issues, bias and fairness research in LLMs often fails to precisely describe the harms of model behaviors: who is harmed, why the behavior is harmful, and how the harm reflects and reinforces social hierarchies. Consolidating literature from machine learning, NLP, and (socio)linguistics, we define several distinct facets of bias to disambiguate the types of social harms that may emerge from LLMs. We organize these harms in a taxonomy of social biases that researchers and practitioners can leverage to accurately describe bias evaluation and mitigation efforts. We shift fairness frameworks typically applied to machine learning classification problems towards NLP and introduce several fairness desiderata that begin to operationalize various fairness notions for LLMs.
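As one concrete illustration of what such a desideratum can look like, consider an invariance-style condition: the model should behave (approximately) the same on inputs that differ only in the social group they mention. The formalization below is a sketch in our own notation, not the survey's exact definition.

\[
M(x) \approx M(x') \quad \text{for every counterfactual pair } (x, x'),
\]

where $x'$ is a copy of $x$ with the referenced social group swapped; equivalently, for an output statistic $f$ (e.g., the sentiment or toxicity of a generation),

\[
\bigl|\, \mathbb{E}\big[ f(M(x)) \big] - \mathbb{E}\big[ f(M(x')) \big] \,\bigr| \le \epsilon .
\]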

Taxonomies for Bias Evaluation and Mitigation

The growing recognition of the biases embedded in LLMs has produced an abundance of work proposing techniques to measure or remove social bias. We organize this literature into three research areas: (1) metrics for bias evaluation, (2) datasets for bias evaluation, and (3) techniques for bias mitigation, which we categorize, summarize, and discuss in turn.

Metrics for Bias Evaluation

We characterize the relationship between evaluation metrics and datasets, which are often conflated in the literature, and we categorize and discuss a wide range of metrics that evaluate bias at different fundamental levels in a model: (1) embedding-based, which use hidden vector representations; (2) probability-based, which use model-assigned token probabilities; and (3) generated-text-based, which use model-generated text continuations.

We formalize metrics mathematically with a unified notation that improves comparison between metrics. We also identify the limitations of each class of metrics in capturing downstream application biases, highlighting areas for future research.
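To make the first category concrete, the snippet below sketches an embedding-based association score in the spirit of WEAT-style tests: it compares how close a target word's vector sits to two sets of group-attribute words. The vectors here are random placeholders; in practice they would come from a model's embedding layer or hidden states, and this is an illustration rather than any metric exactly as defined in the survey.

# Minimal sketch of an embedding-based association score (WEAT-style).
# The embedding lookup is a hypothetical stand-in with random vectors.
import numpy as np

rng = np.random.default_rng(0)
embed = {w: rng.normal(size=50) for w in
         ["doctor", "nurse", "he", "she", "him", "her"]}  # placeholder vectors

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(word, attr_a, attr_b):
    """Mean cosine similarity to attribute set A minus mean similarity to set B."""
    sim_a = np.mean([cosine(embed[word], embed[a]) for a in attr_a])
    sim_b = np.mean([cosine(embed[word], embed[b]) for b in attr_b])
    return sim_a - sim_b

# Positive values indicate the target word sits closer to the first attribute set.
for target in ["doctor", "nurse"]:
    print(target, association(target, ["he", "him"], ["she", "her"]))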

Datasets for Bias Evaluation

We categorize datasets by their data structure: (1) counterfactual inputs, or pairs of sentences with perturbed social groups, and (2) prompts, or phrases to condition text generation. With this classification, we leverage our taxonomy of metrics to highlight the compatibility of datasets with new metrics beyond those originally proposed. We increase comparability between dataset contents by identifying the types of harm and the social groups targeted by each dataset. We highlight consistency, reliability, and validity challenges in existing evaluation datasets as areas for improvement. Finally, we consolidate and share publicly available datasets here: https://github.com/i-gallegos/Fair-LLM-Benchmark
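The sketch below illustrates these two data structures with invented records (they are not drawn from any released benchmark) and notes which class of metric each structure typically pairs with.

# Minimal sketch of the two dataset structures; example records are hypothetical.
from dataclasses import dataclass

@dataclass
class CounterfactualPair:
    """Two sentences identical except for the social group they mention."""
    sentence_a: str
    sentence_b: str
    groups: tuple[str, str]

@dataclass
class GenerationPrompt:
    """A phrase used to condition text generation, tagged with its target group."""
    prompt: str
    group: str

pairs = [CounterfactualPair("The doctor said he would call.",
                            "The doctor said she would call.",
                            ("male", "female"))]
prompts = [GenerationPrompt("The woman worked as", "female")]

# Counterfactual pairs feed probability-based metrics (compare the model's scores for
# each sentence); prompts feed generated-text-based metrics (score the continuations).
print(pairs[0], prompts[0])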

Techniques for Bias Mitigation

We classify an extensive range of bias mitigation methods by their intervention stage: (1) pre-processing, which modifies model inputs; (2) in-training, which modifies model parameters via gradient-based updates; (3) intra-processing, which modifies inference behavior without further training; and (4) post-processing, which modifies model outputs. We construct granular subcategories at each mitigation stage to draw out similarities and trends between classes of methods, formalizing several techniques mathematically with unified notation and giving representative examples of each class. We also draw attention to ways that bias may persist at each mitigation stage.
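As one concrete example from the pre-processing stage, the sketch below implements a naive form of counterfactual data augmentation, a common pre-processing approach in this literature: the training corpus is extended with copies in which group terms are swapped. The word-pair list and sentences are illustrative stand-ins, and a real pipeline would use curated term lists and handle grammatical ambiguities (e.g., possessive "her") more carefully.

# Minimal sketch of counterfactual data augmentation (CDA) as a pre-processing step.
# The swap list is a tiny illustrative subset, not a curated resource.
import re

SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him",
         "his": "her", "hers": "his", "man": "woman", "woman": "man"}

def counterfactual(text: str) -> str:
    """Swap each group term for its counterpart, preserving capitalization."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    pattern = r"\b(" + "|".join(SWAPS) + r")\b"
    return re.sub(pattern, repl, text, flags=re.IGNORECASE)

corpus = ["She is a talented engineer.", "The man thanked his doctor."]
augmented = corpus + [counterfactual(s) for s in corpus]  # train on both variants
print(augmented)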

Open Problems and Challenges

The work we survey makes important progress in understanding and reducing bias, but several challenges remain largely open. We challenge future research to address power imbalances in LLM development, conceptualize fairness more robustly for NLP, improve bias evaluation principles and standards, expand mitigation efforts, and explore theoretical limits for fairness guarantees. 

Between the lines

As LLMs are increasingly deployed and adapted in various applications, bias evaluation and mitigation efforts remain critical research areas to ensure social harms are neither automated nor perpetuated by technical systems. However, the role of technical solutions must be contextualized within a broader understanding of historical, structural, and institutional power hierarchies. For instance, who holds power in developing and deploying LLM systems, who is excluded, and how does technical solutionism preserve, enable, and strengthen inequality? We hope our work improves understanding of technical efforts to measure and reduce the perpetuation of bias by LLMs while also challenging researchers to interrogate more deeply the social, cultural, historical, and political contexts that shape the underlying assumptions and values ingrained in technical solutions.

