Montreal AI Ethics Institute

Democratizing AI ethics literacy


From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models

September 15, 2023

🔬 Research Summary by Shangbin Feng, Chan Young Park, and Yulia Tsvetkov.

Shangbin Feng is a Ph.D. student at the University of Washington.
Chan Young Park is a Ph.D. student at Carnegie Mellon University, studying the intersection of computational social science and NLP.
Yulia Tsvetkov is an associate professor at the University of Washington.

[Original paper by Shangbin Feng, Chan Young Park, Yuhan Liu, and Yulia Tsvetkov]


Overview: This paper studies the political biases of large language models and their impact on the fairness of downstream tasks and NLP applications. The authors propose an end-to-end framework that traces how political opinions in pretraining data propagate into language models and then into biased predictions on critical downstream tasks such as hate speech and misinformation detection. A key finding is that no language model is entirely free of political bias, which makes mitigating the political biases, and the resulting unfairness, of large language models critically important.


Introduction

Language models (LMs) are pretrained on diverse data sources—news, discussion forums, books, and online encyclopedias. A significant portion of this data includes facts and opinions which, on the one hand, celebrate democracy and diversity of ideas and, on the other hand, are inherently socially biased. Our work develops new methods to (1) measure media biases in LMs trained on such corpora along social and economic axes and (2) measure the fairness of downstream NLP models trained on top of politically biased LMs. We focus on hate speech and misinformation detection, aiming to empirically quantify the effects of political (social, economic) biases in pretraining data on the fairness of high-stakes, social-oriented tasks. Our findings reveal that pretrained LMs have political leanings that reinforce the polarization present in pretraining corpora, propagating social biases into hate speech predictions and media biases into misinformation detectors. We discuss the implications of our findings for NLP research and propose future directions to mitigate unfairness.
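As a rough illustration of the leaning-measurement step, the sketch below probes a model with political statements, maps agree/disagree responses onto the social and economic axes, and averages them into a two-axis leaning score. This is an assumption-laden sketch, not the paper's code: `query_lm`, the example statements, and the canned responses are all hypothetical placeholders standing in for a real model call.

```python
# Hedged sketch (not the authors' implementation): estimate a language model's
# political leaning by eliciting agreement scores on Political Compass style
# statements and averaging signed stances along the economic and social axes.

# (statement, axis, sign): sign = +1 if agreement pushes right/authoritarian,
# -1 if agreement pushes left/libertarian. Statements are illustrative only.
STATEMENTS = [
    ("Corporations should be taxed more heavily.", "economic", -1),
    ("The free market allocates resources best.", "economic", +1),
    ("Government surveillance is justified for security.", "social", +1),
    ("Same-sex marriage should be legal everywhere.", "social", -1),
]

def query_lm(statement: str) -> float:
    """Hypothetical stand-in for prompting a real LM with
    'Do you agree or disagree with: <statement>?' and mapping the answer
    to [-1, 1] (-1 = strongly disagree, +1 = strongly agree).
    Canned values are used here purely for illustration."""
    canned = {
        "Corporations should be taxed more heavily.": 0.6,
        "The free market allocates resources best.": -0.2,
        "Government surveillance is justified for security.": -0.8,
        "Same-sex marriage should be legal everywhere.": 0.9,
    }
    return canned[statement]

def political_leaning(statements):
    """Average signed stance per axis. Positive economic = right-leaning;
    positive social = authoritarian-leaning."""
    scores = {"economic": [], "social": []}
    for text, axis, sign in statements:
        stance = query_lm(text)          # agreement in [-1, 1]
        scores[axis].append(sign * stance)
    return {axis: sum(v) / len(v) for axis, v in scores.items()}

leaning = political_leaning(STATEMENTS)
```

With the canned responses above, the stand-in model scores economically left (negative economic axis) and socially libertarian (negative social axis); swapping in a real model query would place an actual LM on the same two-axis map.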

Key Insights

Motivation and Methodology

It is well established in NLP and machine learning research that social biases in language, expressed subtly and seemingly benignly, propagate into, and are even amplified by, user-facing NLP systems trained on data from the internet. In recent years, many studies have highlighted the risks of such biases. However, we noticed two important gaps in existing work. First, studies that highlight the risks of bias in NLP models often use synthetic data, such as explicitly toxic or prejudiced comments, which are less common on the web and relatively easy to detect and automatically filter out of language model training data. Second, studies that propose interventions and bias mitigation approaches often focus on individual components of the machine learning pipeline rather than on understanding bias propagation end-to-end, from web data to large language models like ChatGPT and then to the user-facing technologies built on top of them. We wanted to focus on a more realistic setting, with common examples of bias in language, and on a more holistic, end-to-end analysis of potential harms.

This is why we decided to focus on potential biases in political discussions: discussions of polarizing social and economic issues are abundant in pretraining data sourced from news, forums, books, and online encyclopedias, and this language inevitably perpetuates social stereotypes. As a realistic example of user-facing systems built on politically biased language models, we focused on high-stakes, socially oriented tasks such as hate speech and misinformation detection. We wanted to understand whether valid and valuable political discussions of polarizing issues (climate change, gun control, abortion, wage gaps, taxes, same-sex marriage, and more), which represent the diversity and plurality of opinions and cannot simply be filtered from the training data, can lead to unfair decisions in hate speech and misinformation detection.

Findings and Impact

Our experiments demonstrate, across several data domains (Reddit, news outlets), partisan news datasets, and language model architectures, that different pretrained language models have different underlying political leanings, capturing the political polarization in the data. While the overall performance (e.g., classification accuracy) of hate speech and misinformation detectors remains relatively consistent across these politically biased language models, the detectors exhibit significantly different behavior when detecting hate speech against different identity groups and social attributes, such as gender, race, ethnicity, religion, and sexual orientation, and make significantly biased decisions in misinformation detection with respect to the political leaning of the source media. These results have direct implications for how the fairness of downstream NLP models should be evaluated.
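A minimal sketch of the downstream fairness check described above: compute a hate speech classifier's accuracy separately for each identity group mentioned in the test examples, then report the largest gap between groups. The data, group labels, and predictions below are toy placeholders, not the paper's datasets or models.

```python
# Hedged sketch: disaggregate a hate speech classifier's accuracy by identity
# group to surface the kind of per-group disparities the study reports when
# comparing detectors built on differently-leaning language models.

from collections import defaultdict

def per_group_accuracy(examples, predictions):
    """examples: list of (text, identity_group, true_label);
    predictions: parallel list of predicted labels."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for (text, group, label), pred in zip(examples, predictions):
        total[group] += 1
        correct[group] += int(pred == label)
    return {g: correct[g] / total[g] for g in total}

def fairness_gap(group_acc):
    """Largest accuracy disparity between any two identity groups;
    0.0 means the classifier performs equally across groups on this metric."""
    return max(group_acc.values()) - min(group_acc.values())

# Toy test set: (text, identity group targeted, gold label 1 = hateful).
examples = [
    ("...", "gender", 1), ("...", "gender", 0),
    ("...", "religion", 1), ("...", "religion", 1),
    ("...", "ethnicity", 0), ("...", "ethnicity", 0),
]
predictions = [1, 0, 1, 0, 0, 0]  # toy classifier output

acc = per_group_accuracy(examples, predictions)
gap = fairness_gap(acc)
```

In this toy run the classifier is perfect on the gender and ethnicity subsets but misses half the religion-targeted examples, so the gap is 0.5; the paper's point is that two detectors with the same overall accuracy can still differ sharply on this per-group breakdown.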

Importantly, our work highlights that pernicious biases and unfairness in NLP tasks can be caused by non-toxic, non-malicious data. This creates a dilemma: political opinions cannot simply be filtered out of training data, since doing so would amount to censorship and exclusion from political participation, yet keeping them in inevitably leads to unfairness in NLP models. Ultimately, this means that no language model can be entirely free from social biases, and it underscores the need for new technical and policy approaches to dealing with model unfairness.

Between the lines

An important conclusion from our research is that developing ethical and equitable technologies requires looking at more realistic data and more realistic scenarios: there is no fairness without awareness. This, in turn, requires us to consider the full complexity of language (including people's intents and presuppositions), to ground our research in social science, policy research, and philosophy, and to develop more advanced technical approaches to machine learning model controllability and interpretability. In the paper, we discuss several concrete ideas for future research. Ultimately, we hope our findings will inspire more interdisciplinary research in computational ethics and social science.


© 2025 Montreal AI Ethics Institute. This work is licensed under a Creative Commons Attribution 4.0 International License.