Montreal AI Ethics Institute

Democratizing AI ethics literacy


SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models

May 12, 2025

🔬 Research Summary by ✍️ Timm Dill

Timm is a Student Research Assistant at the Chair for Data Science, University of Hamburg.

[Original Paper by Margaret Mitchell, Giuseppe Attanasio, Ioana Baldini, Miruna Clinciu, Jordan Clive, Pieter Delobelle, Manan Dey, Sil Hamilton, Timm Dill, Jad Doughman, Ritam Dutt, Avijit Ghosh, Jessica Zosa Forde, Carolin Holtermann, Lucie-Aimée Kaffee, Tanmay Laud, Anne Lauscher, Roberto L Lopez-Davila, Maraim Masoud, Nikita Nangia, Anaelia Ovalle, Giada Pistilli, Dragomir Radev, Beatrice Savoldi, Vipul Raheja, Jeremy Qin, Esther Ploeger, Arjun Subramonian, Kaustubh Dhole, Kaiser Sun, Amirbek Djanibekov, Jonibek Mansurov, Kayo Yin, Emilio Villa Cueva, Sagnik Mukherjee, Jerry Huang, Xudong Shen, Jay Gala, Hamdan Al-Ali, Tair Djanibekov, Nurdaulet Mukhituly, Shangrui Nie, Shanya Sharma, Karolina Stanczak, Eliza Szczechla, Tiago Timponi Torrent, Deepak Tunuguntla, Marcelo Viridiano, Oskar van der Wal, Adina Yakefu, Aurélie Névéol, Mike Zhang, Sydney Zink, Zeerak Talat]

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 11995–12041, April 29 – May 4, 2025.

Image header: Figure 1: Regions with recognized stereotypes in SHADES.


Overview

SHADES represents the first comprehensive multilingual dataset explicitly aimed at evaluating stereotype propagation within large language models (LLMs) across diverse linguistic and cultural contexts. Developed collaboratively by an international consortium of researchers, SHADES compiles over 300 culturally specific stereotypes, rigorously gathered and validated by native and fluent speakers across 16 languages and 37 geographical regions.
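
To make that composition concrete, the sketch below shows what a single SHADES-style entry might look like as a plain data structure. The field names and example values are illustrative assumptions for exposition, not the dataset's published schema.

```python
# Illustrative only: a hypothetical record shape for one SHADES-style entry.
# Field names and values are assumptions, not the dataset's actual schema.
from dataclasses import dataclass

@dataclass
class StereotypeEntry:
    language: str          # one of the 16 covered languages
    region: str            # one of the 37 geographical regions
    target_identity: str   # the group or attribute the stereotype targets
    text: str              # the stereotype sentence in the source language
    validated: bool        # confirmed by a native or fluent speaker

example = StereotypeEntry(
    language="pt",
    region="Brazil",
    target_identity="nationality",
    text="(stereotype sentence in Portuguese)",
    validated=True,
)
print(example)
```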

The primary objective of SHADES is to facilitate systematic analyses of generative language models, identifying how these models propagate stereotypes differently depending on their linguistic and cultural inputs. Historically, bias evaluations in AI have predominantly focused on English-language and Western-centric stereotypes, neglecting the complexities and subtleties present in different cultural contexts. They have also tended to be linguistically limited, built from simple sentence structures that lend themselves to automatic, template-based generation. SHADES addresses this critical shortfall by categorizing stereotypes based on linguistic form, the identities they target, and their cultural relevance. Comprehensive methodological details and the dataset itself are openly accessible, serving as essential resources for researchers, technologists, and policymakers seeking to benchmark and address the nuanced challenges of multilingual stereotype propagation.
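
As a rough illustration of the kind of systematic analysis such a dataset enables, the sketch below compares a causal language model's log-likelihood for a stereotype sentence against a contrastive rewrite. This is a common likelihood-based probe for stereotype propagation, not necessarily the paper's exact evaluation protocol; the model name and the sentence pair are placeholders.

```python
# Minimal sketch of a likelihood-based stereotype probe. Assumptions: a
# HuggingFace-style causal LM and a hypothetical stereotype/contrast pair.
# This is not the paper's exact metric.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; SHADES targets multilingual LLMs

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def total_log_likelihood(sentence: str) -> float:
    """Sum of token log-probabilities the model assigns to the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # With labels=input_ids, the returned loss is the mean negative
        # log-likelihood over the predicted tokens (sequence length - 1).
        out = model(**inputs, labels=inputs["input_ids"])
    num_predicted = inputs["input_ids"].shape[1] - 1
    return -out.loss.item() * num_predicted

# Hypothetical minimal pair (illustrative; not taken from SHADES).
stereotype = "People from region X are terrible drivers."
contrast = "People from region X are careful drivers."

gap = total_log_likelihood(stereotype) - total_log_likelihood(contrast)
print(f"Log-likelihood gap (stereotype - contrast): {gap:.2f}")
# A positive gap means the model assigns higher probability to the stereotypical
# phrasing; aggregating such gaps across languages and regions gives one signal
# of how a model propagates stereotypes in different linguistic contexts.
```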

Why It Matters 

The SHADES dataset substantially advances our understanding of biases within multilingual AI systems, directly impacting discussions around fairness, accountability, transparency, and safety within the broader context of AI ethics. As AI tools continue to be deployed globally, it is imperative to move beyond Western-centric perspectives and thoroughly investigate how stereotypes permeate multilingual AI environments.

The implications of this research extend far beyond academia. Policymakers can leverage insights from SHADES to craft more inclusive regulatory frameworks. Developers and technologists can use the dataset to inform more effective bias mitigation strategies, ultimately leading to fairer and safer AI systems. Furthermore, global communities gain critical transparency into the ways in which AI might unintentionally perpetuate harmful stereotypes. SHADES thus provides a foundational resource for ongoing global initiatives dedicated to mitigating bias within AI, which increasingly shapes public discourse, media representation, education, and societal interactions.

Between the Lines 

While the creation of SHADES marks significant progress, the dataset and associated research reveal important ethical and methodological complexities that require thoughtful consideration:

  1. Cultural Context and Ecological Validity: SHADES underscores significant inconsistencies in stereotype sensitivity across various stereotype categories and linguistic contexts. Notably, stereotypes related to nationality or regional identities are less regulated within existing AI frameworks compared to stereotypes around categories like gender. This inconsistency emphasizes the urgent need for culturally sensitive AI governance approaches that consider local nuances and the context-specific implications of stereotype propagation.
  2. Confirmation and Automation Bias Risks: The dataset highlights the potential for stereotype propagation to be amplified through confirmation bias (accepting information aligned with pre-existing beliefs) and automation bias (excessively trusting automated systems). These cognitive biases pose a risk of normalizing and embedding harmful stereotypes within society, thereby shaping public attitudes and possibly exacerbating existing societal prejudices and divisions.
  3. Equity in Data Representation and Resource Distribution: Multilingual datasets, such as SHADES, inherently risk reinforcing existing global inequities. Languages and cultures that are already underrepresented in data and resources may inadvertently be further marginalized. Addressing this issue demands a deliberate effort toward equitable representation, genuine collaboration with diverse communities, and avoidance of “data colonization,” thus ensuring the responsible and fair development of multilingual AI resources.

Moving forward

SHADES is one step in addressing technology’s tendency to reify views that disproportionately disadvantage marginalized communities. It highlights many further opportunities to improve technology for everyone, including culturally sensitive data collection and new methods for analysis. Ultimately, by bringing together people from diverse cultures and languages to participate in defining what AI technology should be, we can create technology that better serves the needs of users throughout the world.

