Montreal AI Ethics Institute

A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning

December 6, 2022

🔬 Research Summary by Siobhan Mackenzie Hall, PhD student in the Oxford Neural Interfacing group at the University of Oxford. Siobhan is also a member of the Oxford Artificial Intelligence Society, along with the other authors who contributed to the development of the work described below.

[Original paper by Hugo Berg, Siobhan Mackenzie Hall, Yash Bhalgat, Wonsuk Yang, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain]


Overview: Large-scale vision-language models are becoming more pervasive in society. This is concerning given the evidence of societal biases manifesting in these models and the potential for those biases to become permanently ingrained in society’s perceptions, hindering the natural process of norms being challenged and overturned. In this study, we developed cheap yet efficient computational methods for debiasing these out-of-the-box models while maintaining their performance. This shows promise for empowering individuals to combat these injustices.


Introduction

Our new study, conducted under the University of Oxford’s Artificial Intelligence Society, has revealed societal biases in AI models designed to match text captions to images, like those used in search engines. These ‘vision-language’ AI models, like OpenAI’s CLIP, are trained on hundreds of millions of captioned images from the internet. Training models at this scale can take several months, cost upwards of $1M, and incur a hefty carbon footprint. The process of uploading images to the internet and choosing words to caption them does not happen in a societal vacuum. If these large-scale, internet-scraped datasets embed societal biases or stereotypical patterns, then negative portrayals can be ‘frozen in time’ in the data. AI models built from the data can then reflect, entrench, or amplify unjust representations of marginalized groups in society. This has been clearly demonstrated, for example, in OpenAI’s own assessment of CLIP’s ingrained biases. This entrenchment is likely to become more evident and pervasive with the release of generative models such as Stable Diffusion, which leans heavily on CLIP for dataset filtering and for certain model components.

This study confirms that biases exist in large multimodal AI models trained on web data, especially along the axes of gender and race, and presents a cheap and efficient method that developers can employ at home to debias out-of-the-box pretrained vision-language models.

Key Insights

[This study has been peer-reviewed and will be published at The 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP)]

Findings and the implications thereof 

As discussed above, these societal biases are becoming entrenched, limiting the potential for societal norms to be challenged and changed. These lasting implications should be cause for alarm, and they become even more concerning when considering seemingly benign applications such as search engine queries. They are best explained in the context of representational and allocational harms. Representational harms refer to the over-representation of, for example, one gender when querying for a profession (e.g., “nurse” versus “doctor”) or one ethnicity in explicit and NSFW content (Birhane et al., 2021). These shape our perceptions of, and personal biases about, people in different contexts, which can lead to unfair treatment, manifesting as allocational harm. Allocational harms arise when an individual’s or group’s access to resources and opportunities is differentially impacted (Weidinger et al., 2021); for instance, if the ordering of images in search results shifts recruiters’ perceptions about the real-world suitability of different people for different jobs.

How are these biases addressed without impacting the vision-language (V+L) model’s performance?

To tackle these harmful biases and preemptively counteract their harms, we propose a cheap yet effective method of debiasing off-the-shelf pretrained models (such as CLIP). In other words, we manipulate how the model links the faces of different people with captions to ensure a more equal variation of returned images. This is achieved through a two-pronged approach of adversarial debiasing with joint training: a second model is introduced into the process, whose goal is to prevent the first model from using sensitive demographic features when choosing relevant captions. By simultaneously training the model for demographic fairness and for its original task of image-caption matching (joint training), our proposed solution is better at reducing bias and increasing diversity in model outputs while maintaining performance. The contribution of this method lies in its ability to debias trained models post hoc. This is crucial, as few machine learning practitioners have access to the data and resources needed to recreate such models from scratch. Our methods ensure you don’t need a million dollars to make a difference.
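To make the adversarial setup concrete, here is a minimal PyTorch sketch of the general idea, not the authors’ implementation (their code is in the GitHub repository linked below). The stand-in encoders, the `debias_proj` projection, the `adversary` network, and the toy data are all illustrative assumptions: a small adversary tries to predict a sensitive attribute from image-text similarity scores, and the debiasing parameters are trained to fool it.

```python
# Minimal sketch of adversarial debiasing (illustrative only; not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
EMB_DIM, N_ATTR = 512, 2  # embedding size; number of sensitive-attribute classes

# Stand-ins for frozen CLIP towers: real code would use CLIP's image and text
# encoders with their weights kept frozen.
image_encoder = nn.Linear(1024, EMB_DIM)
text_encoder = nn.Linear(300, EMB_DIM)
for p in list(image_encoder.parameters()) + list(text_encoder.parameters()):
    p.requires_grad_(False)

# Hypothetical learnable debiasing component: a simple projection on text
# embeddings (the paper instead learns prompt tokens, not reproduced here).
debias_proj = nn.Linear(EMB_DIM, EMB_DIM)

# Adversary: predicts the sensitive attribute from a similarity score.
adversary = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, N_ATTR))

opt_debias = torch.optim.Adam(debias_proj.parameters(), lr=1e-4)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-4)

def cosine_sim(img_emb, txt_emb):
    """Cosine similarity for each matched image-caption pair, shape (batch, 1)."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    return (img * txt).sum(dim=-1, keepdim=True)

for step in range(100):                      # toy loop on random placeholder data
    images = torch.randn(64, 1024)           # placeholder image features
    captions = torch.randn(64, 300)          # placeholder caption features
    attr = torch.randint(0, N_ATTR, (64,))   # sensitive-attribute labels

    img_emb = image_encoder(images)
    txt_emb = debias_proj(text_encoder(captions))
    sim = cosine_sim(img_emb, txt_emb)

    # (1) Train the adversary to recover the attribute from similarities.
    adv_loss = F.cross_entropy(adversary(sim.detach()), attr)
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()

    # (2) Train the debiasing parameters to fool the adversary, i.e. to strip
    #     attribute information from the similarity scores.
    fool_loss = -F.cross_entropy(adversary(sim), attr)
    opt_debias.zero_grad()
    fool_loss.backward()
    opt_debias.step()
```

Alternating the two updates is one common way to realise the adversarial game; on its own, though, step (2) says nothing about preserving image-caption matching accuracy, which is where joint training (discussed below) comes in.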

Key takeaways

  1. Large-scale models trained on web data risk not only perpetuating societal biases but also entrenching them through data that can become ‘frozen in time’. These issues are pervasive because such models are reused across pipelines, for example to filter the datasets used to train other models, further entrenching biases on a larger, global scale.
  2. To avoid the prohibitive expense of retraining large-scale V+L models, we propose a debiasing method that also maintains model accuracy, so that the models can perform downstream tasks to the same level.
  3. These methods use adversarial debiasing (i.e., adding a second model that tries to infer demographic features from the first model’s outputs and training the two models “against” each other). All code is available on GitHub (https://github.com/oxai/debias-vision-lang).

A common technical barrier to debiasing methods is that they come at the expense of accuracy and performance on downstream tasks. The authors evaluated this trade-off and found that, using joint training (adding an image classification loss to the adversarial training), models can be debiased while remaining high-performing on downstream image-captioning tasks.
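As a rough illustration of such a joint objective (again a sketch under assumptions, not the paper’s exact loss), the debiasing parameters can be optimised against a combined loss: a task term plus a term that rewards fooling the adversary, weighted by a hypothetical coefficient `lam`. Here the task term is a CLIP-style contrastive image-caption loss; the paper reports adding an image classification loss, so the specific task loss shown is an assumption for illustration.

```python
# Sketch of a joint training objective (illustrative; not the paper's exact loss).
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """CLIP-style symmetric cross-entropy over in-batch image-caption pairs."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature
    targets = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def joint_debias_loss(img_emb, txt_emb, sim, attr, adversary, lam=1.0):
    """Task loss plus an adversary-fooling term; `lam` is a hypothetical weight."""
    task = clip_contrastive_loss(img_emb, txt_emb)
    fool = -F.cross_entropy(adversary(sim), attr)
    return task + lam * fool
```

In the toy loop sketched earlier, `fool_loss` would be replaced by this joint loss when updating the debiasing parameters, so that fairness and image-caption matching accuracy are optimised together rather than traded off blindly.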

Between the lines

This study is important because it empowers individuals to tackle these biases rather than accept the biases baked into large-scale pre-trained models out of the box. However, more must be done to improve the pipelines in which these models are trained, and data curation standards need to be called into question as well. To summarise, lead author Hugo Berg argues for a needed shift in priorities: “we, as individuals, researchers, and society, need to reconsider how we evaluate AI models moving forward because both accuracy and fairness are desirable qualities.” While the AI research community has directed growing attention toward issues of bias, fairness, and transparency in recent years, more work remains to be done in finding methods for AI models and datasets to encode non-harmful worldviews, in determining what those values should look like, and in developing practical methods to realize that vision.

