
A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning

December 6, 2022

šŸ”¬ Research Summary by Siobhan Mackenzie Hall, PhD student in the Oxford Neural Interfacing group at the University of Oxford. Siobhan is also a member of the Oxford Artificial Intelligence Society, along with the authors who contributed to the development of the work described below.

[Original paper by Hugo Berg, Siobhan Mackenzie Hall, Yash Bhalgat, Wonsuk Yang, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain]


Overview: Large-scale vision-language models are becoming more pervasive in society. This is concerning given the evidence of societal biases manifesting in these models and the potential for these biases to become ingrained in society’s perceptions for good, hindering the natural process by which norms are challenged and overturned. In this study, we developed cheap yet effective computational methods for debiasing these out-of-the-box models while maintaining performance. This shows promise for empowering individuals to combat these injustices.


Introduction

Our new study, conducted under the University of Oxford’s Artificial Intelligence Society, has revealed societal biases in AI models designed to match text captions to images, like those used in search engines. These ā€˜vision-language’ AI models, like OpenAI’s CLIP, are trained on hundreds of millions of captioned images from the internet. Training models at this scale can take several months, cost upwards of $1M, and incur a hefty carbon footprint. The process of uploading images to the internet and choosing words to caption them does not happen in a societal vacuum. If these large-scale, internet-scraped datasets embed societal biases or stereotypical patterns, then negative portrayals can be ā€˜frozen in time’ in the data. AI models built from that data can then reflect, entrench, or amplify unjust representations of marginalized groups in society. This has been clearly demonstrated, for example, in OpenAI’s own assessment of CLIP’s ingrained biases. This entrenchment is likely to become more evident and pervasive with the release of generative models such as Stable Diffusion, which leans heavily on CLIP for dataset filtering and certain model components.

This study confirms that biases exist in large multimodal AI models trained on web data, especially along the axes of gender and race, and presents a cheap, efficient method that developers can employ at home to debias out-of-the-box pretrained vision-language models.
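
To make the retrieval setting concrete, the sketch below shows how a CLIP-style model scores a set of images against a text query using cosine similarity between embeddings. It is purely illustrative and not part of the study’s pipeline; it assumes OpenAI’s open-source `clip` package, PyTorch, and a handful of placeholder image files.

```python
# Illustrative CLIP-style text-to-image retrieval (not the study's code).
# Assumes the `clip` package (https://github.com/openai/CLIP), PyTorch, and PIL.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical image paths standing in for a face-image retrieval set.
image_paths = ["face_001.jpg", "face_002.jpg", "face_003.jpg"]
images = torch.stack([preprocess(Image.open(p)) for p in image_paths]).to(device)
text = clip.tokenize(["a photo of a doctor"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(images)   # (N, d)
    text_features = model.encode_text(text)       # (1, d)

# Cosine similarity ranks the images for the query; the top-K results are
# what a search engine would surface, and where demographic skew shows up.
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
similarity = (image_features @ text_features.T).squeeze(1)
ranking = similarity.argsort(descending=True)
print([image_paths[i] for i in ranking])
```

The top-ranked images for a query like ā€œa photo of a doctorā€ are exactly the search-style outputs in which the study looks for demographic skew.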

Key Insights

[This study has been peer-reviewed and will be published at The 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP)]

Findings and the implications thereof 

As discussed above, these societal biases are becoming entrenched, limiting the potential for societal norms to be challenged and changed. These lasting implications should be cause for alarm, and they become even more concerning when considering seemingly benign applications such as search engine queries. They are best explained in terms of representational and allocational harms. Representational harms refer to the over-representation of, for example, one gender when querying for a profession (e.g., ā€œnurseā€ versus ā€œdoctorā€) or one ethnicity in explicit and NSFW content (Birhane et al., 2021). These shape our perceptions of, and personal biases about, people in different contexts, which can lead to unfair treatment, manifesting as allocational harm. Allocational harms arise when an individual’s or group’s access to resources and opportunities is differentially impacted (Weidinger et al., 2021); for instance, if the ordering of images in search results shifts recruiters’ perceptions about the real-world suitability of different people for different jobs.
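
As a rough illustration of how such representational skew in retrieval can be quantified, the sketch below computes a simple MaxSkew-style statistic over the top-K results: the largest (log-)deviation of any group’s observed share from a desired share. The labels, desired proportions, and exact metric definition are illustrative assumptions, not necessarily the evaluation protocol used in the paper.

```python
# Illustrative skew-style bias metric over top-K retrieval results.
# Group labels and desired proportions are hypothetical placeholders.
import math
from collections import Counter

def max_skew_at_k(retrieved_labels, desired_proportions, k):
    """Largest absolute log-ratio between the observed and desired share
    of any demographic group among the top-k retrieved items."""
    top_k = retrieved_labels[:k]
    counts = Counter(top_k)
    skews = []
    for group, p_desired in desired_proportions.items():
        p_observed = counts.get(group, 0) / k
        # Small epsilon avoids log(0) when a group is entirely absent.
        skews.append(abs(math.log((p_observed + 1e-6) / (p_desired + 1e-6))))
    return max(skews)

# Example: gender labels of the top-10 images returned for "a photo of a doctor".
labels = ["male"] * 8 + ["female"] * 2
print(max_skew_at_k(labels, {"male": 0.5, "female": 0.5}, k=10))  # ~0.92
```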

How are these biases addressed without impacting the V+L model’s performance?

To tackle these harmful biases and preemptively counteract their harms, we propose a cheap yet effective method of debiasing off-the-shelf pretrained models (such as CLIP). In other words, we manipulate how the model links the faces of different people with captions to ensure a more equal variation in the returned images. This is achieved through a two-pronged approach of adversarial debiasing with joint training: a second model is introduced into the process whose goal is to prevent the first model from using sensitive demographic features when matching images to captions. By simultaneously training the model for demographic fairness and for its original task of image-caption matching (joint training), our proposed solution reduces bias and increases diversity in model outputs while maintaining performance. The contribution of this method lies in its ability to debias trained models post hoc. This is crucial, as few machine learning practitioners have access to the data and resources needed to recreate such models from scratch. Our methods ensure you don’t need a million dollars to make a difference.
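
The sketch below is a minimal, hypothetical version of this setup in PyTorch. Random tensors stand in for frozen CLIP embeddings, a learnable offset on the text embedding stands in (in heavily simplified form) for the paper’s learnable prompt array, and a small classifier plays the adversary that tries to recover a sensitive attribute from image-text similarity scores. Shapes, loss weights, and training details are illustrative assumptions; the authors’ actual implementation is available in their GitHub repository.

```python
# Minimal, hypothetical sketch of adversarial debiasing with joint training.
# Random tensors stand in for frozen CLIP image/text embeddings; only the
# learnable prompt offset and the adversary are trained. Shapes, loss
# weights, and module names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
d, batch = 512, 32                     # CLIP ViT-B/32 uses 512-dim embeddings

# Stand-ins for frozen encoder outputs: paired image/text embeddings plus a
# binary sensitive attribute (e.g. perceived gender) for each image.
image_emb = F.normalize(torch.randn(batch, d), dim=-1)
text_emb = F.normalize(torch.randn(batch, d), dim=-1)
attribute = torch.randint(0, 2, (batch,))

# A learnable offset on the text embedding (a simplification of the paper's
# prompt array) and an adversary that tries to recover the sensitive
# attribute from each image's row of similarity scores.
prompt = nn.Parameter(torch.zeros(d))
adversary = nn.Sequential(nn.Linear(batch, 64), nn.ReLU(), nn.Linear(64, 2))

opt_debias = torch.optim.Adam([prompt], lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)
lam = 1.0                              # weight on the adversarial term

for step in range(200):
    debiased_text = F.normalize(text_emb + prompt, dim=-1)
    sims = image_emb @ debiased_text.T            # (batch, batch) similarities

    # 1) Adversary step: learn to predict the attribute from similarity scores.
    opt_adv.zero_grad()
    adv_loss = F.cross_entropy(adversary(sims.detach()), attribute)
    adv_loss.backward()
    opt_adv.step()

    # 2) Joint debiasing step: keep the original image-text matching
    #    (contrastive) objective while making the adversary fail.
    opt_debias.zero_grad()
    targets = torch.arange(batch)
    contrastive = (F.cross_entropy(sims, targets)
                   + F.cross_entropy(sims.T, targets)) / 2
    fool_adversary = -F.cross_entropy(adversary(sims), attribute)
    (contrastive + lam * fool_adversary).backward()
    opt_debias.step()
```

The key design choice is the joint objective in step 2: the debiasing parameters are pushed both to keep the contrastive image-caption matching loss low and to make the adversary’s predictions uninformative, which is what lets the method reduce bias without sacrificing downstream performance.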

Key takeaways

  1. Large-scale models trained on web data risk not only perpetuating societal biases but also entrenching them through data that becomes ā€˜frozen in time’. These issues are pervasive because such models are reused for tasks like dataset filtering, further entrenching biases on a larger, global scale.
  2. Because retraining large-scale V+L models is prohibitively expensive, we propose a debiasing method that also maintains model accuracy, so the models can perform downstream tasks to the same level.
  3. These methods use adversarial debiasing (i.e., adding a second model that tries to infer demographic features from the first model’s outputs and training the two models ā€œagainstā€ each other). All code is available on GitHub (https://github.com/oxai/debias-vision-lang).

A common technical barrier to debiasing methods is that they come at the expense of accuracy and performance on downstream tasks. The authors evaluated this trade-off and found that, with joint training (adding an image classification loss to the adversarial training), models can be both debiased and high-performing on downstream image-captioning tasks.
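
One common way to check such a trade-off is to re-run a downstream evaluation before and after debiasing. The sketch below uses zero-shot classification with the `clip` package purely as an illustration of that check; the class names, image paths, and labels are hypothetical placeholders, and the paper’s own downstream evaluations may differ.

```python
# Illustrative downstream check: zero-shot classification accuracy with
# frozen CLIP encoders. Paths, class names, and labels are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

class_names = ["cat", "dog"]
text = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
samples = [("cat_01.jpg", 0), ("dog_01.jpg", 1)]   # (path, label) placeholders

correct = 0
with torch.no_grad():
    text_feat = model.encode_text(text)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    for path, label in samples:
        img = preprocess(Image.open(path)).unsqueeze(0).to(device)
        img_feat = model.encode_image(img)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        pred = (img_feat @ text_feat.T).argmax(dim=-1).item()
        correct += int(pred == label)

print(f"zero-shot accuracy: {correct / len(samples):.2f}")
# Re-running the same evaluation with the debiased text embeddings (or the
# learned prompts) shows whether fairness gains came at the cost of accuracy.
```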

Between the lines

This study is important because it empowers individuals to tackle these biases rather than accept the out-of-the-box biases baked into large-scale pre-trained models. However, more must be done to improve the pipelines in which these models are trained, and data curation standards need to be called into question as well. To summarise, lead author Hugo Berg argues for a needed shift in priorities: ā€œwe, as individuals, researchers, and society, need to reconsider how we evaluate AI models moving forward because both accuracy and fairness are desirable qualities.ā€ While the AI research community has directed growing attention toward issues of bias, fairness, and transparency in recent years, more work remains in determining what non-harmful worldviews AI models and datasets should encode, and in finding practical methods to realize that vision.

