
Prediction Sensitivity: Continual Audit of Counterfactual Fairness in Deployed Classifiers

June 2, 2022

🔬 Research Summary by Ivoline C. Ngong, Krystal Maughan, and Joseph P. Near. Ivoline is a first-year doctoral student in Computer Science at the University of Vermont whose research interests lie at the intersection of data privacy, fairness, and machine learning. Krystal is a third-year doctoral student in Computer Science at the University of Vermont whose current work focuses on mathematical cryptography, specifically isogeny-based cryptography; she has also published work on fairness, privacy, and trustworthy AI. Joseph is an assistant professor of computer science at the University of Vermont. His research interests include security, privacy, machine learning, and programming languages.

[Original paper by Krystal Maughan, Ivoline C. Ngong, Joseph P. Near]


Overview: In a world where AI-based systems dominate many parts of our daily lives, auditing these systems for fairness is a growing concern. Group fairness metrics are commonly used for such audits, but they do not always uncover discrimination against individuals, and they are difficult to apply once the system has been deployed. Counterfactual fairness describes an individual notion of fairness, but it is particularly difficult to evaluate after deployment. In this paper, we present prediction sensitivity, an approach for auditing counterfactual fairness while the model is deployed and making predictions.


Introduction

The Challenge: Biased Predictions in Machine Learning. AI-based decision-making systems play an increasing role in a wide range of high-stakes decisions, prompting the question of whether they can lead to unfair outcomes. To address these legitimate concerns, several metrics have been proposed to audit these classifiers and to measure and mitigate bias before they are deployed. These approaches fall short in two ways. First, group fairness metrics like disparate impact and equalized odds are useful for uncovering discrimination in classifiers, but they fail to catch discrimination against individuals. Second, most existing techniques focus on auditing or mitigating bias before these classifiers are deployed. There has been comparatively less research on methods to ensure that trained classifiers continue to perform well after deployment, even though novel examples encountered after deployment are likely to reveal bias in trained classifiers.

Our Approach: Measuring Counterfactual Fairness using Prediction Sensitivity. To address these problems, we propose prediction sensitivity, an approach for continuous audit of counterfactual fairness in deployed classifiers. Counterfactual fairness requires that a fair classifier would have made the same prediction if the individual had belonged to a different demographic group, and is therefore a notion of individual fairness. Prediction sensitivity attempts to measure counterfactual fairness: it is high when the prediction would have been different had the individual belonged to a different demographic group.

Toward Continual Audit of Deployed Models. Prediction sensitivity is designed to be deployed alongside the trained classifier in order to raise an alarm before a potentially unfair prediction is made. This setting matters because deployed classifiers may encounter novel examples not considered by the classifier’s designers, which may trigger discriminatory behavior that is impossible to detect at training time.

Key Insights

Prediction Sensitivity

Prediction sensitivity is a measure of how changes in the input would change a model’s prediction, weighted by each feature’s influence on the individual’s protected status. The intuition behind this measure is that a model that is particularly sensitive to protected attributes, such as race or gender, is likely to violate counterfactual fairness.

Prediction sensitivity is the dot product of two gradients: the “protected status feature weights” and the “per-feature influence.”

The protected status feature weights measure how the features of the input x contribute to its protected status. The weights are determined by training the protected status model A to predict x’s protected status and computing the gradient of A(x) with respect to the input features. Consequently, if A captures the correlation between input features and protected status, then these weights represent the strength of each feature’s influence on the protected status.

The per-feature influence, on the other hand, captures the influence of each input feature on the classifier’s predictions. It describes how much a hypothetical change in each feature of x would affect the prediction of the trained classifier F.
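
As a concrete illustration, here is a minimal sketch of how prediction sensitivity could be computed for a single input, assuming both the trained classifier F and the protected status model A are differentiable PyTorch modules. The function name and the use of an absolute dot product are illustrative choices, not the paper’s reference implementation.

```python
import torch

def prediction_sensitivity(classifier_f, protected_model_a, x):
    """Sketch of prediction sensitivity for a single input x.

    classifier_f:      the trained classifier F (a differentiable torch.nn.Module)
    protected_model_a: the protected status model A, trained to predict x's
                       protected status from the same input features
    x:                 1-D tensor of features for one individual
    """
    x = x.clone().detach().requires_grad_(True)

    # Protected status feature weights: gradient of A(x) with respect to x,
    # i.e. how strongly each feature influences the predicted protected status.
    protected_weights = torch.autograd.grad(protected_model_a(x).sum(), x)[0]

    # Per-feature influence: gradient of F(x) with respect to x,
    # i.e. how much a small change in each feature would move F's prediction.
    per_feature_influence = torch.autograd.grad(classifier_f(x).sum(), x)[0]

    # Prediction sensitivity is the dot product of the two gradients
    # (taking the absolute value here is an illustrative choice).
    return torch.dot(protected_weights.flatten(),
                     per_feature_influence.flatten()).abs().item()
```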

Detecting Violations of Counterfactual Fairness

To detect discriminatory behavior, we can use a simple procedure for each prediction the classifier makes (a code sketch follows the list):

  1. Calculate prediction sensitivity for the prediction
  2. If prediction sensitivity is too high (i.e. above a threshold), sound the alarm!
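
As one illustration of this procedure, the check could be wrapped around a deployed model roughly as follows, reusing the prediction_sensitivity sketch above. The threshold value and the logging-based alarm are illustrative assumptions rather than details from the paper.

```python
import logging

import torch

ALARM_THRESHOLD = 0.1  # illustrative value; in practice it would be
                       # calibrated, e.g. on held-out or audit data

def audited_predict(classifier_f, protected_model_a, x):
    """Make a prediction and raise an alarm (here, a log warning) when
    prediction sensitivity exceeds the chosen threshold."""
    sensitivity = prediction_sensitivity(classifier_f, protected_model_a, x)
    if sensitivity > ALARM_THRESHOLD:
        logging.warning(
            "Possible counterfactual fairness violation: "
            "prediction sensitivity %.4f exceeds threshold %.4f",
            sensitivity, ALARM_THRESHOLD,
        )
    with torch.no_grad():
        prediction = classifier_f(x)
    return prediction, sensitivity
```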

To show that prediction sensitivity is a useful measure of fairness, we need to answer the question: Does our procedure accurately detect discrimination in deployed classifiers? 

Unfortunately, this is an extremely tricky question to answer conclusively, for two reasons. First, it is notoriously difficult even to define fairness – numerous conflicting definitions exist, and the research community has not reached consensus about which one(s) to use. Second, even if we settle on a definition, evaluating the accuracy of prediction sensitivity requires ground truth information about both a real situation and its counterfactual (i.e. what would have been different) – which is essentially impossible in most realistic settings.

We evaluated prediction sensitivity experimentally by constructing synthetic data, which gives us control over the causal relationships in the data. In this setting, we find that prediction sensitivity does a good job detecting (constructed) violations of counterfactual fairness. We also conducted related experiments using real data with counterfactual augmentation – an attempt to construct the counterfactual situation for each data point – and obtained promising results.

Between the lines

We’re excited about our results in this paper; we think prediction sensitivity may be an important new tool to add to our toolbox for evaluating fairness in machine learning. In particular, prediction sensitivity makes it possible to monitor classifiers after deployment, for every prediction they make.

There are, however, some limitations to the proposed method. Prediction sensitivity can only be calculated with white-box access to the classifier, so a third party cannot audit a deployed system without that access. Instead, we need to rely on the organizations that develop models to deploy prediction sensitivity themselves, which they may not be incentivized to do.

Our experimental results suggest that prediction sensitivity may be effective at detecting violations of counterfactual fairness, but we had to make major assumptions about the world in order to conduct those experiments. Are these assumptions reasonable? Answering this question is really difficult (or maybe impossible!), and it’s a major complication in recent fairness research more generally.

