Measuring Fairness of Text Classifiers via Prediction Sensitivity

June 17, 2022

🔬 Research Summary by Satyapriya Krishna, a PhD student at Harvard University working on problems related to Trustworthy Machine Learning.

[Original paper by Satyapriya Krishna, Rahul Gupta, Apurv Verma, Jwala Dhamala, Yada Pruksachatkun, Kai-Wei Chang]


Overview: With the rapid growth in language processing applications, fairness has emerged as an important consideration in data-driven solutions. Although various fairness definitions have been explored in the recent literature, there is a lack of consensus on which metrics most accurately reflect the fairness of a system. This paper introduces a new formulation – Accumulated Prediction Sensitivity, which measures fairness in machine learning models based on the model’s prediction sensitivity to perturbations in input features. The metric attempts to quantify the extent to which a single prediction depends on a protected attribute, where the protected attribute encodes the membership status of an individual in a protected group. It is observed that the proposed fairness metric based on prediction sensitivity is significantly more correlated with human annotation than the existing counterfactual fairness metric.


Introduction

Ongoing research increasingly emphasizes the development of methods that detect and mitigate unfair social bias in machine learning-based language processing models. These methods fall under the umbrella of algorithmic fairness, which has been quantitatively expressed through numerous definitions. These definitions are broadly categorized into two types: individual fairness and group fairness. Individual fairness evaluates whether a model gives similar predictions for individuals with similar personal attributes (e.g., age or race). Group fairness, on the other hand, evaluates fairness across cohorts sharing the same protected attributes rather than across individuals. Although these two broad categories define valid notions of fairness, human understanding of fairness is also used to measure fairness in machine learning models. Existing studies often consider only one or two of these verticals, providing an incomplete picture of fairness in model predictions. To mitigate this problem, the authors propose a formulation based on the model's sensitivity to input features, termed the accumulated prediction sensitivity, to measure the fairness of model predictions, and establish its theoretical relationship with statistical parity (group fairness) and individual fairness metrics. They also empirically demonstrate the correlation between the proposed metric and human perception of fairness, thereby providing a more comprehensive fairness metric.

Key Insights

Accumulated Prediction Sensitivity

Accumulated Prediction Sensitivity is a metric that captures the sensitivity of a model to protected attributes such as gender or race in language processing tasks. In a nutshell, it combines three major components: (1) the prediction sensitivity of the model with respect to the input, (2) the aggregation of that sensitivity over the protected attributes, and (3) the aggregation of that sensitivity over the prediction classes. This combination is designed to amplify the contribution that protected features such as race and gender make to the model's decision-making process. Based on this notion, the accumulated prediction sensitivity score is expected to be smaller for fair models.
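As a rough illustration of these three components, here is a minimal sketch of a gradient-based sensitivity score for a differentiable classifier. The function name, the choice of PyTorch, and the weight vectors `w` (emphasising protected features) and `v` (aggregating over classes) are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def accumulated_prediction_sensitivity(model, x, w, v):
    """Illustrative gradient-based sensitivity score (not the paper's exact code).

    model : differentiable classifier mapping a feature vector to class probabilities
    x     : 1-D input feature tensor
    w     : per-feature weights emphasising the protected attributes (same length as x)
    v     : per-class weights used to aggregate sensitivity over the prediction classes
    """
    # (1) Prediction sensitivity: Jacobian of the class probabilities with respect
    #     to the input features, shape (num_classes, num_features).
    jac = torch.autograd.functional.jacobian(model, x)
    # (2) Aggregate the feature-wise sensitivity over the protected attributes via w,
    # (3) then over the prediction classes via v.
    return torch.abs(v @ jac @ w)

# Toy usage: a 4-feature, 2-class classifier where the last feature plays the
# role of the protected attribute.
model = torch.nn.Sequential(torch.nn.Linear(4, 2), torch.nn.Softmax(dim=-1))
x = torch.randn(4)
w = torch.tensor([0.0, 0.0, 0.0, 1.0])  # weight only the protected feature
v = torch.full((2,), 0.5)               # uniform weight over the two classes
score = accumulated_prediction_sensitivity(model, x, w, v)
```

In this toy setup, a classifier whose output does not react to the protected (last) feature produces a score near zero, matching the intuition that the metric should be smaller for fair models.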

Relation with Group Fairness

In a nutshell, group fairness requires the model outcome to be independent of the protected features, a property also known as statistical parity. The proposed metric, Accumulated Prediction Sensitivity, aligns with this definition of group fairness: the authors prove that its expected value is zero when perfect statistical parity holds. They also show empirically that if a modeler unintentionally uses a feature correlated with a protected attribute, for instance "hair length" as a proxy for "gender", while attempting to build a classifier that satisfies statistical parity, the proposed metric can still be used to evaluate the resulting classifier.
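For reference, statistical parity for a binary protected attribute can be written in standard notation (not quoted from the paper) as:

```latex
% Statistical parity: the prediction \hat{y} is independent of the
% protected attribute a (here binary).
P(\hat{y} = y \mid a = 0) = P(\hat{y} = y \mid a = 1) \qquad \text{for every class } y
```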

Relation with Individual Fairness

The notion of individual fairness is stated by Dwork et al. (2012) as: "We interpret the goal of mapping similar people similarly to mean that the distributions assigned to similar people are similar." This constraint is applied during model training by requiring the model outcome to satisfy a Lipschitz property with respect to some metric that measures the distance between two samples in the population. Accumulated Prediction Sensitivity respects this fairness definition: the paper shows that the metric is bounded by the Lipschitz constant appearing in the individual fairness constraint. This is further strengthened by the observation that increasing the Lipschitz constant relaxes the fairness constraint and results in a higher magnitude of the proposed metric, a relationship that is also observed in the experimental results.
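The Lipschitz property referred to above can be written in the standard form of Dwork et al.'s individual fairness constraint (standard notation rather than the paper's exact symbols):

```latex
% Individual fairness as a Lipschitz condition: predictions for similar
% individuals must be similar. D is a distance between output
% distributions, d a task-specific similarity metric on individuals,
% and L the Lipschitz constant discussed above.
D\big(f(x), f(x')\big) \le L \, d(x, x') \qquad \text{for all individuals } x, x'
```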

Relation with Human Perception of Fairness

The metric is further tested with a user survey to validate its alignment with human perception of fairness. As part of the study, a group of annotators was asked to evaluate model predictions and assess whether they believed the outputs were biased. For instance, given social and cultural norms, a profession classifier assigning the sentence "she worked in a hospital" to "nurse" instead of "doctor" can be perceived as biased. The results of this study were then used to compute correlations with Accumulated Prediction Sensitivity across multiple text classification datasets. The results suggest that the proposed metric is significantly more correlated with human judgments of bias than the existing metric based on counterfactual examples.

Between the lines

Evaluating fairness is a challenging task: it requires selecting a specific notion of fairness (e.g., group or individual fairness) and then identifying metrics that can capture that notion when evaluating a predictor. Additionally, certain notions of fairness may not be well defined and can change with social norms (e.g., "volleyball" being stereotypically associated with women) that may seep into the dataset at hand. The authors define a metric that aligns with all three common verticals of fairness metrics: group fairness, individual fairness, and human perception of fairness.

