
Measuring Fairness of Text Classifiers via Prediction Sensitivity

June 17, 2022

🔬 Research Summary by Satyapriya Krishna, a PhD student at Harvard University working on problems related to Trustworthy Machine Learning.

[Original paper by Satyapriya Krishna, Rahul Gupta, Apurv Verma, Jwala Dhamala, Yada Pruksachatkun, Kai-Wei Chang]


Overview: With the rapid growth of language processing applications, fairness has emerged as an important consideration in data-driven solutions. Although various fairness definitions have been explored in recent literature, there is a lack of consensus on which metrics most accurately reflect the fairness of a system. This paper introduces a new formulation, Accumulated Prediction Sensitivity, which measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features. The metric attempts to quantify the extent to which a single prediction depends on a protected attribute, where the protected attribute encodes the membership status of an individual in a protected group. The authors observe that the proposed fairness metric based on prediction sensitivity is significantly more correlated with human annotation than the existing counterfactual fairness metric.


Introduction

Ongoing research increasingly emphasizes the development of methods that detect and mitigate unfair social bias in machine learning-based language processing models. These methods fall under the umbrella of algorithmic fairness, which has been expressed quantitatively through numerous definitions. These definitions are broadly categorized into two types: individual fairness and group fairness. Individual fairness evaluates whether a model gives similar predictions for individuals with similar personal attributes (e.g., age or race). Group fairness, on the other hand, evaluates fairness across cohorts defined by protected attributes rather than across individuals. Although these two broad categories define valid notions of fairness, human understanding of fairness is also used to measure fairness in machine learning models. Existing studies often consider only one or two of these verticals, providing an incomplete picture of fairness in model predictions. To mitigate this problem, the authors propose a formulation based on the model's sensitivity to input features, the accumulated prediction sensitivity, to measure the fairness of model predictions, and they establish its theoretical relationship with statistical parity (group fairness) and individual fairness metrics. They also empirically demonstrate the correlation between the proposed metric and human perception of fairness, providing a more comprehensive fairness metric.

Key Insights

Accumulated Prediction Sensitivity

Accumulated Prediction Sensitivity is a metric that captures the sensitivity of a model to protected attributes such as gender or race in language processing tasks. In a nutshell, the metric combines three major components: (1) the prediction sensitivity of the model with respect to the input, (2) aggregation of that sensitivity over the protected attributes, and (3) aggregation of the sensitivity over the prediction classes. This combination is designed so that the metric highlights the contribution of protected features such as race or gender to the model's decision-making process. Based on this notion, the accumulated prediction sensitivity score is expected to be smaller for fairer models.
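The summary stays at a high level, but as a rough illustration, a gradient-based version of these three components could look like the PyTorch sketch below. The function name, the use of raw gradients as the sensitivity term, and the uniform class weighting are assumptions for illustration rather than the paper's exact definition.

```python
import torch

def accumulated_prediction_sensitivity(model, x, protected_mask, class_weights=None):
    """Illustrative sketch of a prediction-sensitivity style score.

    model          -- differentiable classifier mapping a 1D feature vector to class logits
    x              -- a single input feature vector (1D tensor)
    protected_mask -- 1D tensor weighting protected input features
                      (e.g. 1.0 for encoded gender/race features, 0.0 elsewhere)
    class_weights  -- optional 1D tensor weighting output classes (uniform by default)
    """
    x = x.clone().detach().requires_grad_(True)
    probs = torch.softmax(model(x), dim=-1)               # class probabilities
    n_classes = probs.shape[-1]
    if class_weights is None:
        class_weights = torch.ones(n_classes) / n_classes

    score = torch.zeros(())
    for c in range(n_classes):                            # (3) aggregate over classes
        # (1) sensitivity of this class probability with respect to the input
        grad_c, = torch.autograd.grad(probs[c], x, retain_graph=True)
        # (2) aggregate the sensitivity over the protected features
        score = score + class_weights[c] * (protected_mask * grad_c).abs().sum()
    return score.item()
```

Averaging this score over a held-out set would give one number per model; under the paper's reasoning, fairer models should receive smaller scores.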

Relation with Group Fairness

In a nutshell, group fairness requires that the model outcome be independent of the protected features, a condition also known as statistical parity. The proposed metric, Accumulated Prediction Sensitivity, aligns with this definition of group fairness: the authors prove that its expected value is zero under perfect statistical parity. This is also demonstrated empirically, where the authors show that if a modeler unintentionally uses a feature correlated with a protected attribute, for instance "hair length" as a proxy for "gender", while attempting to build a classifier with statistical parity, the proposed metric can still be used to evaluate the resulting model.
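For reference, the statistical parity condition invoked here can be written as a simple gap between positive-prediction rates. The sketch below assumes binary predictions and a binary protected attribute; the function name and encoding are illustrative.

```python
import numpy as np

def statistical_parity_gap(y_pred, protected):
    """Absolute difference in positive-prediction rates between the group with
    protected == 1 and the group with protected == 0. A gap of zero corresponds
    to perfect statistical parity, the case in which the expected value of the
    accumulated prediction sensitivity is shown to be zero."""
    y_pred = np.asarray(y_pred)
    protected = np.asarray(protected)
    return abs(y_pred[protected == 1].mean() - y_pred[protected == 0].mean())
```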

Relation with Individual Fairness

The notion of individual fairness is stated by Dwork et al. (2012) as: "We interpret the goal of mapping similar people similarly to mean that the distributions assigned to similar people are similar". This constraint is imposed during model training by requiring that the model's output satisfy a Lipschitz property with respect to a distance metric between samples in the population. Accumulated Prediction Sensitivity respects this fairness definition: the paper proves that the metric is bounded by the Lipschitz constant that appears in the individual fairness constraint. Consistently, increasing the Lipschitz constant relaxes the fairness constraint and allows a larger magnitude of the proposed metric, which is also observed in the experimental results.
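To make the Lipschitz condition concrete, one can estimate an empirical Lipschitz ratio over pairs of samples, as in the sketch below. The Euclidean input distance and L1 distance between output distributions are assumptions made for illustration; the paper states the constraint with respect to a general distance metric.

```python
import numpy as np

def empirical_lipschitz_estimate(predict_proba, X):
    """Largest observed ratio of output-distribution distance to input distance
    over all pairs of samples in X. A smaller value means more similar inputs
    receive more similar output distributions (the individual fairness intuition)."""
    probs = [np.asarray(predict_proba(x)) for x in X]
    best = 0.0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            d_in = np.linalg.norm(np.asarray(X[i]) - np.asarray(X[j]))  # input distance
            if d_in == 0:
                continue
            d_out = np.abs(probs[i] - probs[j]).sum()                   # output distance
            best = max(best, d_out / d_in)
    return best
```

A model trained under a tighter Lipschitz constraint should show a smaller estimate here and, per the paper's bound, a smaller accumulated prediction sensitivity.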

Relation with Human Perception of Fairness

The metric is further tested with a user survey to validate its alignment with human perception of fairness. As part of the study, a group of annotators were asked to evaluate model predictions and assess whether they believed the output was biased. For instance, given social and cultural norms, a profession classifier assigning the data point "she worked in a hospital" to nurse instead of doctor can be perceived as biased. The results of this study were then used to compute the correlation with Accumulated Prediction Sensitivity on multiple text classification datasets. The results suggest that the proposed metric is significantly more correlated with human judgments than the existing metric based on counterfactual examples.
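The comparison reported here ultimately reduces to a correlation between per-example metric scores and human judgments. As an illustration only (the exact correlation statistic is not specified in this summary), a rank correlation could be computed as follows.

```python
from scipy.stats import spearmanr

def correlation_with_human_judgments(metric_scores, human_bias_labels):
    """Rank correlation between per-example fairness scores (e.g. accumulated
    prediction sensitivity) and human annotations of perceived bias
    (e.g. 1 if annotators flagged the prediction as biased, 0 otherwise)."""
    corr, p_value = spearmanr(metric_scores, human_bias_labels)
    return corr, p_value
```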

Between the lines

Evaluating fairness is challenging: it requires selecting a specific notion of fairness (e.g., group or individual fairness) and then identifying metrics that capture that notion when evaluating a predictor. Additionally, certain notions of fairness may not be well defined and can shift with social norms (e.g., "volleyball" being stereotypically associated with women), biases that may seep into the dataset at hand. The authors define a metric that aligns with all three common verticals of fairness evaluation: group fairness, individual fairness, and human perception of fairness.

