Montreal AI Ethics Institute

Democratizing AI ethics literacy

Measuring Fairness of Text Classifiers via Prediction Sensitivity

June 17, 2022

🔬 Research Summary by Satyapriya Krishna, a PhD student at Harvard University working on problems related to Trustworthy Machine Learning.

[Original paper by Satyapriya Krishna, Rahul Gupta, Apurv Verma, Jwala Dhamala, Yada Pruksachatkun, Kai-Wei Chang]


Overview: With the rapid growth in language processing applications, fairness has emerged as an important consideration in data-driven solutions. Although various fairness definitions have been explored in the recent literature, there is a lack of consensus on which metrics most accurately reflect the fairness of a system. This paper introduces a new formulation, Accumulated Prediction Sensitivity, which measures fairness in machine learning models based on the model’s prediction sensitivity to perturbations in input features. The metric attempts to quantify the extent to which a single prediction depends on a protected attribute, where the protected attribute encodes the membership status of an individual in a protected group. The authors observe that the proposed metric is significantly more correlated with human annotation than the existing counterfactual fairness metric.


Introduction

Ongoing research increasingly emphasizes methods that detect and mitigate unfair social bias in machine learning-based language processing models. These methods fall under the umbrella of algorithmic fairness, which has been expressed quantitatively through numerous definitions. These definitions are broadly categorized into two types: individual fairness and group fairness. Individual fairness evaluates whether a model gives similar predictions for individuals with similar personal attributes (e.g., age or race). Group fairness, on the other hand, evaluates fairness across cohorts that share the same protected attributes rather than across individuals. Although both categories define valid notions of fairness, human understanding of fairness is also used to measure fairness in machine learning models. Existing studies often consider only one or two of these three verticals, which gives an incomplete picture of fairness in model predictions. To mitigate this problem, the authors propose a formulation based on the model’s sensitivity to input features, the accumulated prediction sensitivity, to measure the fairness of model predictions, and establish its theoretical relationship with statistical parity (group fairness) and individual fairness metrics. They also empirically demonstrate the correlation between the proposed metric and human perception of fairness, thereby providing a more comprehensive fairness metric.

Key Insights

Accumulated Prediction Sensitivity

Accumulated Prediction Sensitivity is a metric that captures the sensitivity of a model to protected attributes such as gender or race in language processing tasks. In a nutshell, the metric combines three components: (1) the prediction sensitivity of the model with respect to its input, (2) aggregation of that sensitivity over the protected attributes, and (3) aggregation of that sensitivity over the prediction classes. This combination drives the metric to emphasize the contributions that protected features such as race and gender make to the model’s decision-making process. On this basis, the accumulated prediction sensitivity score is expected to be smaller for fair models.
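The three components can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: it assumes the score takes the form of a weighted sum over the absolute Jacobian of the model output with respect to the input, and it estimates that Jacobian with finite differences on a toy linear-softmax classifier. The names `make_classifier`, `abs_jacobian`, `class_weights` and `feature_weights` are hypothetical and do not come from the paper.

```python
import math

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def make_classifier(weights):
    """Toy linear-softmax model; weights[k][i] links feature i to class k."""
    def predict(x):
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
        return softmax(logits)
    return predict

def abs_jacobian(predict, x, eps=1e-5):
    """Component (1): |d predict_k / d x_i| via central finite differences."""
    num_classes = len(predict(x))
    jac = [[0.0] * len(x) for _ in range(num_classes)]
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        p_up, p_down = predict(up), predict(down)
        for k in range(num_classes):
            jac[k][i] = abs(p_up[k] - p_down[k]) / (2 * eps)
    return jac

def accumulated_prediction_sensitivity(predict, x, class_weights, feature_weights):
    """Components (2) and (3): weight the sensitivities over protected
    features (feature_weights) and over classes (class_weights), then sum."""
    jac = abs_jacobian(predict, x)
    return sum(
        class_weights[k] * jac[k][i] * feature_weights[i]
        for k in range(len(class_weights))
        for i in range(len(feature_weights))
    )

# Assumption: feature index 2 is the protected attribute, class 0 is of interest.
x = [0.5, 1.0, 0.0]
class_weights = [1.0, 0.0]
feature_weights = [0.0, 0.0, 1.0]

fair_model = make_classifier([[1.0, -0.5, 0.0], [-1.0, 0.5, 0.0]])    # ignores feature 2
biased_model = make_classifier([[1.0, -0.5, 2.0], [-1.0, 0.5, -2.0]])  # relies on feature 2

fair_score = accumulated_prediction_sensitivity(fair_model, x, class_weights, feature_weights)
biased_score = accumulated_prediction_sensitivity(biased_model, x, class_weights, feature_weights)
print(fair_score, biased_score)
```

The model that ignores the protected feature receives a score of zero, while the model that relies on it receives a clearly larger score, matching the expectation that fairer models score lower.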

Relation with Group Fairness

In a nutshell, group fairness requires that the model outcome be independent of the protected features, a condition also known as statistical parity. The proposed metric, Accumulated Prediction Sensitivity, aligns with this definition of group fairness: the authors prove that its expected value is zero under perfect statistical parity. They also show empirically that the metric remains useful for evaluation when a modeler who is attempting to build a classifier with statistical parity unintentionally uses a feature correlated with the protected attribute, for instance “hair length” as a proxy for “gender”.
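For readers unfamiliar with statistical parity, the following minimal sketch measures how far a set of predictions is from it. The function name `statistical_parity_gap` and the toy predictions are illustrative assumptions, not material from the paper.

```python
def statistical_parity_gap(predictions, groups, positive_label=1):
    """Largest difference in positive-prediction rate between any two groups.

    A gap of 0 means the positive rate is identical across groups,
    i.e. statistical parity holds on this sample.
    """
    counts = {}
    for pred, group in zip(predictions, groups):
        total, positives = counts.get(group, (0, 0))
        counts[group] = (total + 1, positives + int(pred == positive_label))
    rates = {g: positives / total for g, (total, positives) in counts.items()}
    return max(rates.values()) - min(rates.values()), rates

# Made-up predictions for two hypothetical groups "a" and "b".
preds = [1, 0, 1, 0, 1, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap, rates = statistical_parity_gap(preds, groups)
print(gap, rates)
```

Note that a classifier can satisfy parity on the protected attribute itself while still depending on a correlated proxy, which is the situation the hair-length example describes.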

Relation with Individual Fairness

The notion of individual-based fairness is stated in [Dwork et al., 2012] as: “We interpret the goal of mapping similar people similarly to mean that the distributions assigned to similar people are similar”. This constraint is applied during model training by requiring the model outcome to satisfy a Lipschitz property with respect to a distance metric between two samples in the population. Accumulated Prediction Sensitivity respects this fairness definition: the paper proves that the metric is bounded by the Lipschitz constant defined in the individual fairness constraint. Consistent with this bound, increasing the Lipschitz constant relaxes the fairness constraint and results in a higher magnitude of the proposed metric, which is also seen in the experimental results.
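The Lipschitz property can be probed empirically on a finite sample, as in the sketch below. It is a simplification made for illustration: it compares scalar model scores by absolute difference rather than comparing output distributions, and the toy model `toy_score`, the Euclidean distance, and the sample points are all assumptions. The ratio it returns is only a lower bound on the true Lipschitz constant, since unseen pairs may violate it more strongly.

```python
import math
from itertools import combinations

def empirical_lipschitz_ratio(f, samples, distance):
    """Largest |f(x) - f(y)| / d(x, y) over all sample pairs with d(x, y) > 0."""
    worst = 0.0
    for x, y in combinations(samples, 2):
        d = distance(x, y)
        if d > 0:
            worst = max(worst, abs(f(x) - f(y)) / d)
    return worst

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def toy_score(x):
    # Hypothetical scalar model score: a logistic function of a linear form.
    return 1.0 / (1.0 + math.exp(-(2.0 * x[0] - 1.0 * x[1])))

samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, -0.5)]
ratio = empirical_lipschitz_ratio(toy_score, samples, euclidean)
print(ratio)
```

For this toy model the ratio cannot exceed 0.25 times the norm of the weight vector (2, -1), because the logistic function has slope at most 0.25.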

Relation with Human Perception of Fairness

The metric is further tested with a user survey to validate its alignment with human perception of fairness. As part of the study, a group of annotators was asked to evaluate model predictions and assess whether they believed the output was biased. For instance, given social and cultural norms, a profession classifier that assigns the data point “she worked in a hospital” to nurse instead of doctor can be perceived as biased. The annotations were then used to compute the correlation with Accumulated Prediction Sensitivity on multiple text classification datasets. The results suggest that the proposed metric is significantly more correlated with human judgments than the existing metric based on counterfactual examples.
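The comparison with human judgments boils down to correlating per-example metric scores with annotator labels. The summary does not say which correlation statistic was used, so the sketch below assumes a plain Pearson correlation; the scores and labels are made-up placeholders that only exercise the function and are not results from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Placeholder values: one fairness-metric score per example, and a binary
# annotator judgment (1 = perceived as biased, 0 = not perceived as biased).
metric_scores = [0.10, 0.40, 0.35, 0.80]
annotator_labels = [0, 0, 1, 1]
r = pearson(metric_scores, annotator_labels)
print(r)
```

Running the same computation for two candidate metrics against the same labels gives a direct way to compare how well each tracks human perception.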

Between the lines

Evaluating fairness is a challenging task, as it requires selecting a specific notion of fairness (e.g., group or individual fairness) and then identifying metrics that can capture that notion when evaluating a predictor. Additionally, certain notions of fairness may not be well defined and can change with social norms (e.g., “volleyball” being closely associated with females), and such norms may seep into the dataset at hand. The authors define a metric that aligns with all three common verticals of fairness evaluation: group fairness, individual fairness, and human perception of fairness.

  • © 2025 MONTREAL AI ETHICS INSTITUTE.
  • This work is licensed under a Creative Commons Attribution 4.0 International License.