🔬 Research Summary by Satyapriya Krishna, a PhD student at Harvard University working on problems related to Trustworthy Machine Learning.
[Original paper by Satyapriya Krishna, Rahul Gupta, Apurv Verma, Jwala Dhamala, Yada Pruksachatkun, Kai-Wei Chang]
Overview: With the rapid growth of language processing applications, fairness has emerged as an important consideration in data-driven solutions. Although various fairness definitions have been explored in the recent literature, there is a lack of consensus on which metrics most accurately reflect the fairness of a system. This paper introduces a new formulation, Accumulated Prediction Sensitivity, which measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features. The metric attempts to quantify the extent to which a single prediction depends on a protected attribute, where the protected attribute encodes the membership status of an individual in a protected group. The proposed metric is observed to correlate significantly more strongly with human annotations of fairness than an existing counterfactual fairness metric.
Introduction
Ongoing research increasingly emphasizes methods that detect and mitigate unfair social bias in machine learning-based language processing models. These methods fall under the umbrella of algorithmic fairness, which has been quantitatively expressed through numerous definitions. These definitions are broadly categorized into two types: individual fairness and group fairness. Individual fairness evaluates whether a model gives similar predictions for individuals with similar personal attributes (e.g., age or race). Group fairness, on the other hand, evaluates fairness across cohorts sharing a protected attribute rather than across individuals. Although these two broad categories define valid notions of fairness, human perception of fairness is also used to assess machine learning models. Existing studies often consider only one or two of these verticals, providing an incomplete picture of fairness in model predictions. To mitigate this problem, the authors propose a formulation based on the model's sensitivity to input features, the accumulated prediction sensitivity, to measure the fairness of model predictions, and establish its theoretical relationship with statistical parity (group fairness) and individual fairness metrics. They also empirically demonstrate the correlation between the proposed metric and human perception of fairness, making it a more comprehensive fairness metric.
Key Insights
Accumulated Prediction Sensitivity
Accumulated Prediction Sensitivity is a metric that captures the sensitivity of a model to protected attributes such as gender or race in language processing tasks. In a nutshell, the metric combines three major components: (1) the prediction sensitivity of the model with respect to its input, (2) aggregation of that sensitivity over the protected attributes, and (3) aggregation of the sensitivity over the prediction classes. This combination highlights the contribution that protected features such as race or gender make to the model's decision-making process. Based on this notion, the accumulated prediction sensitivity score is expected to be smaller for fairer models.
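Below is a minimal sketch, in PyTorch, of how the three components could be combined for a single example. The toy model, the protected-feature weight vector w, and the class weight vector v are illustrative assumptions, not the paper's exact parameterization.

```python
import torch
from torch.autograd.functional import jacobian

# Toy classifier over 5 input features and 3 output classes (illustrative placeholder).
torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(5, 3), torch.nn.Softmax(dim=-1))

def accumulated_prediction_sensitivity(x, w, v):
    """Sketch of the three components described above:
    (1) prediction sensitivity: Jacobian of the class scores w.r.t. the input,
    (2) aggregation over protected input features via weights w,
    (3) aggregation over output classes via weights v.
    """
    J = jacobian(model, x)           # shape: (num_classes, num_features)
    return (v @ J.abs() @ w).item()  # scalar score; smaller suggests less reliance on protected features

x = torch.randn(5)                           # feature vector of a single example
w = torch.tensor([0.0, 0.0, 1.0, 0.0, 0.0])  # weight only the protected feature (index 2)
v = torch.full((3,), 1.0 / 3)                # uniform weight over prediction classes

print(accumulated_prediction_sensitivity(x, w, v))
```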
Relation with Group Fairness
Group fairness requires that the model outcome be independent of the protected features, a condition also known as statistical parity. The proposed metric aligns with this definition of group fairness: the authors prove that its expected value is zero under perfect statistical parity. They also show empirically that if a modeler unintentionally uses a feature correlated with a protected attribute, for instance "hair length" as a proxy for "gender", while attempting to build a classifier with statistical parity, the proposed metric can surface this during evaluation.
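As a point of reference, statistical parity itself can be checked directly from model predictions. The sketch below assumes binary predictions and a binary protected attribute; the data are hypothetical.

```python
import numpy as np

def statistical_parity_difference(y_pred, protected):
    """Difference in positive-prediction rates between the two groups
    defined by a binary protected attribute (0 indicates statistical parity)."""
    y_pred, protected = np.asarray(y_pred), np.asarray(protected)
    return y_pred[protected == 1].mean() - y_pred[protected == 0].mean()

# Hypothetical predictions from a classifier that leans on "hair length"
# as a proxy for the protected attribute "gender".
y_pred    = [1, 1, 0, 1, 0, 0, 0, 1]
protected = [1, 1, 1, 1, 0, 0, 0, 0]
print(statistical_parity_difference(y_pred, protected))  # 0.75 - 0.25 = 0.5
```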
Relation with Individual Fairness
The notion of individual fairness is stated as (Dwork et al., 2012): "We interpret the goal of mapping similar people similarly to mean that the distributions assigned to similar people are similar." This constraint is applied during model training by requiring the model outcome to satisfy a Lipschitz property with respect to a distance metric between two samples in the population. Accumulated Prediction Sensitivity respects this fairness definition: the article proves that the metric is bounded by the Lipschitz constant used in the individual fairness constraint. This is further supported by the observation that increasing the Lipschitz constant relaxes the fairness constraint and yields a higher magnitude of the proposed metric, which the experimental results confirm.
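Schematically, the Lipschitz formulation of individual fairness from Dwork et al. (2012) can be written as follows, where M is the model, D a distance between output distributions, d a distance between individuals, and L the Lipschitz constant; the notation here is illustrative rather than the paper's exact statement.

```latex
% Individual fairness as a Lipschitz condition (schematic):
% similar individuals x and y receive similar output distributions.
D\big(M(x),\, M(y)\big) \;\le\; L \cdot d(x, y) \quad \text{for all individuals } x, y
```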
Relation with Human Perception of Fairness
The metric is further tested with a user survey to validate its alignment with human perception of fairness. As part of the study, a group of annotators were asked to evaluate model predictions and assess whether they believed each output was biased. For instance, given social and cultural norms, a profession classifier assigning the input "she worked in a hospital" to "nurse" instead of "doctor" can be perceived as biased. The results of this study were then used to compute the correlation with Accumulated Prediction Sensitivity on multiple text classification datasets. The study suggests that the proposed metric is significantly more strongly correlated with human judgments than the existing metric based on counterfactual examples.
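For illustration, the sketch below shows one way a correlation between per-example metric scores and binary human bias annotations could be computed; the scores are hypothetical and the paper's exact correlation statistic may differ.

```python
from scipy.stats import pearsonr

# Hypothetical per-example scores from the proposed metric and binary human
# annotations (1 = annotator perceived the prediction as biased).
metric_scores     = [0.82, 0.10, 0.65, 0.05, 0.91, 0.20]
human_annotations = [1, 0, 1, 0, 1, 0]

corr, p_value = pearsonr(metric_scores, human_annotations)
print(f"correlation={corr:.2f}, p={p_value:.3f}")
```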
Between the lines
Evaluating fairness is a challenging task, as it requires selecting a specific notion of fairness (e.g., group or individual fairness) and then identifying metrics that capture that notion when evaluating a predictor. Additionally, certain notions of fairness may not be well defined and can change based on social norms (e.g., "volleyball" being closely associated with females) that may seep into the dataset at hand. The authors defined a metric that aligns with all three common verticals of fairness metrics: group fairness, individual fairness, and human perception of fairness.