Prediction Sensitivity: Continual Audit of Counterfactual Fairness in Deployed Classifiers

🔬 Research Summary by Ivoline C. Ngong, Krystal Maughan, Joseph P. Near. Ivoline is a first-year doctoral student in Computer Science at the University of Vermont whose research interests lie at the intersection of data privacy, fairness, and machine learning. Krystal is a third-year doctoral student in Computer Science at the University of Vermont whose current work focuses on mathematical cryptography, specifically isogeny-based cryptography. She has also published on fairness, privacy, and trustworthy AI. Joseph is an assistant professor of computer science at the University of Vermont. His research interests include security, privacy, machine learning, and programming languages.

[Original paper by Krystal Maughan, Ivoline C. Ngong, Joseph P. Near]

Overview: In a world where AI-based systems dominate many parts of our daily lives, auditing these systems for fairness is a growing concern. Group fairness metrics are commonly used for such audits, but they do not always uncover discrimination against individuals, and they are difficult to apply once a system has been deployed. Counterfactual fairness is an individual notion of fairness that is particularly difficult to evaluate after deployment. In this paper, we present prediction sensitivity, an approach for auditing counterfactual fairness while the model is deployed and making predictions.

Introduction

The Challenge: Biased Predictions in Machine Learning. AI-based decision-making systems play an increasing role in a wide range of high-stakes decisions, prompting the question of whether they can lead to unfair outcomes. To address these legitimate concerns, several metrics have been proposed to audit these classifiers and to measure and mitigate bias before they are deployed. These metrics have two shortcomings. First, although group fairness metrics like disparate impact and equalized odds are useful for uncovering discrimination in these classifiers, they fail to catch discrimination against individuals. Second, most existing techniques focus on auditing for or mitigating bias before these classifiers are deployed. There has been comparatively less research on methods to ensure that trained classifiers continue to perform well after deployment, even though novel examples encountered after deployment are likely to reveal bias in trained classifiers.

Our Approach: Measuring Counterfactual Fairness using Prediction Sensitivity. To address these problems, we propose prediction sensitivity, an approach for continual audit of counterfactual fairness in deployed classifiers. Counterfactual fairness requires that a classifier would have made the same prediction if the individual had belonged to a different demographic group, which makes it a notion of individual fairness. Prediction sensitivity attempts to measure counterfactual fairness: it is high when a prediction would have been different if the individual had belonged to a different demographic group.

Toward Continual Audit of Deployed Models. Prediction sensitivity is designed to be deployed alongside the trained classifier, in order to raise an alarm before a potentially unfair prediction is made. This setting is important because deployed classifiers may encounter novel examples not considered by the classifier’s designers, which may trigger discriminatory behavior that is impossible to detect at training time.

Key Insights

Prediction Sensitivity

Prediction sensitivity is a measure of how changes in the input would influence changes in a model’s prediction, weighted by each feature’s influence on the individual’s protected status. The intuition behind this measure is that a model that is particularly sensitive to protected attributes – such as race or gender – is likely to violate counterfactual fairness.

Prediction sensitivity is the dot product of two gradients: the “protected status feature weights” and the “per-feature influence” (as shown in the figure below).

The protected status feature weights measure how the features of the input x contribute to its protected status. The weights are determined by training a protected status model A to predict x’s protected status and then computing the gradient of A(x) with respect to the input features. Consequently, if A captures the correlation between input features and protected status, then these weights represent the strength of each feature’s influence on the protected status.

The per-feature influence, on the other hand, captures the influence of each input feature on the classifier’s predictions. It describes how much a hypothetical change in each feature of x would affect the prediction of the trained classifier F.
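
As a minimal sketch of the computation described above, the example below assumes that both the trained classifier F and the protected status model A are logistic models, so that their gradients with respect to the input can be written in closed form. The function names, the toy weights, and the use of an absolute value (so that "high" refers to magnitude) are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_input_gradient(weights, bias, x):
    """Gradient of sigmoid(weights . x + bias) with respect to the input x."""
    p = sigmoid(np.dot(weights, x) + bias)
    return p * (1.0 - p) * weights

def prediction_sensitivity(f_params, a_params, x):
    """Dot product of the two gradients for a single input x.

    f_params and a_params are (weights, bias) pairs for the hypothetical
    logistic models standing in for F and A.
    """
    per_feature_influence = logistic_input_gradient(*f_params, x)
    protected_weights = logistic_input_gradient(*a_params, x)
    # Absolute value is an assumption: only the magnitude is compared later.
    return abs(np.dot(protected_weights, per_feature_influence))

# Toy setup (assumed): feature 0 is a proxy for protected status.
a_params = (np.array([3.0, 0.0]), 0.0)
f_proxy = (np.array([2.0, 0.0]), 0.0)   # F relies on the proxy feature
f_other = (np.array([0.0, 2.0]), 0.0)   # F ignores the proxy feature
x = np.array([0.0, 1.0])

ps_proxy = prediction_sensitivity(f_proxy, a_params, x)
ps_other = prediction_sensitivity(f_other, a_params, x)
print(ps_proxy, ps_other)
```

In this toy setup the sensitivity is nonzero only when F depends on the feature that A associates with protected status. For models without closed-form gradients, an automatic differentiation library would play the role of `logistic_input_gradient`.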

Detecting Violations of Counterfactual Fairness

To detect discriminatory behavior, we can use a simple procedure for each prediction the classifier makes:

1. Calculate prediction sensitivity for the prediction.
2. If prediction sensitivity is too high (i.e. above a threshold), sound the alarm!
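
The procedure above can be sketched as a thin wrapper around a classifier. Everything named here is hypothetical: `audited_predict`, the `AuditedPrediction` record, the stand-in `predict` and `sensitivity_fn` callables, and the threshold value of 0.2. How to choose a threshold for a real system is not specified in this summary.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class AuditedPrediction:
    prediction: float
    sensitivity: float
    flagged: bool  # True means "sound the alarm"

def audited_predict(
    predict: Callable[[Sequence[float]], float],
    sensitivity_fn: Callable[[Sequence[float]], float],
    x: Sequence[float],
    threshold: float,
) -> AuditedPrediction:
    """Compute prediction sensitivity for x and flag it if above threshold."""
    s = sensitivity_fn(x)
    return AuditedPrediction(prediction=predict(x), sensitivity=s, flagged=s > threshold)

# Stand-in model and sensitivity function, assumed purely for illustration.
def toy_predict(x):
    return float(x[0] > 0)

def toy_sensitivity(x):
    return 0.5 * abs(x[0])

high = audited_predict(toy_predict, toy_sensitivity, [1.0, 0.0], threshold=0.2)
low = audited_predict(toy_predict, toy_sensitivity, [0.1, 0.0], threshold=0.2)
print(high.flagged, low.flagged)
```

Returning the sensitivity alongside the flag keeps the raw value available for logging, which may help when revisiting the threshold later.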

To show that prediction sensitivity is a useful measure of fairness, we need to answer the question: Does our procedure accurately detect discrimination in deployed classifiers?

Unfortunately, this is an extremely tricky question to answer conclusively, for two reasons. First, it is notoriously difficult even to define fairness – numerous conflicting definitions exist, and the research community has not reached consensus about which one(s) to use. Second, even if we settle on a definition, evaluating the accuracy of prediction sensitivity requires ground truth information about both a real situation and its counterfactual (i.e. what would have been different) – which is essentially impossible in most realistic settings.

We evaluated prediction sensitivity experimentally by constructing synthetic data, which gives us control over the causal relationships in the data. In this setting, we find that prediction sensitivity does a good job detecting (constructed) violations of counterfactual fairness. We also conducted related experiments using real data with counterfactual augmentation – an attempt to construct the counterfactual situation for each data point – and obtained promising results.
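
To illustrate why synthetic data makes the counterfactual observable, the sketch below builds a toy causal model in which a protected attribute directly causes one feature. This is not the experimental setup from the paper; the generative model, the linear classifiers, and all names are assumptions chosen to keep the example small.

```python
import numpy as np

def make_synthetic(n, seed=0):
    """Return (factual, counterfactual) feature matrices from a toy causal model."""
    rng = np.random.default_rng(seed)
    a = rng.integers(0, 2, size=n)   # protected attribute, 0 or 1
    u = rng.normal(size=n)           # background noise behind the proxy feature
    x2 = rng.normal(size=n)          # feature causally unrelated to a

    def features(a_values):
        x1 = u + 2.0 * a_values      # a causes x1; u is held fixed
        return np.column_stack([x1, x2])

    # The counterfactual flips a while keeping u and x2 unchanged.
    return features(a), features(1 - a)

def violation_rate(weights, bias, factual, counterfactual):
    """Fraction of individuals whose prediction changes under the counterfactual."""
    pred = (factual @ weights + bias) > 0
    pred_cf = (counterfactual @ weights + bias) > 0
    return float(np.mean(pred != pred_cf))

factual, counterfactual = make_synthetic(1000)
# Hypothetical linear classifiers: one uses the proxy x1, one uses only x2.
rate_proxy = violation_rate(np.array([1.0, 0.0]), -1.0, factual, counterfactual)
rate_fair = violation_rate(np.array([0.0, 1.0]), 0.0, factual, counterfactual)
print(rate_proxy, rate_fair)
```

Because the generative process is known, each individual's counterfactual features can be computed exactly, which gives ground-truth labels for counterfactual fairness violations against which an audit signal could be compared.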

Between the lines

We’re excited about our results in this paper; we think prediction sensitivity may be an important new tool to add to our toolbox for evaluating fairness in machine learning. In particular, prediction sensitivity allows monitoring classifiers after deployment, for every prediction they make.

There are, however, some limitations to the proposed method. A third party cannot audit the deployed system without access to the classifier, since prediction sensitivity can only be calculated with white-box access. We therefore need to rely on the organizations that develop models to deploy prediction sensitivity themselves, which they may not be incentivized to do.

Our experimental results suggest that prediction sensitivity may be effective at detecting violations of counterfactual fairness, but we had to make major assumptions about the world in order to conduct our experiments. Are these assumptions reasonable? Answering this question is really difficult (or maybe impossible!), and it’s a major complication in recent fairness research more generally.