🔬 Research Summary by Corinna Hertweck, a fourth-year PhD student at the University of Zurich and the Zurich University of Applied Sciences, where she works on algorithmic fairness.
[Original paper by Eleonora Viganò, Corinna Hertweck, Christoph Heitz, Michele Loi]
Overview: Fairness has become an increasingly important concern in the design of automated decision-making systems. So-called fairness criteria can help us evaluate the fairness of such systems. The paper “On Statistical Criteria of Algorithmic Fairness” by Brian Hedden, however, argues that most of the statistical fairness criteria we use today need not be fulfilled for a system to be fair. Our paper questions the practical relevance of these findings for machine learning practitioners by showing that his argument does not apply to most machine learning systems used today.
With the increasing use of automated decision-making systems in high-stakes domains, such as hiring, lending, and education, fairness has to be integrated into the design of these systems. So-called fairness criteria or fairness metrics are used to measure fairness in these systems. Brian Hedden’s paper “On Statistical Criteria of Algorithmic Fairness” provides good reasons to question whether some of the most widely used statistical fairness criteria really need to be fulfilled for a system to be fair, i.e., whether they are necessary conditions for fairness. Our paper shows that Hedden’s argument does not apply to most automated decision-making systems today. To show this, we reconstruct his argument and critically analyze it. We conclude that while the analyzed fairness criteria are indeed not necessary conditions for fairness for all automated decision-making systems, they are still relevant and might even be necessary for some of today’s most widely used decision-making systems.
According to Hedden, most statistical fairness criteria are not necessary for fairness
The field of algorithmic fairness has developed many fairness criteria. Hedden’s paper argues that most of these fairness criteria are not necessary conditions for fairness. To show this, he uses an example in which people are given coins. A person whose coin lands “heads” is considered a “heads person.” The task is to predict whether a person is a heads person. Hedden further assumes that every coin has a certain known probability of landing heads; for example, a coin with a probability of 0.7 lands heads in 70% of cases. The predictive algorithm thus looks at the coin’s probability of landing heads to make its prediction: if it is above 50%, it predicts the person to be a heads person. Hedden then constructs a case in which this perfectly fair prediction of being a heads person violates most of the fairness criteria he considers. From that, he concludes that most fairness criteria are not necessary conditions for fairness.
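The prediction rule in Hedden’s coin example can be sketched in a few lines of Python. This is a minimal illustration of the rule as described above; the function name is ours, not Hedden’s, and the sketch does not reproduce his fairness-criteria analysis:

```python
def predict_heads_person(p_heads: float) -> bool:
    """Predict that a person is a 'heads person' if and only if their
    coin's known probability of landing heads is above 50%."""
    return p_heads > 0.5

# A coin with probability 0.7 lands heads in 70% of cases,
# so its holder is predicted to be a heads person.
print(predict_heads_person(0.7))  # True
print(predict_heads_person(0.3))  # False
```

Note that the rule uses nothing but the individual coin’s own (assumed known) probability, which is precisely what makes this setting unusual compared to typical machine learning predictions.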
Hedden’s result does not apply to most ML systems
While we do not contest the correctness of the author’s argument, we do contest its relevance for machine learning practitioners. To show this, we distinguish two kinds of predictions:
(1) Predictions based on data from many people and
(2) Predictions based on data from one person only.
Clearly, in machine learning, we always follow the first approach when we make predictions about people, because data from a single person is never enough to properly train a machine learning model. The second kind of prediction, on the other hand, is easy to do with coins: we can toss a single coin a thousand times and get a good estimate of how likely it is to land heads on the next toss. This is almost impossible to do with people, because people are not coins.
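The second kind of prediction — estimating a single coin’s bias from tosses of that coin alone — can be sketched as follows. This is our own illustrative simulation (the function name and the choice of a seeded pseudo-random generator are ours), not part of Hedden’s paper:

```python
import random

def estimate_heads_probability(true_p: float, tosses: int = 1000,
                               seed: int = 0) -> float:
    """Estimate one coin's probability of landing heads by tossing
    only that coin many times and taking the observed frequency.
    This uses data from a single 'individual' (the coin) only."""
    rng = random.Random(seed)  # seeded for reproducibility
    heads = sum(rng.random() < true_p for _ in range(tosses))
    return heads / tosses

# After a thousand tosses, the estimate is close to the true bias of 0.7.
estimate = estimate_heads_probability(0.7)
print(estimate)
```

With people, there is no analogue of tossing the same coin a thousand times, which is why real systems fall back on data from many other, similar people.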
Notice, however, that Hedden’s argument relies on the second kind of prediction: predictions based on data from one person only. To make a prediction about a person, one only needs data from that person (or rather, their coin); data from other people is neither aggregated nor analyzed. Because of that, we argue that his argument does not apply to most machine learning applications – and the question of whether the fairness criteria Hedden considered could be necessary conditions for these systems remains undetermined.
Is it morally permissible to make predictions about an individual based on similar individuals?
Now, with these two kinds of predictions in mind, you may argue that the first kind of prediction is always morally wrong because we should treat people as individuals – so talking about the necessity of fairness criteria for those kinds of predictions is pointless. We reply that treating people as individuals is a moral principle that can be overridden by other moral obligations or disregarded in some contexts (e.g., in fleeting interactions or in cases with limited individual information). Predictions about individuals based on data from other people may be morally appropriate if they, for example, prevent greater harm. However, when this is the case and predictions about individuals are thus morally appropriate, they also have to be fair, and fairness criteria can help ensure this.
Conclusion: Statistical fairness criteria are still relevant for most ML systems
In conclusion, our paper shows that one should not disregard the fairness criteria listed by Hedden as “not necessary” just because they are not necessary conditions for all predictive models. Most machine learning applications are not vulnerable to Hedden’s argument because they base their predictions on data from many people. The fairness criteria considered by Hedden could thus still be necessary conditions for fairness in some cases.
Between the lines
This paper discusses the relevance – in particular, the necessity – of fulfilling existing statistical fairness criteria. We showed that Brian Hedden’s argument, which concludes that most criteria are unnecessary for fairness, does not apply to most automated decision-making systems today. On a broader note, however, we must acknowledge how context-dependent fairness is. What makes for a good fairness criterion in one context could be unfit for another. When it comes to using fairness criteria in practice, what matters most is how to find an appropriate criterion in one’s context. This is still an open question in the literature and of high relevance in practice, as model developers and stakeholders are confronted with the question of how to evaluate their specific system’s fairness.