It’s COMPASlicated: The Messy Relationship between RAI Datasets and Algorithmic Fairness Benchmarks

March 2, 2022

🔬 Research Summary by Rishi Balakrishnan, a student at UC Berkeley passionate about algorithmic fairness, privacy, and trustworthy AI more broadly.

[Original paper by Michelle Bao, Angela Zhou, Samantha Zottola, Brian Brubach, Sarah Desmarais, Aaron Horowitz, Kristian Lum, Suresh Venkatasubramanian]


Overview: Criminal justice (CJ) data is not neutral or objective: it emerges from a messy process of noisy measurements, individual judgements, and location-dependent context. Yet by ignoring the context around risk assessment instrument (RAI) datasets, computer science researchers risk both reinforcing upstream value judgements about what the data should say and overlooking the downstream effects of their models on the justice system. The authors argue that responsibly and meaningfully engaging with this data requires computer scientists to explicitly consider the context of and values within these datasets.


Introduction

The issue of fairness in algorithms was thrust into the public spotlight in 2016, when ProPublica published an exposé on Northpointe’s COMPAS tool, which is used to predict recidivism. ProPublica’s article claimed that Northpointe’s tool was racially biased in that black defendants consistently received higher risk scores than white defendants, even when controlling for factors such as prior crimes, age, and gender. Since then, the field of algorithmic fairness has boomed, with more and more research devoted to the question of how to achieve fair outcomes with respect to some sensitive attribute (such as race). However, the algorithmic fairness and machine learning community often looks at datasets such as COMPAS without considering the surrounding context, a practice that risks misinterpreting and misusing data. In this paper, the authors first survey many of the issues surrounding risk assessment instrument (RAI) datasets such as COMPAS and the disconnect between the algorithmic fairness literature and real-life fairness concerns. They then provide suggestions for how CS researchers can responsibly engage with the data they use.

Data biases within RAI datasets

The authors first take a deep dive into RAI datasets, describing several ways that bias can enter the data. They focus on pretrial RAI datasets, whose purpose is to inform pretrial detainment decisions. First, the target variable is a proxy: the legal system usually only commits to pretrial detainment if the defendant is likely to flee or to commit a violent crime, both of which are hard to measure, so datasets use failure to appear (FTA) instead, even though FTA is hardly equivalent to fleeing from the justice system. Defendants often miss court dates because of scheduling conflicts, work, or childcare, meaning that detainment decisions based on predicted FTA probabilities are likely too harsh as well as biased against low-income defendants. Second, the sensitive attribute (in most cases race), which is integral to existing work on achieving algorithmic fairness, is itself noisy. For example, the authors cite a study finding that officer-reported racial designations were inconsistent for every racial group except “black” [1]. Third, the input data is often biased as well: while datasets purport to measure crime, the most they can realistically measure is arrests, which are skewed along racial lines by the over-policing of minority communities. The data processing done to create the dataset also hides value-laden questions, such as whether prior arrests for offenses that have since been decriminalized (like marijuana possession) should be included in the data at all. The covariates, sensitive attribute, and target variable all carry significant levels of noise before a machine learning researcher ever touches the dataset.

Issues with algorithmic fairness in machine learning

But the problem doesn’t end there. Underlying the creation of many RAI models is the assumption that they can be “plugged in” to the relevant part of the CJ pipeline. Reality is much messier. The criminal justice system contains several points of individual discretion, one of the biggest being judicial discretion. Judges can (and do) choose to ignore RAI recommendations, and the authors cite studies demonstrating that judges deviate from recommendations more often when the defendant is black than when the defendant is white [2]. Different jurisdictions also interpret risk scores differently: in some districts, a 40% chance of not appearing in court is considered “high-risk,” whereas in others the threshold is as low as 10% [3]. Thus, claims about “reforming” or “improving” the criminal justice system that rest only on fairer model performance should be viewed skeptically. Such claims also implicitly treat the current justice system as worthy of reform, which can be a reasonable position, but one that papers hardly ever explicitly defend.
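
As a purely illustrative sketch (the scores and thresholds below are hypothetical, not drawn from the paper or its citations), the same predicted probabilities of failure to appear produce very different “high-risk” populations under a 40% threshold versus a 10% threshold:

```python
# Hypothetical FTA probabilities for six defendants (illustrative only).
scores = [0.05, 0.12, 0.28, 0.35, 0.47, 0.62]

# Two hypothetical jurisdiction thresholds for labeling someone "high-risk".
for threshold in (0.40, 0.10):
    flagged = [s for s in scores if s >= threshold]
    print(f"threshold={threshold:.2f}: "
          f"{len(flagged)} of {len(scores)} defendants flagged high-risk")
```

Under the 40% cutoff, two of six defendants are flagged; under the 10% cutoff, five of six are, even though the underlying model output is identical.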

When fairness concerns are incorporated into machine learning algorithms, they are often just constraints on an underlying objective of reducing crime. This makes sense on its face. However, few datasets track the long-term behavior of individuals, and the most efficient short-term way to reduce crime is incarceration, because detained individuals cannot commit crimes.
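
For readers unfamiliar with this framing, one common formalization (a generic sketch, not the specific formulation of any paper discussed here) treats fairness as a constraint, such as demographic parity with slack ε, on a standard predictive objective:

```latex
\min_{\theta} \; \mathbb{E}\!\left[\ell\big(f_\theta(x),\, y\big)\right]
\quad \text{s.t.} \quad
\Big| \Pr\big(f_\theta(x)=1 \mid A=a\big) - \Pr\big(f_\theta(x)=1 \mid A=b\big) \Big| \le \varepsilon
```

The authors’ concern is with what sits inside the objective: if the label y is a proxy such as re-arrest or FTA, the constrained problem inherits every measurement issue described above, no matter how tight the fairness constraint is.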

The work on algorithmic fairness is also situated within the larger machine learning community, but there is a mismatch between machine learning practices and the care that CJ data requires. Machine learning papers typically focus not on gleaning new insights from data but on proposing new methods. Machine learning conferences also require quick turnaround, with authors often asked to run their method on a new dataset during a one-week rebuttal period. While these practices may suit pure machine learning tasks, they inevitably decontextualize and subordinate the dataset itself. Collaborating with domain experts is much harder in a culture built on quick turnaround, but what most often ensures that ethical questions surrounding a dataset go unanswered is the practice of benchmarking. Once a seminal paper uses a dataset, that dataset quickly becomes the norm: other papers use it as the point of comparison for their method’s effectiveness, and omitting it risks rejection from a conference. As a result, flawed assumptions about a dataset quickly become baked into the literature.

Where do we go from here?

Machine learning is not the first field to work with RAI datasets. Psychology and criminology, among others, have over time found responsible ways to engage with CJ data. To lay out a path forward, the authors offer several suggestions. First, they suggest not using CJ datasets as generic real-world examples on which to test new algorithmic fairness methods. This also means avoiding broad conclusions about the CJ system drawn simply from these datasets: each dataset emerges from a rich, complex context that shapes any insights gleaned from it.

To make this context more explicit, the authors advocate writing datasheets and model cards that lay out the context and limitations of both datasets and the models built on them. This also includes identifying the underlying assumptions in algorithmic fairness measures and examining how models behave when those assumptions are violated; machine learning already studies similar questions under the banner of robustness, which can help here. The authors also question the use of standard metrics like accuracy and AUC when working with these datasets, as aggregate metrics neglect disparities in performance across specific individuals and groups. They close with a call for future work on making benchmarks explicit about their ethical assumptions, and ask whether benchmarks for CJ data can exist at all.
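
To illustrate the kind of disaggregation the authors have in mind (the labels, scores, and groups below are synthetic, and this is not code from the paper), an aggregate ranking metric can look flawless while threshold-based error rates diverge sharply between groups:

```python
# Synthetic example: overall AUC looks perfect, but the false positive
# rate at a fixed decision threshold differs sharply between two groups.
from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 1, 0, 0, 1, 1]                  # synthetic outcomes
y_score = [0.1, 0.2, 0.8, 0.9, 0.6, 0.7, 0.8, 0.9]  # synthetic risk scores
group   = ["a", "a", "a", "a", "b", "b", "b", "b"]  # synthetic group labels
threshold = 0.5

print("overall AUC:", roc_auc_score(y_true, y_score))  # 1.0 on this toy data

for g in ("a", "b"):
    # False positive rate: fraction of true negatives scored above the threshold.
    negatives = [s for yt, s, gr in zip(y_true, y_score, group)
                 if gr == g and yt == 0]
    fpr = sum(s >= threshold for s in negatives) / len(negatives)
    print(f"group {g}: false positive rate at threshold {threshold} = {fpr:.2f}")
```

On this toy data the overall AUC is perfect, yet every negative in group b is scored above the threshold (false positive rate 1.00) while no negative in group a is (false positive rate 0.00), exactly the sort of disparity a single aggregate number hides.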

Between the lines

This paper situates two broader critiques of machine learning within the domain of CJ data: 1) Datasets are not objective and context-free. Real life is messy, and a set of input features, labels, and sensitive attributes simply cannot capture all of this complexity, so choices about what to include in the data reduce that complexity in ways that inherently make value judgements. 2) An obsessive focus on benchmarks and state-of-the-art performance leads to a disconnect from the real-world problems that gave rise to the datasets in the first place. With the hope that machine learning systems will be deployed in settings like the criminal justice system comes a responsibility to study the real-world effects of those algorithms. Machine learning researchers can no longer afford to be disconnected from the world they influence.

References

[1] Kristian Lum, Chesa Boudin, and Megan Price. The impact of overbooking on a pre-trial risk assessment tool. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, FAT* ’20, pages 482–491, New York, NY, USA, January 2020. Association for Computing Machinery. 


[2] Alex Albright. If you give a judge a risk score: Evidence from Kentucky bail decisions. The John M. Olin Center for Law, Economics, and Business Fellows’ Discussion Paper Series 85, 2019.


[3] Sandra G. Mayson. Dangerous defendants. Yale Law Journal, 127:490, 2017.
