
Research summary: Bring the People Back In: Contesting Benchmark Machine Learning

September 14, 2020

Summary contributed by our researcher Alexandrine Royer, who works at The Foundation for Genocide Education.

*Authors of full paper & link at the bottom


Mini-summary: The biases present in machine learning datasets, which have been shown to favour white, cisgender, male, and Western subjects, have received considerable scholarly attention. Denton et al. argue that the scientific community has nonetheless failed to consider the histories, values, and norms that construct and pervade such datasets. The authors propose a research program, which they term the genealogy of machine learning, that works to understand how and why such datasets are created. By turning our attention to data collection, and specifically to the labour involved in dataset creation, we can “bring the people back in” to the machine learning process. For Denton et al., understanding the labour embedded in a dataset will push researchers to reflect critically on the type and origin of the data they are using and thereby to contest some of its applications.

Full summary:

In recent years, members both inside and outside industry have decried the prevalence, within AI algorithms and machine learning systems, of datasets biased against people of colour, women, LGBTQ+ communities, people with disabilities, and the working class. In response to this backlash, data scientists have concentrated on adjusting the outputs of these systems. According to Denton et al., fine-tuning algorithms to achieve “fairer results” has prevented data scientists from questioning the data infrastructure itself, especially when it comes to benchmark datasets.

The authors point out that new forms of algorithmic fairness interventions generally center on the parity of representation between different demographic groups within the training datasets. They argue that such interventions fail to consider the issues present in data collection itself, which can involve exploitative mechanisms. Academics and industry members alike tend to disregard the question of why such datasets are created. Questions such as what and whose values determine the type of data collected, under what conditions the collection is carried out, and whether standard data collection norms are appropriate often escape data scientists. For Denton et al., data scientists and data practitioners ought to work to “denaturalize” the data infrastructure, that is, to uncover the assumptions and values that underlie prominent ML datasets.
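To make concrete the kind of intervention being critiqued here, a minimal sketch of a parity-of-representation audit might look as follows. This is an illustrative assumption, not code from the paper: the function, the record format, and the `demographic_group` field are all hypothetical.

```python
from collections import Counter

def representation_parity(records, group_key="demographic_group"):
    """Return each group's share of the dataset and the max/min count ratio.

    Hypothetical sketch of a parity-of-representation check; the record
    format and field name are assumptions for illustration only.
    """
    counts = Counter(r[group_key] for r in records)
    total = sum(counts.values())
    shares = {group: n / total for group, n in counts.items()}
    # Ratio of the largest group to the smallest; 1.0 means equal representation.
    disparity = max(counts.values()) / min(counts.values())
    return shares, disparity

# Toy example: a heavily skewed training set.
data = [{"demographic_group": "A"}] * 800 + [{"demographic_group": "B"}] * 200
shares, disparity = representation_parity(data)
print(shares)     # {'A': 0.8, 'B': 0.2}
print(disparity)  # 4.0 -- a rebalancing intervention would push this toward 1.0
```

A check like this captures what the dataset looks like after collection; the authors' point is precisely that it says nothing about why or under what conditions the data was gathered.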

Taking inspiration from the French philosopher Michel Foucault, the authors offer a first step toward what they term the “genealogy” of machine learning. To start, data and social scientists should trace the histories of prominent datasets, the modes of power as well as the unspoken labour that went into their creation. Labelling within a dataset is organized through a particular categorical schema, yet that schema is treated as widely applicable, even for models with different success metrics. Benchmark datasets are treated as gold standards for machine learning evaluation and comparison, leading them to take on an authoritative status. Indeed, as the authors summarize, “once a dataset is released and established enough to seamlessly support research and development, their contingent conditions of creation tend to be lost or taken for granted.”

Once datasets achieve this naturalized status, they are perceived as natural, scientific objects and can therefore circulate across multiple institutions and organizations. Publicly available research datasets, constructed in an academic context, often provide the methodological backbone (i.e., the infrastructure) for industry-oriented AI tools. Despite the disparities in the amount of data collected, industry machine learners still rely on these datasets to undergird research in commercial AI. Technology companies treat these shifts as changes merely in scale and rarely in kind.

To reverse the taken-for-granted status of benchmark datasets, the authors offer four guiding research questions: 

  1. How do dataset developers in machine learning research describe and motivate the decisions that go into their creation? 
  2. What are the histories and contingent conditions of the creation of benchmark datasets in machine learning? As an example, the authors offer the case of Henrietta Lacks, an African-American woman whose cervical cancer cells were taken without her consent before her death. 
  3. How do benchmark datasets become authoritative, and how does this impact research practice?
  4. What are the current work practices, norms, and routines that structure data collection, curation, and annotation of data in machine learning? 

The research questions offered by Denton et al. are a good start in encouraging machine learners to think critically about whether their dataset is aligned with ethical principles and values. Any investigation into the history of science quickly reveals that data-gathering operations have often been part of predatory and exploitative behaviours, especially towards minority groups who have little recourse to contest these practices. Data science should not be treated as an exception to this long-standing historical trend. The people who carry out data collection merit as much ethical consideration as the subjects whose data is collected. By critically investigating the work practices of technical experts, we can begin to demand greater accountability and contestability in the development of benchmark datasets.


Original paper by Emily Denton, Alex Hanna, Razvan Amironesi, Andrew Smart, Hilary Nicole, Morgan Klaus Scheuerman: https://arxiv.org/abs/2007.07399

