Montreal AI Ethics Institute
Research summary: Fairness in Clustering with Multiple Sensitive Attributes

August 17, 2020

Summary contributed by our researcher Alexandrine Royer, who works at The Foundation for Genocide Education.

*Authors of full paper & link at the bottom


Mini-summary: With the expansion and volume of readily available data, scientists have developed AI techniques that can quickly categorize individuals based on shared characteristics. Clustering, a common task in unsupervised machine learning, identifies similarities in raw data and groups the data points into clusters. As is frequently the case with decision-making algorithms, notions of fairness come to the forefront when discriminatory patterns and homogeneous groupings start to appear.

For data scientists and statisticians, the challenge is to develop fair clustering techniques that protect the sensitive attributes of a given population, with each sensitive attribute in a cluster proportionately reflecting its share of the overall dataset. Until recently, achieving statistical parity while balancing more than one sensitive attribute appeared mathematically intractable. The authors offer an innovative statistical method, called Fair K-Means, that can account for multiple multi-valued or numeric sensitive attributes, bridging the gap between notions of fairness previously believed incompatible.

Full summary:

For a quick definition, clustering is a statistical technique that groups similar points or objects in a dataset. With AI systems, part of an algorithm’s task is to sort through this set of objects (which can refer to individuals) and form these similar clusters. For data analysts, clustering is a necessity given the infeasibility of assessing every object in a dataset manually, especially when those objects number in the thousands. In a hiring scenario with a high volume of applicants, a large corporation may design an algorithm to group and rank candidates based on their resumes. Those in the top cluster, sharing the desired attributes, will be sent an email informing them of the shortlisting decision. Already in this example, a few ethics-minded eyebrows might be raised.

As we have learned, algorithms, unless explicitly instructed to, will not consider principles of individual or group fairness when categorizing objects. Group fairness concerns the protection of people who share sensitive attributes, such as age, gender, ethnicity, or relationship status. Left unchecked, algorithms can create highly skewed and homogeneous clusters that do not represent the demographics of the dataset, and these biased clusters may reinforce societal stereotypes. Even if identifiers such as gender are removed from the data, statistical correlations can still lead to gender-homogeneous clusters. Gender is just one among the plurality of sensitive attributes that analytics pipelines must consider to avoid undue discrimination. Until now, data scientists have only found ways to balance one sensitive attribute at a time, or to account for multiple binary-only attributes (e.g., citizen or non-citizen).
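The representational skew described above can be made concrete with a few lines of code. The sketch below is illustrative and not the paper's own metric: it compares each cluster's distribution of one sensitive attribute against the dataset-wide distribution, so a perfectly balanced clustering would show zero deviation in every cluster.

```python
from collections import Counter

def attribute_skew(labels, attribute):
    """For each cluster, return the largest absolute deviation between the
    cluster's proportion of any sensitive-attribute value and that value's
    proportion in the full dataset (0.0 = perfectly representative)."""
    n = len(labels)
    overall = {v: c / n for v, c in Counter(attribute).items()}
    skew = {}
    for cluster in set(labels):
        members = [a for l, a in zip(labels, attribute) if l == cluster]
        local = Counter(members)
        size = len(members)
        skew[cluster] = max(
            abs(local.get(v, 0) / size - p) for v, p in overall.items()
        )
    return skew

# Toy example: two clusters, one binary sensitive attribute.
labels    = [0, 0, 0, 1, 1, 1]
attribute = ["a", "a", "a", "b", "b", "a"]
print(attribute_skew(labels, attribute))  # both clusters deviate by 1/3
```

A fairness-aware clustering method would drive these per-cluster deviations toward zero while still keeping the clusters internally coherent.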

The authors offer a new and fairer clustering method called Fair K-Means (FairKM). It is a pioneering statistical technique in that it can consider multiple multi-valued or numeric sensitive attributes in clustering across various scenarios. For the layperson: in k-means clustering, you choose a target number k, which represents the number of groups to form from the data. Each data point is assigned to one of the k groups based on shared features, and within every group, the centroid (the mean position of all its data points) pinpoints what type of population that group represents. When assigning a point to a cluster, FairKM considers not only the point’s similarity to the centroid but also which cluster the point would skew the least in terms of sensitive attributes. More precisely, the clusters formed try to proportionally reflect the demographic characteristics of the overall dataset, allowing for representational fairness in clustering. In their application of FairKM, the algorithm clustered individuals according to nine attributes while balancing five sensitive ones, such as relationship status and country of origin.
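To ground the terminology, here is a minimal sketch of plain k-means in Python. It shows only the assign-to-nearest-centroid / recompute-centroid loop that FairKM builds on; the paper's fairness-aware assignment objective is not reproduced here.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means on tuples: assign each point to its nearest centroid
    (squared Euclidean distance), then recompute each centroid as the mean
    of its assigned points, and repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # naive initialization from the data
    for _ in range(iters):
        # Assignment step: nearest centroid wins.
        labels = [
            min(range(k),
                key=lambda j: sum((p - c) ** 2
                                  for p, c in zip(pt, centroids[j])))
            for pt in points
        ]
        # Update step: centroid = mean of the points assigned to it.
        for j in range(k):
            members = [pt for pt, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return labels, centroids

# Two well-separated groups of 2-D points end up in two clusters.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

FairKM modifies the assignment step above: instead of scoring a candidate cluster purely by distance to its centroid, it also penalizes assignments that would push the cluster's sensitive-attribute proportions away from those of the whole dataset.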

The clusters formed with the FairKM method scored better on both clustering quality and fair representation of sensitive attribute groups than other prominent clustering methods. FairKM can also be fed back into algorithmic training to iteratively improve clustering performance. One limitation of the approach was the quality of the original dataset, a census of 15,000 individuals, used in the method’s tests. It remains to be tested whether the approach holds up on extremely skewed data.

With FairKM, we are inching closer to statistical methods for decision-making algorithms that abide by principles of equality, equity, justice, and respect for diversity, the core tenets of democratic societies. Although perfect balance may be a statistical ideal, it remains far from a lived reality: societal limitations continue to prevent total gender parity or social equality. Those factors lie outside the immediate control of data scientists but should be acknowledged when designing predictive decision-making algorithms.


Original paper by Savitha Sam Abraham, Deepak P., Sowmya S Sundaram: https://arxiv.org/abs/1910.05113

Want quick summaries of the latest research & reporting in AI ethics delivered to your inbox? Subscribe to the AI Ethics Brief. We publish bi-weekly.

  • © 2025 MONTREAL AI ETHICS INSTITUTE.
  • This work is licensed under a Creative Commons Attribution 4.0 International License.