
Measuring Disparate Outcomes of Content Recommendation Algorithms with Distributional Inequality Metrics

June 17, 2022

šŸ”¬ Research Summary by Tomo Lazovich (they/them), a Senior Machine Learning Researcher on Twitter’s ML Ethics, Transparency, and Accountability (META) team.

[Original paper by Tomo Lazovich, Luca Belli, Aaron Gonzales, Amanda Bower, Uthaipon Tantipongpipat, Kristian Lum, Ferenc Huszar, Rumman Chowdhury]


Overview: Some popular ML fairness metrics are hard to operationalize in practice, largely due to the absence of demographic data in industry settings. This paper proposes a complementary set of metrics, originally used in economics to measure income inequality, as a way to capture disparities in outcomes of large-scale ML systems.


Introduction

In recent years, many examples of the potential harms caused by machine learning systems have come to the forefront (see a collection of them in the awful-ai repository). Practitioners in the field of algorithmic bias and fairness have developed a suite of metrics to capture one aspect of these harms: namely, differences in performance between different demographic groups and, in particular, worse performance for marginalized communities. Take, for example, the now-iconic Gender Shades paper, which found that commercial gender recognition systems performed significantly worse for darker-skinned women. Despite great progress made in this area, one open question became particularly prominent for industry practitioners: how do you capture such disparities if you don’t have reliable demographic data or choose not to collect it due to privacy concerns?

This paper approaches the problem by adapting income inequality metrics from economics. You’ve probably heard statistics like “the top 1% of people own X% of the wealth”; this work applies the same notion to levels of engagement on Twitter. If you imagine Twitter as an economy in which ‘impressions’ are distributed instead of dollars, then the ‘rich’ are the users who have received many impressions in the past, now have many followers, and are therefore more likely to receive even more impressions in the future. By applying inequality metrics to the distributions of engagements, we seek to understand whether our algorithmic systems are reinforcing or worsening ‘rich get richer’ dynamics on the platform. It has been shown that a majority of Twitter users feel that only a few people, or no one at all, see their Tweets, a phenomenon that has been referred to as “Tweeting into the void”. This work uses inequality metrics at the distribution level to understand exactly how skewed engagements are on Twitter, and digs deeper to isolate some of the algorithms that may be driving that skew. It also evaluates a number of metrics against desirable criteria. Overall, it finds that inequality metrics are a useful complement to demographic-based fairness metrics and faithfully capture skews in outcomes.

Key Insights

The work benchmarks a total of seven different inequality metrics on two Twitter-related case studies. Each metric is evaluated against a number of criteria, including both desirable mathematical properties and subjective criteria like interpretability. Going through the details of all of the metrics is out of scope for this summary, but two you may have heard in the news are the Gini coefficient and the top 1% share. The Gini coefficient ranges between zero and one: zero corresponds to everyone holding an equal amount of wealth, and one to a single person holding all of it. The top 1% share is simply the percentage of total wealth held by people in the top 1% of the distribution. In this work, instead of measuring inequality in “income” or “wealth”, the metrics are used to measure the skew, or “top-heaviness”, in how many impressions and other engagements (likes, retweets, etc.) authors get on Twitter.
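As a concrete illustration of the two metrics described above, here is a minimal sketch of how they can be computed for a distribution of per-author engagement counts. This is not the paper’s implementation; the function names and the use of NumPy are assumptions made purely for the example.

```python
import numpy as np

def gini(values):
    """Gini coefficient of a non-negative distribution.

    Zero means every unit holds an equal amount; values approaching one
    mean a single unit holds essentially everything.
    """
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0
    # Lorenz-curve formulation over the ascending-sorted values.
    cumulative_share = np.cumsum(x) / total
    return (n + 1 - 2 * cumulative_share.sum()) / n

def top_share(values, fraction=0.01):
    """Share of the total held by the top `fraction` of units (default: top 1%)."""
    x = np.sort(np.asarray(values, dtype=float))[::-1]
    k = max(1, int(np.ceil(fraction * x.size)))
    return x[:k].sum() / x.sum()
```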

Result 1: Inequality metrics meaningfully capture differences in skew between different engagement types

The first case study tries to answer a simple question: do these metrics actually work on real data from Twitter? That is, when we apply them to distributions that we know have different levels of skew, do we actually see meaningful differences in the metrics? The results show clearly that, yes, we can distinguish distributions with these metrics! In particular, the skew of an engagement type roughly tracks the level of effort needed to produce that engagement. Impressions (having someone look at your Tweet) have a lower level of skew, while quote Tweets (sharing someone’s Tweet to your timeline and adding your own Tweet on top of it) have the highest. To quantify: the top 1% of users get almost 80% of all impressions and almost 90% of all quote Tweets. Not only are there clear differences between distributions, but the distributions on Twitter are in general very highly skewed.
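To get a feel for how the metrics separate distributions with different skews, here is a small usage example built on the gini and top_share helpers sketched above. The samples are synthetic (a lognormal draw versus a heavier-tailed Pareto draw); they merely stand in for, and are not, the per-author impression and quote-Tweet counts analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic stand-ins for per-author engagement counts: the Pareto draw
# is deliberately heavier-tailed than the lognormal one.
moderately_skewed = rng.lognormal(mean=3.0, sigma=1.5, size=100_000)
heavily_skewed = rng.pareto(a=1.1, size=100_000) + 1.0

for name, counts in [("moderate skew", moderately_skewed),
                     ("heavy skew", heavily_skewed)]:
    print(f"{name}: Gini = {gini(counts):.2f}, "
          f"top 1% share = {top_share(counts):.2f}")
```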

This leads to the next question the paper addresses: can we use these metrics to identify potential algorithmic drivers of this high level of skew?

Result 2: Inequality metrics identify out-of-network suggestions as potential drivers of skew

In the second case study, the work focuses on the impression distribution specifically. It then breaks down impressions by which algorithm led to the Tweet being on the reader’s timeline. Some of these are “in-network” (IN) suggestions, meaning they were ranked highly and come from an author the reader follows. Others are “out-of-network” (OON) suggestions, i.e., Tweets from authors whom the reader does not directly follow. One example of an out-of-network suggestion that can appear on a timeline is a Tweet that was liked by someone the reader follows, where the reader does not follow the Tweet’s author themselves.

When breaking down algorithmic sources between IN and different types of OON suggestions, it was found that OON suggestions in general have much higher levels of skew. The top 1% of users get around 77% of impressions from IN Tweets, but they get close to 99% of impressions from certain kinds of OON Tweets. Additionally, when you break down by number of followers, you find that the difference between the skew of IN and OON Tweets is much larger for authors with low numbers of followers. All of this serves as evidence that the structure of the graph itself may be driving some of this inequality, with certain algorithms exacerbating the effect more than others.
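A breakdown like this amounts to grouping the impression log by algorithmic source, counting impressions per author within each source, and applying the same inequality metrics to each group. The sketch below shows one way to do that with pandas, reusing the gini and top_share helpers from earlier; the log schema and column names (author_id, source) are assumptions for illustration, not the paper’s actual pipeline.

```python
import pandas as pd

def skew_by_source(impressions_log: pd.DataFrame) -> pd.DataFrame:
    """Inequality metrics of author-level impression counts, per algorithmic source.

    Expects one row per impression with (hypothetical) columns `author_id`
    and `source`, where `source` distinguishes in-network suggestions from
    the various out-of-network suggestion mechanisms.
    """
    per_author = (
        impressions_log
        .groupby(["source", "author_id"])
        .size()                      # impressions each author received per source
        .rename("n_impressions")
        .reset_index()
    )
    # One row per source, with the Gini coefficient and top 1% share of
    # the per-author impression counts surfaced by that source.
    return per_author.groupby("source")["n_impressions"].agg(
        gini=gini, top_1pct_share=top_share
    )
```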

Between the lines

Moving forward, one of the most interesting lines of research will be to better understand how the structure of Twitter’s social graph feeds the inequality observed in this paper. Ideally, we can develop methods to decouple algorithmic behavior from the graph structure. Additionally, since this work found that inequality metrics are a useful complement to demographic-based metrics, future work can focus on how to incorporate them into automated testing, feature review processes, and other internal procedures that are part of the ML evaluation cycle. We are currently exploring ways to implement these metrics in practice at Twitter, so product owners can have better visibility into the impacts of their models. Overall, these findings are a promising step in the real-world operationalization of ML fairness and bias metrics.

