Montreal AI Ethics Institute

Democratizing AI ethics literacy
Problematic Machine Behavior: A Systematic Literature Review of Algorithm Audits

March 30, 2022

🔬 Research Summary by Shreyasha Paudel, an independent researcher with expertise in uncertainty in computer vision. She is about to start her PhD in Human-Computer Interaction in Fall 2022, studying the ethical and social impacts of automation, digitization, and data in developing countries.

[Original paper by Jack Bandy]


Overview: As algorithmic systems become more pervasive in day-to-day life, audits of AI algorithms have become an important tool for researchers, journalists, and activists to answer questions about their disparate impacts and potentially problematic behaviors. Audits have been used to demonstrate discrimination in criminal justice algorithms, racial discrimination in Google search, race and gender discrimination in facial analysis algorithms, and many other impacts. This paper conducts a systematic literature review of external audits of public-facing algorithms and contextualizes their findings in a high-level taxonomy of problematic behaviors. It is a useful read to learn what problems audits have found so far, as well as to get recommendations for future audit projects.


Introduction

Most discussions of algorithmic harms cite ProPublica's "Machine Bias" [1] and "Gender Shades" [2] projects. These are examples of high-profile audits which discovered racial and gender discrimination in recidivism prediction and facial recognition algorithms, respectively, and led to large public outcry. In addition to these well-known examples, audits have become an important tool for diagnosing problematic behavior in algorithmic systems. However, each audit is still considered an independent project, and relatively little work has gone toward systematizing prior work and planning a future agenda for algorithmic auditing.

To that end, this paper conducts a thematic analysis of 62 external audits of public-facing algorithmic systems in a wide range of domain areas such as Search, Advertising, Recommendation, Pricing, Vision, Criminal Justice, Language Processing, and Mapping.  Through this analysis, the author tries to answer two key research questions: 

  1. What kinds of problematic machine behavior have been diagnosed by previous algorithm audits?
  2. What remains for future algorithm audits to examine the problematic ways that algorithms exercise power in society?

Using the results of this analysis, the author presents a high-level taxonomy of problematic behaviors that an algorithm audit can find, namely discrimination, distortion, exploitation, and misjudgement. Of the 62 studies reviewed, most focused on discrimination (N=21) or distortion (N=29). Audit studies also gave more attention to search algorithms (N=25), advertising algorithms (N=12), and recommendation algorithms (N=8), helping to diagnose a range of problematic behaviors in these systems. The paper closes with recommendations for future audits as well as a discussion contextualizing audits among other tools used for algorithmic justice.

This paper defines an algorithm audit as "an empirical study investigating a public algorithm system for potential problematic behavior". With this definition, the author screened 503 papers from a variety of academic publications and shortlisted 62 based on their titles and abstracts. These papers were read, summarized, and thematically analyzed to categorize them by audit method, algorithm application domain, and the problematic behavior evaluated by the audit.

What kinds of problematic behaviors have algorithm audits found?

Discrimination: 

The paper defines discrimination as "whether an algorithm disparately treats / impacts people on the basis of demographic categories such as race, gender, age, socioeconomic status, location and/or intersectional identity." Discrimination was found in algorithms for advertising, search, vision, and pricing, and has caused both allocative harm (i.e., when opportunities or resources are withheld from certain people or groups) and representational harm (i.e., when certain people or groups are stigmatized or stereotyped). Though the exact form of discrimination depends on the application domain, algorithms have discriminated on the basis of race, age, sex, gender, location, socioeconomic status, and/or intersectional identity. The author also found that discrimination based on the intersection of these identities is very common but often understudied.
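As a toy illustration (not from the paper), a disparate-impact check in an audit often boils down to comparing outcome rates across demographic groups. The sketch below uses hypothetical data and the common "four-fifths rule" heuristic as a red-flag threshold; the function names and data are illustrative assumptions, not the paper's methodology:

```python
from collections import defaultdict

def selection_rates(records):
    """Compute per-group positive-outcome rates from (group, outcome) pairs."""
    totals, positives = defaultdict(int), defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        positives[group] += int(outcome)
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact_ratio(records, protected, reference):
    """Ratio of the protected group's rate to the reference group's rate.
    A ratio below 0.8 (the 'four-fifths rule') is a common red flag."""
    rates = selection_rates(records)
    return rates[protected] / rates[reference]

# Hypothetical audit data: (group, got_opportunity)
data = [("A", 1), ("A", 1), ("A", 0), ("A", 1),
        ("B", 1), ("B", 0), ("B", 0), ("B", 0)]
print(disparate_impact_ratio(data, protected="B", reference="A"))
```

Real audits must go further, e.g. testing intersectional subgroups, which the review notes is often understudied.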

Distortion: 

This paper defines distortion as "whether the algorithm presents information in a way that distorts underlying reality". This includes political partisanship, dissemination of misinformation, and/or hyper-personalized content that can lead to "echo chambers". Distortion is most relevant in search and recommendation algorithms and was one of the most common audit areas. The review found little evidence of echo chambers due to these algorithms. However, audits have found problems of partisan leanings, source concentration, monopoly behavior, and removal of relevant information due to hyper-personalization.
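One of the distortion findings, source concentration, lends itself to a simple quantitative check. The sketch below (an illustrative assumption, not code from any audit in the review) measures what fraction of scraped search results come from a handful of dominant domains:

```python
from collections import Counter

def source_concentration(result_domains, top_k=3):
    """Fraction of results attributable to the top_k most frequent domains.
    Values near 1.0 suggest heavy source concentration."""
    counts = Counter(result_domains)
    top = sum(n for _, n in counts.most_common(top_k))
    return top / len(result_domains)

# Hypothetical domains scraped from a results page
results = ["wikipedia.org", "wikipedia.org", "nytimes.com", "wikipedia.org",
           "youtube.com", "blogA.example", "blogB.example", "wikipedia.org"]
print(source_concentration(results, top_k=2))
```

A baseline matters here too: the same statistic computed against a neutral or randomized ranking would show whether concentration is an artifact of the algorithm or of the underlying web.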

Exploitation: 

The author defines exploitation as "whether an algorithm inappropriately uses content from other sources and/or protected and sensitive information from people". Audits have examined exploitation in search and advertising by Google and Facebook and have found exploitation of user-generated content, journalistic content, and personalized data. The author also recommends exploitation of platform power and labor as areas for future study.

Misjudgement: 

Misjudgement is defined as the case when an "algorithm makes incorrect classification or prediction." This was studied in the context of criminal justice and advertising. The author points out that misjudgement often results in second-order effects of exploitation, discrimination, and other harms. Many audits in this review focused on misjudgement without studying these second-order effects and were thus limited in their findings.
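To see how misjudgement can carry second-order discrimination, consider error rates broken down by group, in the spirit of the "Machine Bias" audit [1], which compared false positive rates across races. The sketch below is a hedged illustration with hypothetical data, not the paper's or ProPublica's actual code:

```python
def false_positive_rate(predictions):
    """FPR from (predicted_positive, actually_positive) pairs."""
    fp = sum(1 for pred, actual in predictions if pred and not actual)
    negatives = sum(1 for _, actual in predictions if not actual)
    return fp / negatives

def fpr_by_group(records):
    """records: (group, predicted_positive, actually_positive) triples."""
    groups = {}
    for group, pred, actual in records:
        groups.setdefault(group, []).append((pred, actual))
    return {g: false_positive_rate(preds) for g, preds in groups.items()}

# Hypothetical risk-score audit data
data = [("A", 1, 0), ("A", 0, 0), ("A", 0, 0), ("A", 1, 1),
        ("B", 1, 0), ("B", 1, 0), ("B", 0, 0), ("B", 1, 1)]
print(fpr_by_group(data))
```

An audit that stopped at overall accuracy would miss this kind of disparity, which is exactly the limitation the review flags in misjudgement-only studies.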

Gaps and Future Work for Audits

Based on this review, the author identifies potential areas for future work that have a high potential for public impact but have received less attention so far. Broadly, these areas can be grouped into: algorithmic applications, audit methods, and audit impacts.

In terms of algorithmic applications, the author found that language processing and recommendation systems are understudied, especially in terms of their impact on discrimination and distortion. The author also found that audits are frequently performed on algorithms from large organizations such as Facebook and Google, while other highly influential companies such as Twitter, TikTok, and LinkedIn, as well as smaller startups, rarely receive attention. Audit studies have also mostly focused on North America, though this may be a result of the paper's scope being limited to English-language publications.

The studies reviewed in this paper mostly collected data by direct scraping, while other methods, such as auditing the code or researchers generating their own data, were less frequent. The author expects that audit results might vary between data collection methods. Future work should focus on different approaches, as well as on replicating results from prior audits, for a more robust understanding of algorithmic harms.

Lastly, the author points out a need to situate audits within the historical and systemic power imbalances in the world. Many of the problematic behaviors and disparate impacts of algorithms stem from existing inequalities. The author points out a need to abolish algorithms used to inform unethical practices rather than merely fixing their problematic behaviors. This is especially relevant in the case of exploitation and misjudgement by advertising and search algorithms, where these behaviors are incentivized by organizational scale and financial profit.

Recommendation for future audits

While audit methodology often depends on the algorithm, context, and audit goals, this paper identifies some common ingredients of successful audits of public-facing systems that can guide future work. A brief overview of these recommendations is listed below:

  • The author recommends conducting audits on public-facing and opaque algorithmic systems that can pose imminent or potential harm to the public. 
  • Audits should focus on specific problematic behaviors rather than general failure modes to have maximal impact. 
  • The outcome of the audit should be compared to a compelling baseline. In most cases, this baseline should be guided by theoretical frameworks of desired … in the world, such as colonialism, reparatory justice, fairness, and feminist ethics of care. 
  • The author found that audit metrics have the potential to be hacked or easily misinterpreted to appear fair. Therefore, metrics should be crafted with care and derived from the baseline and audit goals. 

From algorithmic audits to algorithm justice

The paper concludes with a brief discussion of other tools being used for algorithmic justice within human-computer interaction research as follows:

  1. User Studies: Similar to algorithmic audits, user studies help uncover problematic algorithmic behavior based on user surveys and interviews. These can help identify new problem areas and application domains worth auditing. 
  2. Development Histories: Studies that explore the development of algorithmic systems can help uncover their value systems and incentives, which in turn can inform researchers when selecting problematic behaviors, organizations, and domains to audit. 
  3. Non-public Audits: Along with audits of public-facing systems, audits of non-public algorithms can discover problematic behavior before they are released to the public.
  4. Case Studies: Similar to user studies, small-scale case studies and comparative studies are a useful tool to surface "glitches" that point to broader representational harms.

Between the lines

The meta-analysis in this paper convincingly shows that harmful algorithmic behaviors already exist in the real world across a variety of application domains. The summary and taxonomy from this study can serve as a good baseline to maximize the impact of future audits. However, audits and the other tools discussed in this study can only help diagnose problematic behavior in algorithmic systems. For true accountability, these findings should lead to changes in the algorithms and prevention of future harms. Thus, there is a need for a similar systematic study on the impact of audit results and the best approaches for moving from diagnosing problems to preventing the use of problematic algorithms.

References

[1] https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

[2] http://gendershades.org/

Want quick summaries of the latest research & reporting in AI ethics delivered to your inbox? Subscribe to the AI Ethics Brief. We publish bi-weekly.

Founded in 2018, the Montreal AI Ethics Institute (MAIEI) is an international non-profit organization equipping citizens concerned about artificial intelligence and its impact on society to take action.

© 2025 Montreal AI Ethics Institute. This work is licensed under a Creative Commons Attribution 4.0 International License.