
Bias in Automated Speaker Recognition

March 11, 2022

🔬 Research summary by Wiebke Toussaint, who is completing her PhD on designing trustworthy AI systems at Delft University of Technology.

[Original paper by Wiebke Toussaint and Aaron Ding]


Overview: AI-enabled voice biometrics are a hidden and prevalent form of authentication. This paper examines sources of bias in the development and evaluation practices of voice-based identification systems. The authors show that the performance of speaker verification technology varies significantly with speakers’ demographic attributes and that the technology is prone to bias.


Introduction

“Alexa, you *still* don’t know my voice?!” 

So you’ve been thinking about the consequences of bias in automated face recognition and natural language processing? Facial images and text are just the tip of the iceberg of everyday applications that use personal data and AI to drive services and surveillance. Authors Wiebke Toussaint and Aaron Ding from Delft University of Technology explain that many Internet of Things technologies (think smartphones and smart speakers), but also voice-driven services like call centers, are prone to bias when your voice is used for authentication.

In this paper the authors examine the implications of bias when your voice becomes your ID. They highlight that bias is only a peripheral concern in the development of voice-enabled systems. Through an in-depth analysis of a popular speaker recognition challenge (SRC), the VoxCeleb SRC, they show that bias exists at every development stage of the speaker verification machine learning pipeline. Female speakers and speakers of non-US nationalities experience the worst performance.

To mitigate bias in voice biometrics, the authors propose 1) evaluation datasets that represent real usage scenarios, 2) evaluation metrics that consider the consequences of errors, and 3) design interventions that holistically address the various sources of bias that they have uncovered.

Your Voice as Your ID

Your voice is a product of your biology and your circumstances. It can reveal your identity, age, accent, illness, truthfulness, emotional state, stress levels and much more. Automated speaker recognition technology recognises the identity of a person from their voice. A core task in speaker recognition is speaker verification, or voice biometrics. Speaker verification serves as a security feature in voice-enabled products and services all around us: on billions of smartphones and smart speakers, and in call centers. The authors explain that the “technology grants access to personal devices in intimate moments, but also to essential public services for vulnerable user groups”, like senior citizens in Mexico who can use it to provide telephonic proof of life to receive their pension.
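
To make the task concrete, here is a minimal sketch of how a speaker verification decision is typically made: an enrolment utterance and a test utterance are each mapped to a fixed-dimensional speaker embedding by a neural network, and the claimed identity is accepted if the similarity between the two embeddings clears an operating threshold. The embedding dimension, similarity measure and threshold value below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(enrolled: np.ndarray, test: np.ndarray, threshold: float = 0.6) -> bool:
    """Accept the claimed identity if the embedding similarity clears the operating threshold."""
    return cosine_similarity(enrolled, test) >= threshold

# Toy example: random vectors stand in for embeddings produced by a speaker encoder.
rng = np.random.default_rng(0)
enrolled = rng.normal(size=192)                       # hypothetical 192-dim embedding
same_speaker = enrolled + 0.1 * rng.normal(size=192)  # noisy utterance from the same speaker
impostor = rng.normal(size=192)                       # utterance from a different speaker

print(verify(enrolled, same_speaker))  # expected: True
print(verify(enrolled, impostor))      # expected: False
```

The single operating threshold in this sketch is the quantity the paper later identifies as a locus of bias: where it is set determines who gets falsely rejected and who gets falsely accepted.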

Despite the technology’s wide-scale deployment, bias in automated speaker recognition has not been studied systematically. Speaker verification mirrors many of the development practices that have raised concerns in facial recognition, natural language processing and speech recognition technologies. Toussaint and Ding point to known sources of bias in these applications to motivate the importance of considering bias in voice biometrics.

Sources of Bias in Voice Biometrics

Toussaint and Ding use a framework for understanding sources of harm in machine learning to study bias in the VoxCeleb Speaker Recognition Challenge, which they describe as “a well-known benchmark, [that] has received several hundred submissions over the past three years.” The framework considers aggregation, learning, evaluation and deployment bias in model building and implementation; and historical, representation and measurement bias in data generation. Informed by empirical and analytical evidence, the researchers identify six sources of bias in voice biometrics.

1. Aggregation Bias

Benchmark models have disparate performance across subgroups. They are fit to the dominant population in the training data (US speakers) and perform worse for females. When tuned to the system aggregate, subgroups’ operating performance has high variability.

2. Learning Bias

Toussaint and Ding compared two model architectures, one optimized for performance and the other for speed. They found that both models perform better for male speakers. There was no relative performance difference between the models for US males and females, but for other groups neither model was consistently less biased.

3. Evaluation Bias

Speaker verification evaluation is highly sensitive to the evaluation set. The study shows that all three evaluation sets of the VoxCeleb dataset suffer from representation bias at both the speaker and utterance level. Commonly used error metrics further amplify evaluation bias.
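
As a toy illustration of how an aggregate error metric can hide subgroup disparities, the sketch below computes the equal error rate (EER), a metric commonly used in speaker verification, on synthetic trial scores for two hypothetical subgroups. The scores are fabricated for illustration and do not come from the paper; the point is only that a single pooled EER can look acceptable while one subgroup fares much worse.

```python
import numpy as np

def eer(target_scores: np.ndarray, impostor_scores: np.ndarray) -> float:
    """Equal error rate: the point where false-reject and false-accept rates cross."""
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    fnr = np.array([(target_scores < t).mean() for t in thresholds])    # false rejects
    fpr = np.array([(impostor_scores >= t).mean() for t in thresholds]) # false accepts
    idx = int(np.argmin(np.abs(fnr - fpr)))
    return float((fnr[idx] + fpr[idx]) / 2)

rng = np.random.default_rng(1)
# Synthetic scores: the hypothetical model separates subgroup A's trials better than B's.
scores = {
    "subgroup A": (rng.normal(2.0, 1.0, 1000), rng.normal(0.0, 1.0, 1000)),
    "subgroup B": (rng.normal(1.2, 1.0, 1000), rng.normal(0.0, 1.0, 1000)),
}

pooled_targets = np.concatenate([tgt for tgt, _ in scores.values()])
pooled_impostors = np.concatenate([imp for _, imp in scores.values()])
print(f"pooled EER: {eer(pooled_targets, pooled_impostors):.3f}")
for name, (tgt, imp) in scores.items():
    print(f"{name} EER: {eer(tgt, imp):.3f}")
```

In this fabricated setup the pooled EER sits between the two subgroup EERs, so reporting only the aggregate number would understate the error rate experienced by the worse-served subgroup.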

4. Deployment Bias

Speaker verification was first developed for intelligence, defense and justice objectives. The authors highlight that current evaluation practices do not account for potential harms due to social or even economic exclusion in new applications like voice assistants and call centers. Furthermore, speaker verification systems are calibrated to an operating threshold. This gives rise to threshold bias, a form of aggregation bias. All subgroups, but especially females, will experience better performance if calibrated to their own optimal threshold, not the aggregate.
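
The threshold effect can be illustrated with a small sketch, again on synthetic scores rather than the paper’s data: a single operating threshold is calibrated on the pooled score distribution, and each subgroup’s error rates at that aggregate threshold are compared with the error rates it would see at a threshold calibrated on its own scores.

```python
import numpy as np

def fnr_fpr(targets: np.ndarray, impostors: np.ndarray, threshold: float) -> tuple:
    """False-reject and false-accept rates at a given decision threshold."""
    return float((targets < threshold).mean()), float((impostors >= threshold).mean())

def eer_threshold(targets: np.ndarray, impostors: np.ndarray) -> float:
    """Threshold at which false-reject and false-accept rates are (roughly) equal."""
    candidates = np.sort(np.concatenate([targets, impostors]))
    gaps = [abs(np.subtract(*fnr_fpr(targets, impostors, t))) for t in candidates]
    return float(candidates[int(np.argmin(gaps))])

rng = np.random.default_rng(2)
# Synthetic per-subgroup scores: the model separates subgroup A's trials better than B's.
groups = {
    "subgroup A": (rng.normal(2.0, 1.0, 1000), rng.normal(0.0, 1.0, 1000)),
    "subgroup B": (rng.normal(1.0, 1.0, 1000), rng.normal(0.0, 1.0, 1000)),
}

# One aggregate operating threshold, calibrated on the pooled scores of all subgroups.
aggregate_t = eer_threshold(np.concatenate([tgt for tgt, _ in groups.values()]),
                            np.concatenate([imp for _, imp in groups.values()]))

for name, (tgt, imp) in groups.items():
    own_t = eer_threshold(tgt, imp)  # threshold calibrated on this subgroup alone
    print(name,
          "at aggregate threshold:", fnr_fpr(tgt, imp, aggregate_t),
          "| at own threshold:", fnr_fpr(tgt, imp, own_t))
```

In this fabricated example the worse-served subgroup is falsely rejected more often at the aggregate threshold than it would be at its own optimum, which is the pattern the authors describe as threshold bias.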

5. Historical Bias

The VoxCeleb dataset was constructed from YouTube videos with a fully automated data processing pipeline. The data pipeline directly translates bias in facial recognition systems into speaker verification, as failures in the former will result in speaker exclusion from the VoxCeleb dataset.

6. Representation Bias

VoxCeleb1 is skewed towards males, US nationals and speakers between the ages of 20 and 50. As a celebrity dataset, it is not representative of the broader public and is likely to contain representation bias that affects many other sensitive speaker attributes.

Between the lines

Bias is a recognised challenge in AI systems that the AI community has been grappling with for several years. Toussaint and Ding shine a light on bias in a domain that has so far gone unexplored: voice authentication. While research on mitigating bias has largely focused on improving AI algorithms, the authors look beyond algorithms in their recommendations for mitigating bias in voice biometrics. They suggest that:

  1. Evaluation datasets should reflect real usage scenarios
  2. Evaluation metrics should consider the consequences of errors (a cost-weighted example is sketched after this list)
  3. Engineering interventions can contribute significantly to mitigating bias 
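
One way to read the second recommendation is through cost-weighted metrics already used in the speaker recognition community, such as the detection cost function (DCF), which weights misses and false alarms differently rather than treating all errors as equal. The sketch below shows the standard DCF formula with illustrative cost and prior values; the specific numbers are assumptions for the example, not values taken from the paper.

```python
def detection_cost(p_miss: float, p_fa: float,
                   c_miss: float = 10.0, c_fa: float = 1.0,
                   p_target: float = 0.05) -> float:
    """Detection cost function: a weighted combination of miss and false-alarm rates.

    c_miss and c_fa encode how costly each error type is in the deployment context,
    and p_target is the prior probability that a trial is a genuine target.
    """
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)

# Two hypothetical operating points with the same total error but different error mixes.
print(detection_cost(p_miss=0.02, p_fa=0.10))  # few misses, many false accepts
print(detection_cost(p_miss=0.10, p_fa=0.02))  # many misses, few false accepts
```

Whether a false rejection (being locked out of a service) or a false acceptance (an impostor being let in) is the graver harm depends on the application, which is exactly why the authors argue that evaluation metrics should reflect the consequences of errors.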

Toussaint and Ding’s research is positioned at the intersection of multiple disciplines – trustworthy AI, speech systems, biometrics and the emerging field of on-device machine learning. It demonstrates that interdisciplinary approaches and holistic design are necessary to make progress towards fairer AI systems, and lays the groundwork for AI auditing in voice biometrics.

