
Handling Bias in Toxic Speech Detection: A Survey

March 14, 2022

🔬 Research summary by Sarah Masud & Tanmoy Chakraborty.

Sarah is currently a 3rd-year doctoral student at the Laboratory for Computational Social Systems (LCS2) at IIIT-Delhi. Within the broad area of social computing, her work mainly revolves around modelling hate speech detection & diffusion on the web. Tanmoy Chakraborty is an Assistant Professor of Computer Science and a Ramanujan Fellow at IIIT Delhi where he leads a research group, Laboratory for Computational Social Systems (LCS2), and heads the Infosys Centre for Artificial Intelligence. His broad research includes Natural Language Processing and Social Computing with a major focus on designing machine learning models for cyber-safety, trust and social goods.

[Original paper by Tanmay Garg, Sarah Masud, Tharun Suresh, Tanmoy Chakraborty]


Overview: When attempting to detect toxic speech [1] in an automated manner, we do not want the model to modify its predictions based on the speaker’s race or gender. If the model displays such behaviour, it has acquired what is usually referred to as “unintended bias.” Adopting such biased models in production may result in the marginalisation of the very groups they were designed to assist in the first place. This survey puts together a systematic study of existing methods for evaluating and mitigating bias in toxicity detection.


Introduction

The subjects of bias in machine learning models and the methods used in toxicity detection have each been extensively surveyed. Here, the authors explore the niche where the two meet: bias detection, evaluation, and mitigation as applied to automated toxicity detection in the existing literature. To develop a systematic overview of the various unintended biases reported in the literature, the authors design a taxonomy of bias based on either the source of harm or the downstream impact of harm. The source of harm identifies where in the modelling pipeline the bias gets introduced (e.g., data collection, annotation, etc.). Meanwhile, the impact of harm captures which characteristic of the end-user (race, gender, age, etc.) the biased model discriminates against. While not mutually exclusive or exhaustive, this taxonomy provides a precise overview of the existing literature. Based on this classification, the survey then dives deep into the methods used to detect, evaluate, and mitigate these biases. In addition to discussing traditional demographic biases, the survey also touches on intersectional and cross-geographic biases and on discrimination based on psychographic preferences.

Apart from developing a taxonomy of the various biases, the authors also develop a taxonomy of the different evaluation metrics used to study bias in toxicity detection models. This second taxonomy maps each bias evaluation metric to one or more concepts of fairness that the metric attempts to improve upon.
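As a concrete illustration (ours, not the survey’s), one widely used family of bias evaluation metrics compares model behaviour on posts mentioning a given identity group against the rest of the data, e.g., a subgroup AUC or a gap in false positive rates. The minimal sketch below assumes a binary toxicity classifier producing scores in [0, 1] and a boolean mask marking posts that mention the identity group; the function names are illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auc(scores, labels, subgroup_mask):
    """AUC restricted to posts mentioning a given identity group.
    A large drop relative to the overall AUC hints at unintended bias."""
    return roc_auc_score(labels[subgroup_mask], scores[subgroup_mask])

def fpr_gap(scores, labels, subgroup_mask, threshold=0.5):
    """Difference in false positive rate (non-toxic posts flagged as toxic)
    between the subgroup and everything else."""
    preds = scores >= threshold

    def fpr(mask):
        negatives = (labels == 0) & mask
        return preds[negatives].mean() if negatives.any() else 0.0

    return fpr(subgroup_mask) - fpr(~subgroup_mask)
```

Each such metric corresponds to a different fairness concept (the false-positive-rate gap, for instance, relates to equalised odds on the non-toxic class), which is exactly the mapping the survey’s second taxonomy makes explicit.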

Bias as a source of harm

When discussing bias as a source of harm, the authors examine how the chosen data sampling strategy introduces bias into toxicity datasets. Interestingly, they highlight that the topics and user sets captured in a dataset do more to bias it than the sampling technique itself. The issues of lexical and annotation bias are also discussed, with a particular focus on reducing the model’s confusion between the disclosure of an identity and an attack on that identity.

Annotation and lexical biases go hand in hand. A dataset in which the annotators’ inherent biases skew the labelling towards explicit terms will eventually develop spurious lexical connotations. Note that no standard annotation guideline or inter-annotator agreement range exists for toxic speech detection. Despite the best efforts of researchers and practitioners, what counts as toxic remains highly subjective, and there are no universally adopted benchmark datasets and annotations to compare against.
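For readers unfamiliar with how annotator agreement is quantified, the hedged sketch below (ours, not from the survey) computes Cohen’s kappa for two hypothetical annotators labelling the same posts as toxic (1) or non-toxic (0); what counts as an “acceptable” kappa for toxicity data is precisely the convention the field lacks.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical toxic/non-toxic labels from two annotators on the same ten posts.
annotator_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
annotator_b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Cohen's kappa corrects raw agreement for the agreement expected by chance.
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 0.60 here; how to interpret it is field-dependent
```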

Bias as a target of harm

Unfortunately, biases in toxicity detection datasets and models end up harming the very demographic groups the systems are meant to protect from toxicity. These broadly include the markers of race and gender. Owing to the limited availability of ground-level demographics to map against, the study of racial bias in toxicity detection has primarily focused on discrimination against African-American English dialects. Meanwhile, the study of gender bias has focused on binary gender.

Gender-pronoun swapping and transfer learning from less biased datasets can help mitigate gender bias. However, manual inspection of such augmentations and the extension of gender beyond the binary remain unexplored for toxicity detection; a sketch of the pronoun-swapping idea follows below.
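As an illustration of the augmentation technique mentioned above (a minimal sketch under our own assumptions, not the survey’s implementation), the snippet below swaps a small table of gendered pronouns in a post; a pipeline would typically add such a swapped copy of each training example. The swap table and tokenisation are deliberately simplistic.

```python
# Toy swap table: real pipelines need casing, possessives in context,
# names, and neopronouns, and still require manual inspection.
PRONOUN_SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",
    "his": "her", "hers": "his",
    "himself": "herself", "herself": "himself",
}

def swap_gender_pronouns(text: str) -> str:
    swapped = []
    for tok in text.split():
        core = tok.lower().strip(".,!?")
        if core in PRONOUN_SWAPS:
            # Keep trailing punctuation; drop original casing for brevity.
            tail = tok[len(core):] if tok.lower().startswith(core) else ""
            swapped.append(PRONOUN_SWAPS[core] + tail)
        else:
            swapped.append(tok)
    return " ".join(swapped)

print(swap_gender_pronouns("She said he blocked her account."))
# -> "he said she blocked him account."
```

The mishandled possessive in the example output (“her account” becoming “him account”) is exactly the kind of artefact that makes manual inspection of augmented data necessary.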

Within the scope of racial prejudice, the authors observe that priming annotators with racial information can help reduce racial bias. Still, this is a double-edged sword, as such priming can also intensify the annotators’ inherent biases. The authors further point out that regularising racial bias via statistical models that assume different dialects have the same conditional probability of toxicity is limited in scope and application: what is accepted and commonly used in one dialect may be frowned upon or rarely used in another.
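To make the equal-conditional-probability assumption concrete, the hedged sketch below (an illustration, not the specific models cited in the survey) adds a penalty to an ordinary training loss that pushes the average predicted toxicity rate to be equal across two dialect groups; the weight lam and the 0/1 group encoding are arbitrary choices.

```python
import torch
import torch.nn.functional as F

def loss_with_dialect_penalty(logits, labels, dialect_ids, lam=0.1):
    """Binary cross-entropy plus a term penalising the gap between the
    average predicted toxicity of two dialect groups.

    logits:      (N,) raw scores from a toxicity classifier
    labels:      (N,) gold labels, 1 = toxic
    dialect_ids: (N,) group indicator, e.g. 0 = SAE, 1 = AAE
    lam:         weight of the fairness penalty (hypothetical value)
    """
    probs = torch.sigmoid(logits)
    bce = F.binary_cross_entropy(probs, labels.float())

    # Average predicted toxicity per dialect group (batch must contain both).
    rate_0 = probs[dialect_ids == 0].mean()
    rate_1 = probs[dialect_ids == 1].mean()

    # Squared gap between group-level rates: zero when the model assigns
    # the same expected toxicity to both dialects.
    return bce + lam * (rate_0 - rate_1) ** 2
```

The survey’s caveat applies directly: forcing the two rates to match ignores genuine lexical differences between dialects, so such a penalty can over- or under-correct.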

Biases beyond demography

To initiate the conversation on bias beyond markers like race and gender, the authors highlight the limited yet significant work on intersectional and psychographic biases. One form of intersection is the duality of race and gender; another is to look at race, gender, or both from a cross-geographic perspective. Both are at a nascent stage of study in toxicity detection. Separately, political ideologies and stances are being explored for their impact on toxicity modelling. Despite the best efforts of researchers and practitioners, markers such as ageism, religious affiliation, and socio-economic status remain underexplored. Depending on the geography, a combination of these could be critical for mitigating bias in toxicity detection.

Between the lines

Accounting for and mitigating biases within the broad area of toxicity detection is far from a solved problem. Our biggest takeaway is that practitioners need to incorporate bias mitigation at every step of the modelling pipeline rather than treat it as a one-stop solution. Observing that biases can easily morph from one form into another, the authors explore the concept of “bias shift” in lexicon debiasing. In some cases, the presence of one bias can lead to the development of other forms of downstream harm. For example, lexical and racial biases in toxicity detection are known to co-occur owing to the stylistic variations of African-American dialects. An end-to-end pipeline would therefore help practitioners better monitor the side effects of dealing with an existing bias. As the survey points out, sadly, the majority of existing work on detecting toxic speech and mitigating bias in toxicity detection focuses on the English-speaking, binary-gendered population. Building more coherent and robust models that can fight toxicity at scale will require us to attend to the linguistic nuances that arise from regional geographies and inclusive gender dynamics.

Footnote:

  1. Throughout the survey, the term “toxic speech” is used as an umbrella term to refer to any form of malicious content, including but not limited to hate speech, cyberbullying, abusive speech, misogyny, sexism, offence, and obscenity.