Montreal AI Ethics Institute

Democratizing AI ethics literacy


Using Pre-Trained Language Models for Producing Counter Narratives Against Hate Speech: a Comparative Study

May 28, 2023

🔬 Research Summary by Marco Guerini, a researcher in Computational Linguistics and head of the Language and Dialogue Technologies group at Fondazione Bruno Kessler (FBK).

[Original paper by Serra Sinem Tekiroglu, Helena Bonaldi, Margherita Fanton, and Marco Guerini]


Overview: Many international institutions and countries are taking action to monitor, restrict, and remove online hate content. Still, the results are not always satisfactory, and these measures are often denounced as censorship. An alternative approach that has emerged in recent years relies on so-called Counter Narratives: de-escalating, fact-bound textual responses that refute hateful messages in a non-aggressive way. In this scenario, automation is needed to effectively address the sheer amount of hate produced daily. We therefore conducted a comparative study investigating the use of several AI-based neural language models for automatic counter-narrative generation, intended as a companion tool for NGO operators tackling online hate.


Introduction

Hate Speech (HS) has found fertile ground on social media platforms. The actions such platforms undertake to tackle online hatred consist of identifying possible sources of hate and removing them through content deletion, account suspension, or shadow-banning. However, these actions are often interpreted and denounced as censorship by the affected users and political groups, so such restrictions can have the opposite effect of exacerbating the hostility of the haters. An alternative strategy on the horizon is based on Counter-Narratives (CNs), i.e., communicative actions that refute hate speech through thoughtful and cogent reasons and true, fact-bound arguments. As a de-escalating measure, CNs have proven successful in diminishing hate while preserving freedom of speech. An example of an HS-CN pair follows:

HS: Jews have a secret plot to take over the world.

CN: This myth traces back to ‘The Protocols of the Learned Elders of Zion,’ which is used as proof of a Jewish conspiracy to rule the world. But the Protocols are a proven forgery, written by agents of the Russian czar in the late 19th century, that spread throughout the 20th century and continues to this day.

Key Insights

Given their effectiveness, NGOs are employing CNs to counter online hate. Still, it is impossible to respond manually to every instance of hate. For this reason, a line of AI research based on Natural Language Processing (NLP) has recently emerged, focusing on designing systems that generate CN suggestions automatically. This study compares the most recent and advanced AI-based Language Models (LMs) to understand their pros and cons in generating CNs.

Effective CN Generation experiments

In our experiments, we use various automatic metrics and manual evaluation with expert judgments to assess several LMs representing the main categories of model architectures and decoding methods currently available. We further test the robustness of the fine-tuned LMs in generating CNs for unseen targets of hate. For this study, we rely on a dataset that provides the target diversity and CN quality we aim for; it was collected with a human-in-the-loop approach and features 5k HS-CN pairs covering several targets, including DISABLED, JEWS, LGBT+, MIGRANTS, MUSLIMS, POC, and WOMEN. A minimal fine-tuning sketch of this setup follows.
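To make the setup concrete, here is a minimal sketch of fine-tuning an autoregressive LM on HS-CN pairs with the Hugging Face transformers and datasets libraries. This is an illustration, not the authors' exact pipeline: the prompt format, hyperparameters, and the single example pair shown are assumptions, and the real dataset contains 5k pairs.

```python
# Minimal sketch (not the paper's exact pipeline) of fine-tuning GPT-2
# on HS-CN pairs. Prompt format and hyperparameters are illustrative.
from transformers import (GPT2LMHeadModel, GPT2TokenizerFast, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from datasets import Dataset

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Each example concatenates a hate message and its counter-narrative,
# delimited so the model learns the HS -> CN mapping. (One toy pair here;
# the actual dataset has 5k.)
pairs = [{"hs": "Jews have a secret plot to take over the world.",
          "cn": "This myth traces back to 'The Protocols of the Learned "
                "Elders of Zion,' a proven forgery."}]

def to_text(example):
    return {"text": f"HS: {example['hs']}\nCN: {example['cn']}"
                    f"{tokenizer.eos_token}"}

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

ds = (Dataset.from_list(pairs)
      .map(to_text)
      .map(tokenize, batched=True, remove_columns=["hs", "cn", "text"]))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="cn-gpt2", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=ds,
    # mlm=False yields causal-LM labels (shifted input ids)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```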

Results show that autoregressive language models such as GPT-2 are, in general, better suited for the task. While stochastic decoding mechanisms can generate more novel, diverse, and informative outputs, deterministic decoding is useful in scenarios where more generic and less novel (yet ‘safer’) CNs are needed. Furthermore, in out-of-target experiments, we find that target similarity (e.g., JEWS and MUSLIMS as religious groups) plays a crucial role in how well models port to new targets. Finally, we show a promising research direction: leveraging human corrections of LMs’ outputs to build an additional automatic post-editing step that corrects errors the LMs make during generation.
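As a hedged illustration of the decoding contrast described above, the snippet below generates a CN for the same prompt with deterministic beam search and with stochastic top-p sampling. The checkpoint, prompt format, and generation parameters are assumptions for illustration, not the paper's actual configurations.

```python
# Illustrative contrast between deterministic and stochastic decoding.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # swap in a fine-tuned checkpoint

prompt = "HS: Jews have a secret plot to take over the world.\nCN:"
inputs = tokenizer(prompt, return_tensors="pt")

# Deterministic decoding (beam search): more generic but 'safer' outputs.
beam = model.generate(**inputs, max_new_tokens=60, num_beams=5,
                      do_sample=False, pad_token_id=tokenizer.eos_token_id)

# Stochastic decoding (top-p sampling): more novel and diverse outputs.
sampled = model.generate(**inputs, max_new_tokens=60, do_sample=True,
                         top_p=0.9, temperature=0.8,
                         pad_token_id=tokenizer.eos_token_id)

for name, out in [("beam", beam), ("top-p", sampled)]:
    print(name, tokenizer.decode(out[0], skip_special_tokens=True))
```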

Between the Lines

Automating CN generation can help increase the efficiency of online hate countering while preserving freedom of speech and promoting less aggressive and hostile debate. However, the AI-based generation models we tested are not meant to be used autonomously, since even the best model can still produce substandard CNs containing inappropriate or negative language. Instead, following a human-computer cooperation paradigm, we aim to build models that help NGO operators by providing them with diverse and novel CN candidates for their hate-countering activities while granting them total control over the final output.

Want quick summaries of the latest research & reporting in AI ethics delivered to your inbox? Subscribe to the AI Ethics Brief. We publish bi-weekly.
