Using Pre-Trained Language Models for Producing Counter Narratives Against Hate Speech: a Comparative Study

May 28, 2023

šŸ”¬ Research Summary by Marco Guerini, a researcher in Computational Linguistics and head of the Language and Dialogue Technologies group at Fondazione Bruno Kessler (FBK).

[Original paper by Serra Sinem Tekiroglu, Helena Bonaldi, Margherita Fanton, and Marco Guerini]


Overview: Many international institutions and countries are taking action to monitor, restrict, and remove online hate content. Still, the results are not always satisfactory, and such measures are often criticized as censorship. An alternative approach that has emerged in recent years is based on so-called Counter Narratives: de-escalating, fact-bound textual responses that refute hateful messages in a non-aggressive way. In this scenario, automation is needed to effectively address the sheer amount of hate produced daily. We therefore conducted a comparative study investigating several AI-based neural language models for automatic counter-narrative generation, to be used as a companion tool for NGO operators tackling online hate.


Introduction

Hate Speech (HS) has found fertile ground on social media platforms. Actions undertaken by such platforms to tackle online hatred consist of identifying possible sources of hate and removing them through content deletion, account suspension, or shadow-banning. However, these actions are often interpreted and denounced as censorship by the affected users and political groups. For this reason, such restrictions can have the opposite effect of exacerbating the haters’ hostility. An alternative strategy is based on the use of Counter-Narratives (CNs), i.e., communicative actions that refute hate speech through thoughtful and cogent reasons and true, fact-bound arguments. As a de-escalating measure, CNs have successfully diminished hate while preserving freedom of speech. An example of an HS-CN pair follows:

HS: Jews have a secret plot to take over the world.

CN: This myth traces back to ā€˜The Protocols of the Learned Elders of Zion,’ which has been used as proof of a Jewish conspiracy to rule the world. But the Protocols are a proven forgery, written by agents of the Russian czar in the late 19th century; the myth nonetheless spread throughout the 20th century and persists to this day.

Key Insights

Given their effectiveness, NGOs are employing CNs to counter online hate. Still, it is impossible to respond manually to every instance of hate. For this reason, a line of AI research based on Natural Language Processing (NLP) has recently emerged that focuses on designing systems to generate CN suggestions automatically. This study compares the most recent and advanced AI-based Language Models (LMs) to understand their pros and cons in generating CNs.

Effective CN Generation experiments

In our experiments, we assess several LMs, representing the main categories of model architectures and decoding methods currently available, using various automatic metrics and manual evaluation by expert judges. We further test the robustness of the fine-tuned LMs in generating CNs for unseen targets of hate. For this study, we rely on a dataset that provides the target diversity and CN quality we aim for. The dataset was collected with a human-in-the-loop approach and features 5k HS-CN pairs covering several targets, including DISABLED, JEWS, LGBT+, MIGRANTS, MUSLIMS, POC, and WOMEN.
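To make the fine-tuning setup concrete, here is a minimal sketch assuming a Hugging Face transformers GPT-2 backbone. The <hs>/<cn> separator tokens, hyperparameters, and the inline example pair are illustrative assumptions, not the authors’ released code or configuration.

```python
# Minimal sketch (assumed setup, not the authors' code): fine-tune GPT-2 on
# HS-CN pairs concatenated into single sequences with hypothetical
# <hs>/<cn> separator tokens.
import torch
from transformers import (GPT2LMHeadModel, GPT2TokenizerFast,
                          Trainer, TrainingArguments)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.add_special_tokens({"additional_special_tokens": ["<hs>", "<cn>"],
                              "pad_token": "<pad>"})
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.resize_token_embeddings(len(tokenizer))

# One illustrative pair; the study fine-tunes on ~5k such pairs
# covering seven targets of hate.
pairs = [
    ("Jews have a secret plot to take over the world.",
     "The Protocols of the Elders of Zion is a proven forgery; "
     "there is no such plot."),
]

class HsCnDataset(torch.utils.data.Dataset):
    """Each training example is '<hs> ... <cn> ...' as one sequence."""
    def __init__(self, pairs, max_length=256):
        self.examples = []
        for hs, cn in pairs:
            enc = tokenizer(f"<hs> {hs} <cn> {cn}", truncation=True,
                            max_length=max_length, padding="max_length",
                            return_tensors="pt")
            ids = enc["input_ids"].squeeze(0)
            labels = ids.clone()
            labels[ids == tokenizer.pad_token_id] = -100  # ignore padding in loss
            self.examples.append({"input_ids": ids,
                                  "attention_mask": enc["attention_mask"].squeeze(0),
                                  "labels": labels})
    def __len__(self):
        return len(self.examples)
    def __getitem__(self, i):
        return self.examples[i]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="cn-gpt2", num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=HsCnDataset(pairs),
)
trainer.train()
```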

Results show that autoregressive language models such as GPT-2 are, in general, better suited to the task. While stochastic decoding mechanisms can generate more novel, diverse, and informative outputs, deterministic decoding is useful in scenarios where more generic and less novel (yet ā€˜safer’) CNs are needed. Furthermore, in out-of-target experiments, we find that the similarity of targets (e.g., JEWS and MUSLIMS as religious groups) plays a crucial role in how well models port to new targets. Finally, we point to a promising research direction: leveraging human corrections of LMs’ outputs to build an additional automatic post-editing step that corrects errors made by LMs during generation.
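The decoding trade-off can be illustrated with the two generate() configurations below. This is a hedged sketch reusing the fine-tuned model and tokenizer from the previous snippet; the sampling parameters are chosen for illustration rather than taken from the paper.

```python
# Sketch of the decoding contrast (parameter values are assumptions):
# stochastic nucleus sampling tends to produce more novel, diverse CNs,
# while deterministic beam search yields safer, more generic ones.
prompt = tokenizer("<hs> Jews have a secret plot to take over the world. <cn>",
                   return_tensors="pt")

# Stochastic decoding: top-p (nucleus) sampling.
sampled = model.generate(**prompt, do_sample=True, top_p=0.9, temperature=0.9,
                         max_new_tokens=80,
                         pad_token_id=tokenizer.pad_token_id)

# Deterministic decoding: beam search.
beamed = model.generate(**prompt, do_sample=False, num_beams=5,
                        max_new_tokens=80,
                        pad_token_id=tokenizer.pad_token_id)

for name, out in (("sampling", sampled), ("beam search", beamed)):
    print(f"{name}: {tokenizer.decode(out[0], skip_special_tokens=True)}")
```

In a deployment aimed at NGO operators, the sampled candidates would typically be surfaced as suggestions for human review rather than posted automatically, in line with the human-computer cooperation paradigm discussed below.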

Between the Lines

Automating CN generation can help increase the efficiency of online hate countering while preserving freedom of speech and promoting less aggressive and hostile debate. However, the AI-based generation models we tested are not meant to be used autonomously, since even the best model can still produce substandard CNs containing inappropriate or negative language. Instead, following a human-computer cooperation paradigm, we want to build models that help NGO operators by providing them with diverse and novel CN candidates for their hate-countering activities while granting them total control over the final output.

