CRUSH: Contextually Regularized and User Anchored Self-Supervised Hate Speech Detection

June 17, 2023

🔬 Research Summary by Souvic Chakraborty, Founder & CEO @Neurals.ai and Ph.D. scholar (TCS research fellow) in the Computer Science and Engineering department at IIT Kharagpur, India, jointly supervised by Prof. Animesh Mukherjee and Prof. Pawan Goyal, working in the areas of few-shot learning, self-supervision, and NLP applications.

[Original paper by Souvic Chakraborty, Parag Dutta, Sumegh Roychowdhury, and Animesh Mukherjee]


Overview:

“Put your sword back in its place,” Jesus said to him, “for all who draw the sword will die by the sword.” – Gospel of Matthew, 26:52. In other words, hate begets hate.

We draw inspiration from these words, and from the empirical evidence of the clustering tendency of hate speech provided by Mathew et al. (2020), to design two loss functions incorporating user-anchored self-supervision and contextual regularization for hate speech detection. These are incorporated into the pretraining and finetuning phases, respectively, to significantly improve automatic hate speech detection on social media.


Introduction

Hate speech classification is a binary (hate/non-hate) or ternary (hate/offensive/neutral) classification task, and the loss generally employed for training is cross-entropy (or mean squared error when solving the regression version of the task). In addition, we employ a second loss for contextual regularization to smooth training, and we add a pretraining phase for user-anchored self-supervision, as explained below.
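To make the shape of this objective concrete, here is a minimal PyTorch sketch of a combined fine-tuning loss: cross-entropy on gold labels plus a weighted regularization term towards soft contextual labels. The function name `finetune_loss`, the weight `lam`, and the KL-divergence form of the regularizer are illustrative assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def finetune_loss(gold_logits, gold_labels, ctx_logits, ctx_soft_labels, lam=0.1):
    """Illustrative combined fine-tuning objective (hypothetical, not the
    paper's exact loss): cross-entropy on gold-annotated examples plus a
    contextual-regularization term on soft-annotated neighbouring sentences."""
    ce = F.cross_entropy(gold_logits, gold_labels)          # supervised loss on gold labels
    reg = F.kl_div(F.log_softmax(ctx_logits, dim=-1),       # pull predictions towards soft labels
                   ctx_soft_labels, reduction="batchmean")
    return ce + lam * reg                                   # lam: hypothetical weighting factor
```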

User-Anchored Self-Supervision

Basic assumption: 

Hateful users are distinct from non-hateful users in their language use.

Deduction: 

Embeddings of sentences from hateful users should cluster together but be distant from the embeddings of sentences from non-hateful users, and vice versa.

Operationalization: 

Embeddings of sentences (derived from BERT, which is also retrained in this phase as part of the end-to-end architecture) from the same user should cluster together but be distant from the embeddings of sentences from other users.

We do not need any labeled data for this phase; it is possible to collect extensive social media data and train models on it. In our case, we train the user-anchored (UA) phase of our model on corpora from Gab and Reddit.
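As a rough sketch of what such user-anchored pretraining might look like, the PyTorch function below pulls sentence embeddings from the same user together and pushes other users' embeddings away, in the style of a contrastive loss keyed on user ids. The function name `user_anchored_loss`, the temperature value, and the exact form are assumptions for illustration; the paper's loss may differ.

```python
import torch
import torch.nn.functional as F

def user_anchored_loss(embeddings, user_ids, temperature=0.1):
    """Hypothetical user-anchored contrastive loss: sentence embeddings from
    the same user are treated as positives, all other users' sentences in the
    batch as negatives. Not the paper's exact formulation."""
    z = F.normalize(embeddings, dim=-1)                        # (N, d) sentence embeddings
    sim = z @ z.T / temperature                                 # pairwise similarities
    same_user = user_ids.unsqueeze(0) == user_ids.unsqueeze(1)  # (N, N) same-user mask
    self_mask = torch.eye(len(user_ids), dtype=torch.bool, device=z.device)
    pos_mask = same_user & ~self_mask                           # positives: same user, not self
    sim = sim.masked_fill(self_mask, float("-inf"))             # drop self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)  # log-softmax over the batch
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    per_anchor = -(log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts)
    return per_anchor[pos_mask.any(dim=1)].mean()               # anchors with >= 1 positive
```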

Making the pretraining phase robust

Aligning the pre-training objective with the fine-tuning objective is crucial. To do this better, we introduce a variant of supervised contrastive loss (Khosla et al. (2020)) to regularize the UA phase, bringing embeddings of hateful comments closer together while pushing them away from the embeddings of non-hateful comments. A convex combination of the two losses is used as the final loss function to better learn the embedding space.
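Under the same assumptions as the sketch above, the convex combination could look like the following, where `alpha` is a hypothetical mixing weight and the second term reuses the same contrastive form keyed on hate/non-hate labels, in the spirit of Khosla et al.

```python
def pretrain_loss(embeddings, user_ids, hate_labels, alpha=0.5):
    """Hypothetical convex combination of the user-anchored loss and a
    supervised-contrastive loss on hate/non-hate labels."""
    l_user = user_anchored_loss(embeddings, user_ids)      # self-supervised, user-keyed
    l_scl = user_anchored_loss(embeddings, hate_labels)    # supervised, hate-label-keyed
    return alpha * l_user + (1.0 - alpha) * l_scl          # alpha: illustrative weight
```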

Contextual regularization

Our central assumption in this phase is that if we find an occurrence of hate speech, the sentences in its vicinity (either sentences uttered by the same user or sentences in the same thread) are also likely to be hate speech.

We soft-annotate sentences based on their proximity to other sentences annotated as hate speech, and we use these soft-annotated examples, via an additional loss function, to regularize training on the gold-labeled data.
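To illustrate the soft-annotation idea, here is a small, self-contained Python sketch that assigns higher soft hate scores to sentences closer to a known hateful sentence in a thread. The helper name `soft_annotate` and the exponential distance decay are assumptions for illustration; the paper's actual annotation scheme may differ.

```python
def soft_annotate(thread_labels, decay=0.5):
    """Hypothetical contextual soft labelling: sentences in a thread that are
    hard-labelled as hate (1) keep their label; unlabelled sentences (None)
    get a soft score that decays with distance to the nearest hateful one."""
    hate_positions = [i for i, y in enumerate(thread_labels) if y == 1]
    soft = []
    for i, y in enumerate(thread_labels):
        if y is not None:
            soft.append(float(y))                                 # keep gold annotations
        elif hate_positions:
            dist = min(abs(i - j) for j in hate_positions)
            soft.append(decay ** dist)                            # closer to hate -> higher score
        else:
            soft.append(0.0)                                      # no hate in this thread
    return soft

# Example: a thread where only the second sentence is annotated as hateful.
# soft_annotate([None, 1, None, None]) -> [0.5, 1.0, 0.5, 0.25]
```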

Results:

We see improvements from both the UA and CR phases. In addition, pretraining with the masked language modelling (MLM) loss also helps our model. Most importantly, our model significantly outperforms BERT models trained with additional annotated data (HateXplain) without needing those annotations.

Few-shot improvements

We repeat the experiments using only a fraction of the training data for all competing algorithms. Our improvements are even more significant in this low-resource setting, with only a few examples in hand, which demonstrates the strength of our representation learning: our pretraining method clusters hateful sentences better than vanilla language models do, yielding better accuracy with fewer annotated training samples in the finetuning stage across all three datasets.

Between the lines 

  1. We combine a self-supervised user-attribution pretraining objective and a contextual regularization objective on top of traditional MLM for hate speech tasks. We empirically demonstrate the advantage of our methods by improving over existing competitive baselines in hate speech detection and scoring tasks across three different datasets.
  2. CRUSH performs better in the low-data regime compared to existing approaches. Ablations show the benefits of each objective separately.
  3. Future works include
    1. exploiting the relations among users, 
    2. using different base models capable of incorporating more extended contexts, 
    3. addressing challenging problems like sarcasm and implicit hate speech detection in social networks.

Is the task solved?

Our qualitative analysis shows that the task is still far from solved, despite using domain-knowledge-based self-supervision on a large text corpus. While texts containing cuss words are easier to classify, convoluted sentences and sarcastic utterances are still not always classified correctly.

Moreover, grouping texts by users may carry the inherent risk of grouping texts by communities or ethnicities, as some communities can share unique linguistic styles. This has the potential to introduce bias into hate speech classification. However, this problem is present for any language model pretrained on a social media corpus, and we will need methods that reduce such bias in the representation learning phase. The path to better hate speech classification is still full of challenges and excitement!

References

  1. Souvic Chakraborty, Parag Dutta, Sumegh Roychowdhury, and Animesh Mukherjee. 2022. CRUSH: Contextually Regularized and User anchored Self-supervised Hate speech Detection. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 1874–1886, Seattle, United States. Association for Computational Linguistics.
  2. Binny Mathew, Anurag Illendula, Punyajoy Saha, Soumya Sarkar, Pawan Goyal, and Animesh Mukherjee. 2020. Hate begets Hate: A Temporal Study of Hate Speech. Proc. ACM Hum.-Comput. Interact. 4, CSCW2, Article 92 (October 2020), 24 pages.
  3. Prannay Khosla, Piyush Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. 2020. Supervised Contrastive Learning. In Advances in Neural Information Processing Systems 33, pages 18661–18673.
  4. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT 2019, Minneapolis, Minnesota. Association for Computational Linguistics.