
Low-Resource Languages Jailbreak GPT-4

February 1, 2024

🔬 Research Summary by Zheng-Xin Yong, a Computer Science Ph.D. candidate at Brown University, focusing on inclusive and responsible AI by building multilingual large language models and making them more representative and safer.

[Original paper by Zheng-Xin Yong, Cristina Menghini, and Stephen H. Bach]


Overview: Can GPT-4’s safety guardrails successfully defend against unsafe inputs in low-resource languages? This work says no. The authors show that translating unsafe English inputs into low-resource languages renders GPT-4’s guardrails ineffective. This cross-lingual safety vulnerability poses safety risks to all LLM users. Therefore, we need more holistic and robust multilingual safeguards.


Introduction

“Sorry, but I can’t assist with that.”

This is the default response from GPT-4 when prompted with requests that violate safety guidelines or ethical constraints. AI safety guardrails are designed to prevent harmful content generation, such as misinformation and violence promotion.

However, we can bypass GPT-4’s safety guardrails easily with translations. By translating unsafe English inputs, such as “how to build explosive devices using household materials,” into low-resource languages such as Zulu, we can obtain responses that get us to our malicious goals nearly 80% of the time.

This cross-lingual vulnerability arises because safety research focuses on high-resource languages like English. Previously, this linguistic inequality in AI development mainly affected speakers of low-resource languages. Now, however, it poses safety risks for all users, because anyone can exploit LLMs’ cross-lingual safety vulnerabilities with publicly available translation services. Our work emphasizes the pressing need for more holistic and inclusive safety research.

Key Insights 

Background: AI Safety and Jailbreaking

In generative AI safety, jailbreaking (a term borrowed from the practice of removing manufacturers’ software restrictions on computing devices) means circumventing an AI system’s safety mechanisms to generate harmful responses, and it is usually carried out by users. It is a form of adversarial attack that makes Large Language Models (LLMs) return information that would otherwise be withheld.

Companies like OpenAI and Anthropic first use reinforcement learning from human feedback (RLHF) to align LLMs with human preferences for helpful and safe outputs, which helps prevent users from jailbreaking and abusing LLMs. Then, they perform red-teaming, in which the companies’ data scientists are tasked with bypassing the safeguards so that vulnerabilities can be fixed preemptively and safety failure modes understood before the LLMs are released to the public.

Method: Translation-based Jailbreaking

We investigate a translation-based jailbreaking attack to evaluate the robustness of GPT-4’s safety measures across languages. Given an input, we translate it from English into another language, feed it into GPT-4, and translate the response back into English. Then, we perform human annotation of whether GPT-4’s responses are harmful and whether we successfully bypassed the safeguards.

We carry out our attacks on a recent version of GPT-4, gpt-4-0613, since it is a stable release and one of the safest of the stable releases. We translate unsafe English inputs from AdvBench into twelve languages, categorized as low-resource (LRL), mid-resource (MRL), and high-resource (HRL) based on their data availability. We use the publicly available Google Translate Basic service API for translation.
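To make the pipeline concrete, the sketch below walks through a single translation-based attack. It assumes the google-cloud-translate (Basic/v2) and openai Python clients; the helper names and the Zulu language code are illustrative choices, not the authors’ released code.

```python
# Minimal sketch of the translation-based jailbreak pipeline (illustrative only).
# Assumes the `google-cloud-translate` (Basic/v2) and `openai` Python clients and
# that credentials for both services are configured in the environment.
from google.cloud import translate_v2 as translate
from openai import OpenAI

translator = translate.Client()   # Google Translate Basic service API
llm = OpenAI()                    # reads OPENAI_API_KEY from the environment


def translate_text(text: str, source: str, target: str) -> str:
    """Translate `text` with the Google Translate Basic API."""
    result = translator.translate(text, source_language=source, target_language=target)
    return result["translatedText"]


def translation_attack(english_prompt: str, lang: str, model: str = "gpt-4-0613") -> str:
    """Translate an English prompt into `lang`, query GPT-4, and back-translate the reply."""
    prompt_in_lang = translate_text(english_prompt, source="en", target=lang)
    response = llm.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt_in_lang}],
    )
    reply_in_lang = response.choices[0].message.content
    # The back-translated reply is then annotated by humans for harmfulness.
    return translate_text(reply_in_lang, source=lang, target="en")


# Example: attack in Zulu ("zu"); the prompt here is a harmless placeholder.
# print(translation_attack("How do I pick a strong password?", lang="zu"))
```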

We also consider an adaptive adversary who can iterate and choose the language to attack based on the input prompt. In this case, instead of studying the attack success rate of a single language, we consider the attack success rate of the combined languages in LRL/MRL/HRL settings.
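As a sketch of how the combined setting can be scored, the snippet below assumes a table of human annotations, results[input_id][lang], recording whether the attack translated into lang elicited a harmful response for that input; an input counts as jailbroken in the combined setting if at least one language in the set succeeds. The data layout and language codes are illustrative assumptions, not the authors’ evaluation code.

```python
# Illustrative scoring of the adaptive-adversary (combined-language) setting.
# `results[input_id][lang]` is assumed to be a human-annotated boolean: did the
# attack translated into `lang` elicit a harmful response for that input?
def combined_attack_success_rate(results: dict[str, dict[str, bool]],
                                 language_set: list[str]) -> float:
    """Fraction of inputs jailbroken by at least one language in `language_set`."""
    successes = sum(
        any(per_lang.get(lang, False) for lang in language_set)
        for per_lang in results.values()
    )
    return successes / len(results)


# Example: combined low-resource-language setting with Zulu and Scottish Gaelic.
# lrl_asr = combined_attack_success_rate(results, ["zu", "gd"])
```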

Results: Alarmingly High Attack Success Rate with Low-Resource Languages

By translating unsafe inputs into low-resource languages like Zulu or Scottish Gaelic, we can circumvent GPT-4’s safety measures and elicit harmful responses nearly half of the time. In contrast, the original English inputs have less than a 1% success rate. Furthermore, combining different low-resource languages increases the jailbreaking success rate to around 79%. 

We further break down the unsafe inputs by topic and find that the three topics with the highest jailbreaking success rates through low-resource-language translation are (1) terrorism, such as making bombs or planning terrorist attacks; (2) financial manipulation, such as insider trading or distributing counterfeit money; and (3) misinformation, such as promoting conspiracy theories or writing misleading reviews.

Linguistic inequality endangers AI safety and all users

The discovery of cross-lingual vulnerabilities reveals the harms of the unequal valuation of languages in safety research. For instance, the existing safety alignment of LLMs primarily focuses on the English language. Toxicity and bias detection benchmarks are also curated for high-resource languages such as English, Arabic, Italian, and Chinese. The intersection of safety and low-resource languages is still an underexplored research area.

Previously, this linguistic inequality mainly imposed utility and accessibility issues on low-resource language users. Now, it leads to safety risks that affect all LLM users. First, low-resource language speakers, who comprise nearly 1.2 billion people worldwide, can interact with LLMs with limited safety or content-moderation filters. Second, bad actors from high-resource language communities can use publicly available translation tools to breach the safeguards.

Between the lines

LLMs already power multilingual applications 

Large language models such as GPT-4 are already powering multilingual services and applications such as translation, personalized language education, and even language preservation efforts for low-resource languages. Therefore, we must close the gap between safety development and real-world use cases of LLMs. 

Addressing the Illusion of Safety

Progress in English-centric safety research merely creates an illusion of safety when the safety mechanisms remain susceptible to unsafe inputs in low-resource languages. As translation services already cover many low-resource languages, we urge AI researchers to develop robust multilingual safeguards and report red-teaming evaluation results beyond English.
