• Skip to main content
  • Skip to primary sidebar
  • Skip to footer
  • Core Principles of Responsible AI
    • Accountability
    • Fairness
    • Privacy
    • Safety and Security
    • Sustainability
    • Transparency
  • Special Topics
    • AI in Industry
    • Ethical Implications
    • Human-Centered Design
    • Regulatory Landscape
    • Technical Methods
  • Living Dictionary
  • State of AI Ethics
  • AI Ethics Brief
  • 🇫🇷
Montreal AI Ethics Institute

Montreal AI Ethics Institute

Democratizing AI ethics literacy

LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI’s ChatGPT Plugins

December 7, 2023

🔬 Research Summary by Umar Iqbal, an Assistant professor at Washington University in St. Louis, researching computer security and privacy.

[Original paper by Umar Iqbal (Washington University in St. Louis), Tadayoshi Kohno (University of Washington), Franziska Roesner (University of Washington)]


Overview: Large language model (LLM) platforms, such as ChatGPT, have recently begun offering a plugin ecosystem to interface with third-party services on the internet. While these plugins extend the capabilities of LLM platforms, they bring several security, privacy, and safety issues. In this research, we propose a framework to systematically study and improve the security of current and future LLM platforms. We answer FAQs about the paper here: https://github.com/llm-platform-security/chatgpt-plugin-eval


Introduction

LLM platforms are extending their capabilities by integrating a third-party ecosystem. These integrations are emerging mostly without systematically considering security, privacy, and safety. Considering the capabilities of LLMs, if widely deployed without these critical considerations, LLM platforms could have severe negative consequences for users. 

Thus, we propose a framework that lays a foundation for LLM platform designers to analyze and improve current and future LLM platforms’ security, privacy, and safety. Our framework is a formulation of an attack taxonomy that is developed by exploring how plugins, LLM platforms, and users could leverage their capabilities and responsibilities to mount attacks against each other. We apply our framework in the context of OpenAI’s plugin ecosystem. (While we look at OpenAI, the issues have the potential to be industry-wide.) We uncover plugins that concretely demonstrate the potential for the issues that we outline in our attack taxonomy to manifest in practice. We conclude by discussing novel challenges and providing recommendations to improve the security, privacy, and safety of future LLM platforms.

Key Insights

Fundamental challenges in building a secure (LLM) platform

Third-party plugins may add to the long list of security, privacy, and safety concerns the research community raises about LLMs. First, plugins are developed by third-party developers and thus should not be implicitly trusted. Prior research on other computing platforms has shown that third-party integrations often raise security and privacy issues. In the case of LLM platforms, anecdotal evidence suggests that third-party plugins can launch prompt injection attacks and potentially take over LLM platforms. Second, as we observe, plugins interface with LLM platforms and users using natural language, which can have ambiguous and imprecise interpretations. For example, the natural language functionality descriptions of plugins could be interpreted too broadly or too narrowly by the LLM platform, which could cause problems. Furthermore, at least some LLM platform vendors, such as OpenAI, currently only impose modest restrictions on third-party plugins with a handful of policies — based on our analysis and anecdotal evidence found online — a frail review process.

These concerns highlight that at least some LLM platform plugin ecosystems are emerging without systematically considering security, privacy, and safety. If widely deployed without these critical considerations, such integrations could harm the users, plugins, and LLM platforms.

Securing LLM platforms

To lay a systematic foundation for secure LLM platforms and integrations, we propose a framework that current and future designers of LLM-based platforms can leverage. To develop the framework, we first formulate an extensive taxonomy of attacks by systematically and conceptually enumerating potential security, privacy, and safety issues with an LLM platform that supports third-party plugins. To that end, we survey the capabilities of plugins, users, and LLM platforms to determine the potential attacks these key stakeholders can carry against each other. We consider both attacks and methods that uniquely apply to the LLM platform plugin ecosystem, as well as attacks and methods that already exist in other computing platforms but also apply to LLM platform plugin ecosystems. 

Second, to ensure that our taxonomy is informed by current reality, we investigate existing plugins to assess whether they have the potential to implement adversarial actions that we enumerate in our taxonomy. Specifically, we leveraged our developed attack taxonomy to systematically analyze the plugins hosted on OpenAI’s plugin store by reviewing their code (manifests and API specifications) and by interacting with them. When we uncovered a new attack possibility or found that a conjectured attack was infeasible, we iteratively revised our attack taxonomy.

Users are exposed to several risks

We uncover plugins that concretely demonstrate the potential for the issues that we outline in our attack taxonomy to manifest in practice. This does not necessarily mean that the plugins are malicious. However, considering the potential for attacks, our findings demonstrate that users are exposed to several risks. For example, plugins could steal their credentials, steal their chat history, hijack their interaction with the LLM platform, or trick them by masquerading as other plugins. Again, we emphasize the word “could” rather than “are”; we did not assess whether any plugins perform adversarial actions.

Between the lines

Exacerbation of NLP-related challenges

The complexity of natural language is one of the fundamental challenges in securing LLM-based platforms. In the plugin-integrated platforms we considered, natural language is used (1) by users to interact with the platform and plugins, (2) by the platform and plugins to interact with users, and (3) even by plugins to interact with the platform (e.g., through functionality descriptions) and other plugins (e.g., through instructions in API responses). Potential ambiguity and imprecision in the interpretation of natural language and the application of policies to natural language can create challenges in all of these interactions.

Interpretation of functionality defined in natural language

In conventional computing platforms, applications define their functionality through constrained programming languages without any ambiguity. In contrast, LLM platform plugins define their functionality through natural language, which can have ambiguous interpretations. For example, the LLM platform may sometimes interpret the functionality too broadly or too narrowly, both of which could cause problems (see Risks 6 and 7 as examples in the paper). Interpreting language also requires contextual awareness, i.e., plugin instructions may need to be interpreted differently in different contexts. For example, it might be okay for the LLM platform to behave a certain way while a user interacts with a plugin, but it is not okay to persist with that behavior when the plugin is not in use (see Risk 4) as an example. In summary, the key challenge for LLM platforms is to interpret plugin functionality so as not to cause ambiguity; in other words, LLM platforms must figure out mechanisms that allow them to interpret functionality similarly to the unambiguous (or, much less ambiguous) interpretation in other computing platforms.

Application of policies on natural language content

Even if LLM platforms can precisely interpret the functionality defined in natural language or if functionality is precisely defined through some other means, it will still be challenging to apply policies (e.g., content moderation) over the natural language content returned by users, plugins, or within the LLM platform. For example, there may be a mismatch between the policy interpretation by the LLM platform, users, and plugins, e.g., on what is considered personal information (see attacks in Section 4.3 of the paper, of which Appendix C.1 discusses an example). Similarly, when there is a contradiction between the policies specified by the plugin or between the policies specified by the user and the plugin, the LLM platform would need to make a preference to resolve the deadlock, which may not be in favor of users. An LLM platform may also not apply the policies retrospectively, which may diminish its impact. For example, a policy specifying that no personal data needs to be collected or shared may not apply to already collected data (see attacks in Section 4.3 of the paper, Appendix C.1.1 discusses an example). 

Anticipating future LLM-based computing systems

Looking ahead, we anticipate that LLMs will be integrated into other types of platforms as well and that the plugin-integrated LLM chatbots of today are early indicators of the types of issues that might arise in the future. For example, we can anticipate that LLMs will be integrated into voice assistant platforms (such as Amazon Alexa), which already support third-party components (“skills” for Alexa). Recent work in robotics has also integrated LLMs into a “vision-language-action” model in which an LLM directly commands a physical robot. Future users may even interact with their desktop or mobile operating systems via deeply-integrated LLMs. 

In all of these cases, the NLP-related challenges with the imprecision of natural language, coupled with the potential risks from untrustworthy third parties, physical world actuation, and more, will raise serious potential concerns if not proactively considered. The designers of future LLM-based computing platforms should architect their platforms to support security, privacy, and safety early rather than attempt to address issues retroactively later.

Want quick summaries of the latest research & reporting in AI ethics delivered to your inbox? Subscribe to the AI Ethics Brief. We publish bi-weekly.

Primary Sidebar

🔍 SEARCH

Spotlight

AI Policy Corner: Frontier AI Safety Commitments, AI Seoul Summit 2024

AI Policy Corner: The Colorado State Deepfakes Act

Special Edition: Honouring the Legacy of Abhishek Gupta (1992–2024)

AI Policy Corner: The Turkish Artificial Intelligence Law Proposal

From Funding Crisis to AI Misuse: Critical Digital Rights Challenges from RightsCon 2025

related posts

  • AI Neutrality in the Spotlight: ChatGPT’s Political Biases Revisited

    AI Neutrality in the Spotlight: ChatGPT’s Political Biases Revisited

  • Disaster City Digital Twin: A Vision for Integrating Artificial and Human Intelligence for Disaster ...

    Disaster City Digital Twin: A Vision for Integrating Artificial and Human Intelligence for Disaster ...

  • AI supply chains make it easy to disavow ethical accountability

    AI supply chains make it easy to disavow ethical accountability

  • Perspectives and Approaches in AI Ethics: East Asia (Research Summary)

    Perspectives and Approaches in AI Ethics: East Asia (Research Summary)

  • Digital Sex Crime, Online Misogyny, and Digital Feminism in South Korea

    Digital Sex Crime, Online Misogyny, and Digital Feminism in South Korea

  • On the Creativity of Large Language Models

    On the Creativity of Large Language Models

  • The ethical ambiguity of AI data enrichment: Measuring gaps in research ethics norms and practices

    The ethical ambiguity of AI data enrichment: Measuring gaps in research ethics norms and practices

  • On the Impact of Machine Learning Randomness on Group Fairness

    On the Impact of Machine Learning Randomness on Group Fairness

  • A Critical Analysis of the What3Words Geocoding Algorithm

    A Critical Analysis of the What3Words Geocoding Algorithm

  • From Dance App to Political Mercenary: How disinformation on TikTok gaslights political tensions in ...

    From Dance App to Political Mercenary: How disinformation on TikTok gaslights political tensions in ...

Partners

  •  
    U.S. Artificial Intelligence Safety Institute Consortium (AISIC) at NIST

  • Partnership on AI

  • The LF AI & Data Foundation

  • The AI Alliance

Footer

Categories


• Blog
• Research Summaries
• Columns
• Core Principles of Responsible AI
• Special Topics

Signature Content


• The State Of AI Ethics

• The Living Dictionary

• The AI Ethics Brief

Learn More


• About

• Open Access Policy

• Contributions Policy

• Editorial Stance on AI Tools

• Press

• Donate

• Contact

The AI Ethics Brief (bi-weekly newsletter)

About Us


Founded in 2018, the Montreal AI Ethics Institute (MAIEI) is an international non-profit organization equipping citizens concerned about artificial intelligence and its impact on society to take action.


Archive

  • © MONTREAL AI ETHICS INSTITUTE. All rights reserved 2024.
  • This work is licensed under a Creative Commons Attribution 4.0 International License.
  • Learn more about our open access policy here.
  • Creative Commons License

    Save hours of work and stay on top of Responsible AI research and reporting with our bi-weekly email newsletter.