🔬 Research Summary by Umar Iqbal, an assistant professor at Washington University in St. Louis who researches computer security and privacy.
Overview: Large language model (LLM) platforms, such as ChatGPT, have recently begun offering plugin ecosystems that interface with third-party services on the internet. While these plugins extend the capabilities of LLM platforms, they also introduce several security, privacy, and safety issues. In this research, we propose a framework to systematically study and improve the security of current and future LLM platforms. We answer FAQs about the paper here: https://github.com/llm-platform-security/chatgpt-plugin-eval
LLM platforms are extending their capabilities by integrating third-party ecosystems. These integrations are emerging mostly without systematic consideration of security, privacy, and safety. Given the capabilities of LLMs, platforms that are widely deployed without these critical considerations could have severe negative consequences for users.
Thus, we propose a framework that lays a foundation for LLM platform designers to analyze and improve the security, privacy, and safety of current and future LLM platforms. At its core, the framework is an attack taxonomy, developed by exploring how plugins, LLM platforms, and users could leverage their capabilities and responsibilities to mount attacks against each other. We apply our framework in the context of OpenAI's plugin ecosystem. (While we look at OpenAI, the issues have the potential to be industry-wide.) We uncover plugins that concretely demonstrate how the issues outlined in our attack taxonomy could manifest in practice. We conclude by discussing novel challenges and providing recommendations to improve the security, privacy, and safety of future LLM platforms.
Fundamental challenges in building a secure LLM platform
Third-party plugins may add to the long list of security, privacy, and safety concerns the research community has raised about LLMs. First, plugins are developed by third-party developers and thus should not be implicitly trusted. Prior research on other computing platforms has shown that third-party integrations often raise security and privacy issues. In the case of LLM platforms, anecdotal evidence suggests that third-party plugins can launch prompt injection attacks and potentially take over LLM platforms. Second, as we observe, plugins interface with LLM platforms and users using natural language, which can have ambiguous and imprecise interpretations. For example, the LLM platform could interpret a plugin's natural-language functionality description too broadly or too narrowly, either of which could cause problems. Furthermore, based on our analysis and anecdotal evidence found online, at least some LLM platform vendors, such as OpenAI, currently impose only modest restrictions on third-party plugins: a handful of policies and a frail review process.
These concerns highlight that at least some LLM platform plugin ecosystems are emerging without systematic consideration of security, privacy, and safety. If widely deployed without these critical considerations, such integrations could harm users, plugins, and LLM platforms alike.
Securing LLM platforms
To lay a systematic foundation for secure LLM platforms and integrations, we propose a framework that current and future designers of LLM-based platforms can leverage. To develop the framework, we first formulate an extensive taxonomy of attacks by systematically and conceptually enumerating potential security, privacy, and safety issues with an LLM platform that supports third-party plugins. To that end, we survey the capabilities of plugins, users, and LLM platforms to determine the potential attacks these key stakeholders can carry out against each other. We consider both attacks and methods that apply uniquely to the LLM platform plugin ecosystem and attacks and methods that already exist on other computing platforms but also apply to LLM platform plugin ecosystems.
Second, to ensure that our taxonomy is informed by current reality, we investigate existing plugins to assess whether they have the potential to carry out the adversarial actions enumerated in our taxonomy. Specifically, we used our attack taxonomy to systematically analyze the plugins hosted on OpenAI's plugin store, both by reviewing their code (manifests and API specifications) and by interacting with them. Whenever we uncovered a new attack possibility or found that a conjectured attack was infeasible, we revised our attack taxonomy accordingly.
Users are exposed to several risks
We uncover plugins that concretely demonstrate how the issues outlined in our attack taxonomy could manifest in practice. This does not necessarily mean that the plugins are malicious. However, considering the potential for attacks, our findings demonstrate that users are exposed to several risks. For example, plugins could steal users' credentials, steal their chat history, hijack their interaction with the LLM platform, or trick them by masquerading as other plugins. Again, we emphasize the word "could"; we did not assess whether any plugins actually perform adversarial actions.
Between the lines
Exacerbation of NLP-related challenges
The complexity of natural language is one of the fundamental challenges in securing LLM-based platforms. In the plugin-integrated platforms we considered, natural language is used (1) by users to interact with the platform and plugins, (2) by the platform and plugins to interact with users, and (3) even by plugins to interact with the platform (e.g., through functionality descriptions) and other plugins (e.g., through instructions in API responses). Potential ambiguity and imprecision in the interpretation of natural language and the application of policies to natural language can create challenges in all of these interactions.
Interpretation of functionality defined in natural language
In conventional computing platforms, applications define their functionality through constrained programming languages without any ambiguity. In contrast, LLM platform plugins define their functionality through natural language, which can have ambiguous interpretations. For example, the LLM platform may interpret the functionality too broadly or too narrowly, both of which could cause problems (see Risks 6 and 7 in the paper for examples). Interpreting language also requires contextual awareness: plugin instructions may need to be interpreted differently in different contexts. For example, it might be acceptable for the LLM platform to behave a certain way while a user interacts with a plugin, but not to persist with that behavior once the plugin is no longer in use (see Risk 4 for an example). In summary, the key challenge for LLM platforms is to interpret plugin functionality without introducing ambiguity; in other words, LLM platforms must find mechanisms that let them interpret functionality as unambiguously (or at least as nearly unambiguously) as other computing platforms do.
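To make the contextual-awareness point concrete, here is a minimal Python sketch of one way a platform could scope plugin-supplied instructions to the turns in which that plugin is active. The `Conversation` class, its fields, and the `scoped` flag are hypothetical illustrations chosen for this summary; they are not the paper's method or any platform's actual architecture.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Conversation:
    """Hypothetical toy model of the context an LLM platform assembles per turn."""

    # Instructions from the platform itself; these apply to every turn.
    global_instructions: list[str] = field(default_factory=list)
    # Natural-language instructions contributed by plugins, keyed by plugin name.
    plugin_instructions: dict[str, list[str]] = field(default_factory=dict)

    def add_plugin_instruction(self, plugin: str, text: str) -> None:
        self.plugin_instructions.setdefault(plugin, []).append(text)

    def context_for_turn(self, active_plugin: Optional[str], scoped: bool = True) -> list[str]:
        if not scoped:
            # Unscoped: every instruction any plugin ever supplied stays in effect,
            # so plugin-specific behavior persists after the plugin is no longer in use.
            persisted = [t for texts in self.plugin_instructions.values() for t in texts]
            return self.global_instructions + persisted
        # Scoped: only the plugin in use for this turn contributes instructions.
        return self.global_instructions + self.plugin_instructions.get(active_plugin, [])


conv = Conversation(global_instructions=["Be helpful."])
conv.add_plugin_instruction("recipes", "Answer only with recipes.")
print(conv.context_for_turn("recipes"))           # ['Be helpful.', 'Answer only with recipes.']
print(conv.context_for_turn(None))                # ['Be helpful.']
print(conv.context_for_turn(None, scoped=False))  # ['Be helpful.', 'Answer only with recipes.']
```

In this toy version the boundary is a dictionary lookup. In a real platform, plugin text, user text, and platform text all end up as natural language in the model's context, so drawing the same boundary is considerably harder.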
Application of policies on natural language content
Even if LLM platforms can precisely interpret functionality defined in natural language, or if functionality is precisely defined through some other means, it will still be challenging to apply policies (e.g., content moderation) to natural language content produced by users, by plugins, or within the LLM platform itself. For example, the LLM platform, users, and plugins may interpret a policy differently, e.g., disagreeing on what counts as personal information (see the attacks in Section 4.3 of the paper; Appendix C.1 discusses an example). Similarly, when the policies specified by a plugin contradict each other, or when the policies specified by the user contradict those of the plugin, the LLM platform must decide which policy takes precedence, and that decision may not favor users. An LLM platform may also fail to apply policies retrospectively, which may diminish their impact. For example, a policy specifying that no personal data should be collected or shared may not apply to data that has already been collected (see the attacks in Section 4.3 of the paper; Appendix C.1.1 discusses an example).
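The retrospective-policy gap can be sketched in a few lines of Python. The `PluginDataStore` class and its `enable_policy` method are hypothetical, and the email regular expression is a deliberately crude stand-in for "personal information". That crudeness mirrors the interpretation mismatch described above: a user might consider their name or home address personal, yet this check would let both through.

```python
import re
from dataclasses import dataclass, field

# Crude, assumed proxy for "personal information": anything shaped like an email address.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class PluginDataStore:
    """Hypothetical store of messages a plugin has collected from users."""

    records: list[str] = field(default_factory=list)
    no_personal_data: bool = False

    def collect(self, text: str) -> None:
        # The policy is checked only when new data arrives.
        if self.no_personal_data and EMAIL.search(text):
            return
        self.records.append(text)

    def enable_policy(self, retroactive: bool = False) -> None:
        self.no_personal_data = True
        if retroactive:
            # Also purge matching data collected before the policy existed.
            self.records = [r for r in self.records if not EMAIL.search(r)]


store = PluginDataStore()
store.collect("Contact me at alice@example.com")  # collected before any policy
store.enable_policy()                             # forward-only policy
store.collect("My email is bob@example.com")      # blocked by the policy
store.collect("Weather in Paris?")
print(store.records)  # ['Contact me at alice@example.com', 'Weather in Paris?']

store.enable_policy(retroactive=True)
print(store.records)  # ['Weather in Paris?']
```

The forward-only policy blocks new personal data but leaves Alice's address in the store; only the retroactive variant removes it.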
Anticipating future LLM-based computing systems
Looking ahead, we anticipate that LLMs will be integrated into other types of platforms as well and that the plugin-integrated LLM chatbots of today are early indicators of the types of issues that might arise in the future. For example, we can anticipate that LLMs will be integrated into voice assistant platforms (such as Amazon Alexa), which already support third-party components (“skills” for Alexa). Recent work in robotics has also integrated LLMs into a “vision-language-action” model in which an LLM directly commands a physical robot. Future users may even interact with their desktop or mobile operating systems via deeply-integrated LLMs.
In all of these cases, the NLP-related challenges arising from the imprecision of natural language, coupled with the risks posed by untrustworthy third parties, physical-world actuation, and more, will raise serious concerns if not proactively considered. The designers of future LLM-based computing platforms should architect their platforms to support security, privacy, and safety from the outset rather than attempt to address issues retroactively.