
Balancing Transparency and Risk: The Security and Privacy Risks of Open-Source Machine Learning Models

December 5, 2023

🔬 Research Summary by Dominik Hintersdorf & Lukas Struppek. Dominik & Lukas are both Ph.D. students at the Technical University of Darmstadt, researching the security and privacy of deep learning models.

[Original paper by Dominik Hintersdorf, Lukas Struppek, and Kristian Kersting]


Overview: A few key players like Google, Meta, and Hugging Face are responsible for training and publicly releasing large pre-trained models, providing a crucial foundation for a wide range of applications. However, adopting these open-source models carries inherent privacy and security risks that are often overlooked. This study presents a comprehensive overview of common privacy and security threats associated with using open-source models.


Introduction

The field of artificial intelligence (AI) has experienced remarkable progress in recent years, driven by the widespread adoption of open-source machine learning models in both research and industry. Considering the resource-intensive nature of training on vast datasets, many applications opt for pre-trained models released by a few key players. However, adopting these open-source models carries inherent privacy and security risks that are often overlooked. The implications of successful privacy and security attacks encompass a broad spectrum, ranging from relatively minor damage like service interruptions to highly alarming scenarios, including physical harm or the exposure of sensitive user data.

In this work, the authors present a comprehensive overview of common privacy and security threats associated with using open-source models. They aim to raise awareness of these dangers to promote the responsible and secure use of AI systems.

Key Insights

Understanding Security & Privacy Risks for Open-Source Models

Open-source models are often published on sites like Hugging Face, TensorFlow Hub, or PyTorch Hub and are deployed in numerous applications and settings. While this practice clearly has its upsides, the trustworthiness of such pre-trained open-source models is coming increasingly into focus. Since the model architecture, weights, and training procedure are publicly known, malicious adversaries have an advantage when attacking these models compared to settings where models are kept behind closed doors. While all attacks presented in this work are possible to some extent without full model access and detailed knowledge of the architecture, they become considerably harder to perform without such information.

Open-Source Models Leak Private Information

Model Inversion Attacks

Model inversion and reconstruction attacks aim to extract sensitive information about the training data of an already trained model, e.g., by reconstructing images that disclose sensitive attributes or by generating text containing private information from the training data. Generative models are typically used in these attacks to produce samples from the training data domain. Since an attacker has full access to open-source models, model inversion attacks are a genuine threat to the privacy of the training data. Imagine an open-source model trained to classify facial features like hair or eye color. An adversary successfully performing a model inversion attack could then generate synthetic facial images that reveal the identity of individuals from the training data. Closely related to model inversion attacks is the issue of data leakage through unintended memorization. For instance, the model might inadvertently complete the query “My social security number” with a real social security number that was present in the model’s training data. In addition to such accidental leakage, there is also a concern that malicious users could deliberately craft queries that elicit it.
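
To make the threat concrete, the following is a minimal sketch of a gradient-based inversion against a white-box image classifier: the attacker optimizes an input until the model assigns it maximal confidence for a chosen class, which can surface visual features of the private training data. The classifier, target class, and optimization settings are illustrative placeholders rather than the authors’ setup; attacks in the literature typically rely on generative models and are considerably more involved.

```python
# Minimal sketch of a gradient-based model inversion attack against a
# white-box image classifier. The model, target class, and hyperparameters
# are illustrative placeholders, not the authors' method.
import torch
from torchvision import models

model = models.resnet18(weights=None)  # stand-in for an open-source model
model.eval()

target_class = 0                       # class whose training data we probe
x = torch.zeros(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([x], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    logits = model(x)
    # Maximize the target logit while keeping the image smooth (total-variation prior).
    loss = -logits[0, target_class] + 1e-4 * (
        (x[..., 1:, :] - x[..., :-1, :]).abs().mean()
        + (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
    )
    loss.backward()
    optimizer.step()
    x.data.clamp_(0, 1)                # keep pixel values in a valid range

# x now approximates an input the model strongly associates with the target
# class, which can leak visual features of the (private) training data.
```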

Membership Inference Attacks

While inversion and data leakage attacks try to infer information about the training data by reconstructing parts of it, membership inference attacks try to determine which data samples were used to train a model. Imagine a hospital training a machine learning model on the medical data of its patients to predict whether future patients will have cancer. An attacker gains access to the model and holds a set of private data samples. The adversary attempts to infer whether a person’s data was used to train the cancer prediction model. If the attack is successful, the attacker knows not only that the person has or had cancer but also that the person was once a patient in that hospital. Full access to an open-source model makes membership inference attacks more feasible than attacks against models kept behind APIs, because the attacker can observe the intermediate activations for every input, which makes membership easier to infer.
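
As a rough illustration of why white-box access helps, a simple baseline attack scores each candidate sample by the model’s loss on it and flags low-loss samples as likely training members. The model, samples, and threshold below are hypothetical; practical attacks calibrate the decision threshold with shadow models or reference data, and stronger variants exploit internal activations.

```python
# Minimal sketch of a loss-threshold membership inference attack against a
# white-box classifier. The model, samples, and threshold are hypothetical;
# practical attacks calibrate the threshold with shadow models.
import torch
import torch.nn.functional as F

def membership_score(model, x, y):
    """Higher score (i.e., lower loss) suggests the sample was seen during training."""
    model.eval()
    with torch.no_grad():
        logits = model(x.unsqueeze(0))                 # add batch dimension
        loss = F.cross_entropy(logits, y.unsqueeze(0))
    return -loss.item()

def infer_membership(model, samples, threshold):
    """Flag each (x, y) pair whose score exceeds a calibrated threshold as a member."""
    return [membership_score(model, x, y) > threshold for x, y in samples]
```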

Open-Source Models Are More Prone To Security Attacks

Backdoor Attacks

Open-source models are trained on vast datasets, often comprising millions or even billions of data samples. At this scale, human inspection of the data is simply not feasible, so users must rely on the integrity of these datasets. However, previous research has shown that adding a small set of manipulated data to a model’s training data can significantly influence its behavior. This dataset manipulation is referred to as data poisoning. For numerous applications, manipulating less than 10% of the available data is sufficient to make the model learn additional hidden functionalities. Such hidden functionalities are called backdoors and are activated when the model input during inference includes a specific trigger pattern. In image classification, for instance, a trigger may be a specific color pattern placed in the corner of an image, e.g., a checkerboard pattern. A notable example is text-to-image synthesis models, where small manipulations of the training data suffice to inject backdoors that single characters or words can trigger. As a result, these triggered models can generate harmful or offensive content. Detecting this type of model manipulation is challenging for users since the models appear to function as expected on clean inputs.
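
The poisoning step itself can be strikingly simple. The sketch below stamps a small checkerboard trigger into the corner of a fraction of the training images and flips their labels to an attacker-chosen target class; a model trained on this data tends to learn the trigger as a shortcut to that class. The patch size, poisoning rate, and target class are illustrative values, not figures from the paper.

```python
# Minimal sketch of data poisoning for a backdoor attack on an image
# classifier. Patch size, poisoning rate, and target class are illustrative.
import torch

def add_trigger(img, patch_size=4):
    """Stamp a checkerboard pattern into the bottom-right corner of a (C, H, W) image in [0, 1]."""
    patch = torch.zeros(patch_size, patch_size)
    patch[::2, ::2] = 1.0
    patch[1::2, 1::2] = 1.0
    img = img.clone()
    img[:, -patch_size:, -patch_size:] = patch
    return img

def poison_dataset(images, labels, target_class=0, poison_rate=0.05):
    """Poison a small fraction of (N, C, H, W) images so the trained model maps the trigger to target_class."""
    images, labels = images.clone(), labels.clone()
    n_poison = int(poison_rate * len(images))
    idx = torch.randperm(len(images))[:n_poison]
    for i in idx:
        images[i] = add_trigger(images[i])
        labels[i] = target_class
    return images, labels
```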

Adversarial Examples

In addition to poisoning attacks, which manipulate the training process to introduce hidden backdoor functions into a model, adversarial attacks target models solely during inference. Adversarial examples are slightly modified model inputs crafted to alter the model’s behavior for the given input. Consequently, such samples can be used to bypass a model’s detection mechanisms and cause misclassification. The fact that open-source model weights and architectures are publicly available poses a risk, as adversaries can craft adversarial examples locally against an exact copy of the model and then use them to deceive the targeted deployment.
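
As a concrete example of how full model access is exploited, the sketch below implements the Fast Gradient Sign Method (FGSM), a standard technique for crafting adversarial examples: it perturbs an input in the direction that most increases the model’s loss. The model, inputs, and perturbation budget are illustrative placeholders; the paper discusses adversarial examples in general rather than this specific method.

```python
# Minimal sketch of the Fast Gradient Sign Method (FGSM) against a
# white-box classifier. Epsilon and the model are illustrative placeholders.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8 / 255):
    """Perturb inputs x in the direction that increases the loss for their true labels y."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()   # one signed gradient step
    return x_adv.clamp(0, 1).detach()     # keep pixels in a valid range
```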

Between the lines

Public access to model weights can significantly facilitate privacy attacks like inversion or membership inference, particularly when the training set remains private. Similarly, security attacks aimed at compromising model robustness can be carried out by manipulating the training data to introduce hidden backdoor functionalities or by crafting adversarial examples to manipulate inference outcomes. These risks affect not only the published model itself but also the applications and systems that incorporate it. The benefits of publishing large models, such as large language and text-to-image synthesis models, outweigh the drawbacks. Still, users and publishers must be aware of the inherent risks of open-source practices. The authors hope that by drawing attention to these risks, defenses and countermeasures can be improved to allow the safe and private use of open-source models.
