The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks (Research Summary)

November 10, 2020

Summary contributed by our researcher Erick Galinkin (@ErickGalinkin), who’s also Principal AI Researcher at Rapid7.

*Link to original paper + authors at the bottom.


Overview: As neural networks, and especially generative models, are deployed, it is important to consider how they may inadvertently expose private information they have learned. In The Secret Sharer, Carlini et al. consider this question and evaluate whether neural networks memorize specific information, whether that information can be exposed, and how to prevent its exposure. They conclude that neural networks do in fact memorize, and that memorization may even be necessary for learning to occur. Beyond that, extraction of secrets is indeed possible, but it can be mitigated by sanitization and differential privacy.


Neural networks have proven extremely effective at a variety of tasks, including computer vision and natural language processing. Generative models such as Google’s predictive text are trained on large corpora of text harvested from many sources. This poses an important question, to quote the paper: ā€œIs my model likely to memorize and potentially expose rarely-occurring, sensitive sequences in training data?ā€

Carlini et al. worked in partnership with Google to evaluate the risk of unintentional memorization of training data sequences in Google’s Smart Compose. In particular, they were concerned with rare or unique sequences of numbers and words. The implications are clear: valid Social Security numbers, credit card numbers, trade secrets, or other sensitive information encountered during training could be reproduced and exposed to individuals who did not provide that data. The paper assumes a threat model of users who can query a generative model an arbitrarily large number of times but have access only to the model’s output probabilities. This threat model corresponds to, for example, a Gmail user trying to generate 16-digit numbers by typing the first 8 digits and letting the model auto-complete the rest.
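
To make this threat model concrete, here is a minimal sketch of the kind of black-box probing such an attacker could perform. The `next_token_probs` function is a hypothetical stand-in for the deployed model’s query interface (a uniform placeholder here so the sketch runs), and the brute-force enumeration is only viable for tiny completion spaces, which is precisely the paper’s point.

```python
# Sketch of the paper's threat model: an attacker who can query a generative
# model arbitrarily often but sees only output probabilities, scoring candidate
# completions of a partially known secret (e.g. the trailing digits of a number).
import math
from itertools import product

def next_token_probs(prefix: str) -> dict:
    """Hypothetical black-box query returning P(next character | prefix).
    A uniform placeholder stands in for the real model here."""
    digits = "0123456789"
    return {d: 1.0 / len(digits) for d in digits}

def completion_log_prob(prefix: str, completion: str) -> float:
    """Score a candidate completion by summing per-character log-probabilities."""
    total, context = 0.0, prefix
    for ch in completion:
        total += math.log(next_token_probs(context).get(ch, 1e-12))
        context += ch
    return total

def top_completions(prefix: str, length: int = 4, k: int = 5):
    """Brute-force enumeration of all digit completions of a given length.
    Only feasible for tiny spaces; the full credit-card space is far too large."""
    candidates = ("".join(d) for d in product("0123456789", repeat=length))
    return sorted(((completion_log_prob(prefix, c), c) for c in candidates),
                  reverse=True)[:k]
```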

Carlini et al. use a metric called perplexity to measure how ā€œconfusedā€ the model is by a particular sequence. This measure is used together with a randomness space and a format sequence: a predetermined ā€œcanaryā€ sequence is inserted into the training data, and its perplexity is compared with that of random sequences drawn from the randomness space, including sequences a small edit distance away from the canary. These comparisons are used to compute the rank of the canary, that is, its index in the list of sequences ordered by perplexity from lowest to highest (the lowest perplexity has rank 1, the second-lowest has rank 2, and so on). Given this rank, an exposure metric is approximated using sampling and distribution modeling. A Kolmogorov-Smirnov test fails to reject the hypothesis that the skew-normal distribution used for this approximation and the discrete distribution seen in the data are the same.
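
As a rough illustration of the rank and exposure computation, the sketch below assumes a hypothetical `log_perplexity(model, sequence)` scoring function and a candidate list that fully enumerates a small randomness space containing the canary; the paper’s sampling-based skew-normal approximation is what makes this tractable for large spaces.

```python
# Sketch of the exposure metric: rank the canary among all candidate
# sequences by log-perplexity, then exposure(s) = log2(|R|) - log2(rank(s)).
# `log_perplexity` is a hypothetical scoring function for the trained model.
import math

def canary_rank(model, canary, candidates, log_perplexity):
    """1-based rank of the canary when candidates (which include the canary)
    are ordered by log-perplexity from lowest to highest."""
    canary_score = log_perplexity(model, canary)
    return sum(1 for c in candidates if log_perplexity(model, c) <= canary_score)

def exposure(model, canary, candidates, log_perplexity):
    """Exact exposure over a fully enumerated randomness space; the paper
    approximates this by sampling and fitting a skew-normal distribution."""
    rank = canary_rank(model, canary, candidates, log_perplexity)
    return math.log2(len(candidates)) - math.log2(rank)
```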

Testing their methods on Google Smart Compose, Carlini et al. find that memorization happens quite early in training and is not correlated with overfitting the dataset. Exposure becomes maximized around the time that training loss begins to level off. Taken together, the results indicate that unintended memorization is not merely an artifact of training but seems to be a necessary component of training a neural network. This ties in with a result of Shwartz-Ziv and Tishby suggesting that neural networks learn first by memorizing and then by generalizing.

Carlini et al. also find that extraction is quite difficult when the randomness space is large or when the exposure of the canary is low. For the space of credit card numbers, brute-force extraction of a single targeted value would require 4,100 GPU-years. Among the search mechanisms they tried, a shortest-path search based on Dijkstra’s algorithm allowed a variety of secrets to be extracted in a relatively short amount of time when the secret in question was highly exposed.
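
The sketch below illustrates the shortest-path idea under the same hypothetical black-box interface as above: each partial completion is a node, extending it by one symbol costs that symbol’s negative log-probability, so the first complete sequence popped from the priority queue is the lowest-perplexity (most memorized) candidate.

```python
# Dijkstra-style best-first search over the tree of completions, where the
# cost of extending a prefix by one symbol is -log P(symbol | prefix).
# `next_token_log_probs` is a hypothetical black-box query, as above.
import heapq

def extract_completion(prefix, alphabet, length, next_token_log_probs):
    """Return the lowest-perplexity completion of `prefix` of the given length."""
    frontier = [(0.0, "")]  # (cumulative negative log-probability, partial completion)
    while frontier:
        cost, partial = heapq.heappop(frontier)
        if len(partial) == length:
            return partial, cost  # first full-length pop is the cheapest path
        log_probs = next_token_log_probs(prefix + partial)
        for symbol in alphabet:
            heapq.heappush(frontier, (cost - log_probs[symbol], partial + symbol))
    return None, float("inf")
```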

A variety of methods were considered to mitigate unintended memorization, including differential privacy, dropout, quantization, sanitization, weight decay, and regularization. Differential privacy prevented the extraction of secrets in every case, but it introduced meaningful error into the model. Sanitization remains a best practice, yet it missed some secrets and so becomes the weakest link in the chain. Dropout, quantization, and regularization had no meaningful impact on the extraction of secrets.
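
For context on the differential-privacy mitigation, the sketch below shows the two operations at the heart of DP-SGD, per-example gradient clipping and Gaussian noise addition; the clip bound and noise multiplier are illustrative values, not the settings used in the paper.

```python
# Minimal sketch of a DP-SGD update: bound each example's influence by
# clipping its gradient, then add Gaussian noise calibrated to that bound.
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.1):
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))  # L2 clipping
    grad_sum = np.sum(clipped, axis=0)
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=grad_sum.shape)
    return params - lr * (grad_sum + noise) / len(per_example_grads)
```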

Carlini et al. conclude by saying: ā€œTo date, no good method exists for helping practitioners measure the degree to which a model may have memorized aspects of the training dataā€. Since we cannot prevent memorization – and if Tishby and Shwartz-Ziv are to be believed, we would not want to – we must instead consider exposure and mitigate exposing secrets or allowing secrets to be extracted from our model.


Original paper by Nicholas Carlini, Chang Liu, Ulfar Erlingsson, Jernej Kos, Dawn Song: https://arxiv.org/abs/1802.08232

