The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks (Research Summary)

November 10, 2020

Summary contributed by our researcher Erick Galinkin (@ErickGalinkin), who’s also Principal AI Researcher at Rapid7.

*Link to original paper + authors at the bottom.


Overview: As neural networks, and especially generative models, are deployed, it is important to consider how they may inadvertently expose private information they have learned. In The Secret Sharer, Carlini et al. consider this question and evaluate whether neural networks memorize specific information, whether that information can be exposed, and how to prevent its exposure. They conclude that neural networks do in fact memorize, and that memorization may even be necessary for learning to occur. Beyond that, extraction of secrets is indeed possible, but it can be mitigated by sanitization and differential privacy.


Neural networks have proven extremely effective at a variety of tasks, including computer vision and natural language processing. Generative networks such as Google’s predictive text are built on large corpora of text harvested from various locations. This raises an important question, posed in the paper itself: “Is my model likely to memorize and potentially expose rarely-occurring, sensitive sequences in training data?”

Working in partnership with Google, Carlini et al. used Smart Compose to evaluate the risk of unintentional memorization of these training-data sequences. In particular, they were concerned with rare or unique sequences of numbers and words. The implications are clear: valid Social Security numbers, credit card numbers, trade secrets, or other sensitive information encountered during training could be reproduced and exposed to individuals who did not provide that data. The paper assumes a threat model of users who can query a generative model an arbitrarily large number of times, but who see only the model’s output probabilities. This threat model corresponds to, for example, a Gmail user trying to generate 16-digit numbers by typing the first 8 digits and letting the model auto-complete the rest, as in the sketch below.
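
To make this threat model concrete, here is a minimal sketch of what such an attacker can do. The function `smart_compose_probs` is a hypothetical stand-in for the black-box autocomplete interface (not a real API): it returns next-token probabilities, and that is all the attacker ever observes.

```python
def greedy_completion(smart_compose_probs, typed_prefix, digits_to_generate=8):
    """Type the first digits of a number and let the model fill in the rest."""
    completion = typed_prefix
    for _ in range(digits_to_generate):
        # The attacker only sees a {token: probability} dict from the model.
        probs = smart_compose_probs(completion)
        digit_probs = {t: p for t, p in probs.items() if t.isdigit()}
        if not digit_probs:
            break  # the model proposes no digit continuation
        completion += max(digit_probs, key=digit_probs.get)
    return completion
```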

Carlini et al. use a metric called perplexity to measure how “confused” the model is by a particular sequence. This perplexity measure is used together with a randomness space and a format sequence: a predetermined “canary” sequence is inserted into the training data, and its perplexity is compared with that of random sequences drawn from the randomness space, including several sequences a small edit distance away from the canary. These comparisons yield the rank of the canary, that is, its index in the list of candidate sequences ordered by perplexity from lowest to highest (the lowest-perplexity sequence has rank 1, the second-lowest has rank 2, and so on). Given this rank, an exposure metric is approximated using sampling and distribution modeling. Based on a Kolmogorov-Smirnov test, using a skew-normal distribution to approximate the discrete distribution seen in the data fails to reject the hypothesis that the two distributions are the same.
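
A rough sketch of these two computations is below, assuming a hypothetical `model_logprob(prefix, token)` hook that returns the model’s probability of a token given a prefix. The exposure formula, exposure = log2 |R| - log2 rank, follows the paper; the sampling-based rank estimate here is a simplification of the skew-normal fit described above.

```python
import math
import random

def log_perplexity(model_logprob, sequence):
    """Log-perplexity of a sequence: the sum of -log2 Pr(x_i | x_1..x_{i-1})."""
    total = 0.0
    for i, token in enumerate(sequence):
        total += -math.log2(model_logprob(sequence[:i], token))
    return total

def approximate_exposure(model_logprob, canary, randomness_space, sample_size=10_000):
    """Estimate exposure(canary) = log2 |R| - log2 rank(canary) by sampling from R.

    `randomness_space` is a list of all candidate sequences sharing the canary's
    format; rank is the canary's position when candidates are sorted by
    log-perplexity from lowest (most memorized) to highest.
    """
    canary_px = log_perplexity(model_logprob, canary)
    sample = random.sample(randomness_space, min(sample_size, len(randomness_space)))
    # Fraction of sampled candidates the model finds less surprising than the canary.
    lower = sum(1 for cand in sample if log_perplexity(model_logprob, cand) < canary_px)
    estimated_rank = max(1, round(lower / len(sample) * len(randomness_space)))
    return math.log2(len(randomness_space)) - math.log2(estimated_rank)
```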

Testing their methods on Google Smart Compose, Carlini et al. find that memorization happens quite early in training and is not correlated with overfitting the dataset. Exposure peaks around the time that training loss begins to level off. Taken together, the results indicate that unintended memorization is not merely an artifact of training but may be a necessary component of training a neural network. This ties in with a result of Tishby and Shwartz-Ziv suggesting that neural networks first learn by memorizing and then generalize.

Carlini et al. also find that extraction is quite difficult when the randomness space is large or when exposure of the canary is low: for the space of credit card numbers, extracting a single targeted value by brute force would require 4,100 GPU-years. Among the search mechanisms they considered, a shortest-path search based on Dijkstra’s algorithm allowed a variety of secrets to be extracted in a relatively short amount of time when the secret in question was highly exposed, as sketched below.
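
Here is a hedged sketch of that search idea, not the authors’ exact implementation: partial sequences are graph nodes, appending a digit costs its negative log-probability, and a priority queue pops the lowest-cost node first, so the first completed sequence is the model’s lowest-log-perplexity (most memorized) candidate. `next_digit_probs(prefix)` is an assumed hook returning `{digit: probability}` from the target model.

```python
import heapq
import math

def extract_lowest_perplexity(next_digit_probs, prefix, target_len, alphabet="0123456789"):
    """Dijkstra-style search for the completion the model finds least surprising."""
    # Priority queue of (cumulative -log2 probability, partial sequence).
    frontier = [(0.0, prefix)]
    while frontier:
        cost, seq = heapq.heappop(frontier)
        if len(seq) == target_len:
            # Edge costs are non-negative, so the first completed sequence is optimal.
            return seq, cost
        probs = next_digit_probs(seq)
        for digit in alphabet:
            p = probs.get(digit, 0.0)
            if p > 0.0:
                heapq.heappush(frontier, (cost - math.log2(p), seq + digit))
    return None, math.inf
```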

A variety of methods were considered to mitigate the unintended memorization, including differential privacy, dropout, quantization, sanitization, weight decay, and regularization. Differential privacy prevented the extraction of secrets in all cases, but it introduced a meaningful amount of error. Sanitization is always a best practice, but it missed some secrets, making it the weakest link in the chain. Dropout, quantization, and regularization had no meaningful impact on the extraction of secrets.
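
As an illustration of the differential-privacy option, a minimal DP-SGD-style training step in PyTorch is sketched below: each example’s gradient is clipped, then Gaussian noise is added before the optimizer update. This is a generic sketch rather than the configuration used in the paper; `model`, `optimizer`, `loss_fn`, and `batch` are assumed to exist, and the clipping norm and noise multiplier are illustrative values. The clipping and added noise are also what introduce the error noted above.

```python
import torch

def dp_sgd_step(model, optimizer, loss_fn, batch, clip_norm=1.0, noise_multiplier=1.1):
    """One DP-SGD-style step: clip each example's gradient, then add Gaussian noise."""
    inputs, targets = batch
    accumulated = [torch.zeros_like(p) for p in model.parameters()]

    # Per-example gradients, each clipped to an L2 norm of at most `clip_norm`.
    for x, y in zip(inputs, targets):
        optimizer.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        grads = [p.grad.detach().clone() if p.grad is not None else torch.zeros_like(p)
                 for p in model.parameters()]
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (total_norm + 1e-6), max=1.0)
        for acc, g in zip(accumulated, grads):
            acc += g * scale

    # Average the clipped gradients, add calibrated noise, and take an ordinary step.
    batch_size = len(inputs)
    for p, acc in zip(model.parameters(), accumulated):
        noise = torch.randn_like(p) * noise_multiplier * clip_norm
        p.grad = (acc + noise) / batch_size
    optimizer.step()
```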

Carlini et al. conclude by saying: “To date, no good method exists for helping practitioners measure the degree to which a model may have memorized aspects of the training data.” Since we cannot prevent memorization (and, if Tishby and Shwartz-Ziv are to be believed, we would not want to), we must instead consider exposure and work to prevent secrets from being exposed by, or extracted from, our models.


Original paper by Nicholas Carlini, Chang Liu, Úlfar Erlingsson, Jernej Kos, and Dawn Song: https://arxiv.org/abs/1802.08232

