The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks (Research Summary)

November 10, 2020

Summary contributed by our researcher Erick Galinkin (@ErickGalinkin), who’s also Principal AI Researcher at Rapid7.

*Link to original paper + authors at the bottom.


Overview: As neural networks, and especially generative models, are deployed, it is important to consider how they may inadvertently expose private information they have learned. In The Secret Sharer, Carlini et al. take up this question and evaluate whether neural networks memorize specific information, whether that information can be exposed, and how to prevent its exposure. They conclude that neural networks do in fact memorize training data, and that memorization may even be necessary for learning to occur. Beyond that, extraction of secrets is indeed possible, but it can be mitigated through sanitization and differential privacy.


Neural networks have proven extremely effective at a variety of tasks, including computer vision and natural language processing. Generative models such as Google’s predictive text are trained on large corpora of text harvested from many sources. This poses an important question – to quote the paper: “Is my model likely to memorize and potentially expose rarely-occurring, sensitive sequences in training data?”

Carlini et al. worked in partnership with Google to evaluate the risk of unintentional memorization of training-data sequences in Google’s Smart Compose, focusing in particular on rare or unique sequences of numbers and words. The implications of this are clear – valid Social Security numbers, credit card numbers, trade secrets, or other sensitive information encountered during training could be reproduced and exposed to individuals who did not provide that data. The paper assumes a threat model of users who can query a generative model an arbitrarily large number of times but observe only the model’s output probabilities. This threat model corresponds to, for example, a Gmail user trying to generate 16-digit sequences of numbers by typing the first 8 digits and letting the model auto-complete the rest.

Carlini et al. use a metric called perplexity to measure how “confused” the model is by a particular sequence. This perplexity measure is used together with a randomness space and a format sequence: a predetermined “canary” sequence is planted in the training data, and its perplexity is compared against the perplexities of random sequences drawn from the randomness space, including sequences a small edit distance away from the canary. The comparison yields the canary’s rank – its index in the list of candidate sequences ordered by perplexity from lowest to highest (the sequence with the lowest perplexity has rank 1, the second-lowest has rank 2, and so on). Given this rank, an exposure metric is approximated using sampling and distribution modeling. Based on a Kolmogorov–Smirnov test, using a skew-normal distribution to approximate the discrete distribution seen in the data fails to reject the hypothesis that the distributions are the same.
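
To make the rank and exposure computation concrete, here is a minimal Python sketch (not the authors’ code). The function `score` stands in for the model’s log-perplexity of a candidate sequence; the six-digit randomness space, the canary value, and the sample size are illustrative assumptions, and the skew-normal extrapolation described in the paper is replaced by a plain sampling estimate.

import math
import random

def estimated_exposure(score, canary, randomness_space, n_samples=2_000):
    """Approximate exposure = log2|R| - log2(rank), where rank is the canary's
    position among candidates ordered by log-perplexity (lowest = rank 1)."""
    canary_score = score(canary)
    sample = random.sample(randomness_space, min(n_samples, len(randomness_space)))
    # Fraction of sampled candidates the model finds less surprising than the canary.
    frac_better = sum(s != canary and score(s) < canary_score for s in sample) / len(sample)
    est_rank = max(1.0, 1 + frac_better * len(randomness_space))
    return math.log2(len(randomness_space)) - math.log2(est_rank)

# Toy usage: a "model" that has memorized the canary scores it far lower than random noise.
space = [f"{n:06d}" for n in range(1_000_000)]          # 6-digit randomness space
toy_score = lambda s: 0.5 if s == "123456" else 20.0 + random.random()
print(estimated_exposure(toy_score, "123456", space))   # close to log2(10^6), about 19.9 bits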

Testing their methods on Google Smart Compose, Carlini et al. find that memorization happens quite early in training and is not correlated with overfitting the dataset. Exposure peaks around the time that training loss begins to level off. Taken together, the results indicate that unintended memorization is not merely an artifact of training but appears to be a necessary component of training a neural network. This ties in with a result of Tishby and Shwartz-Ziv suggesting that neural networks first learn by memorizing and then by generalizing.

Carlini et al. also find that extraction is quite difficult unless the randomness space is small or the exposure of the canary is high. For the space of credit card numbers, extracting a single targeted value would require roughly 4,100 GPU-years. Among the search mechanisms they explore, a shortest-path search based on Dijkstra’s algorithm allowed a variety of secrets to be extracted in a relatively short amount of time when the secret in question was highly exposed.
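
The intuition behind the shortest-path extraction is that partial sequences can be expanded in order of total negative log-probability, so the first complete sequence found is the model’s lowest-perplexity completion. Below is a minimal sketch under assumed interfaces, not the authors’ implementation: `next_token_log_probs` is a hypothetical hook returning per-token log2 probabilities from the model and is not part of any real library.

import heapq

def extract_most_likely(next_token_log_probs, prefix, length, alphabet="0123456789"):
    """Best-first (Dijkstra-style) search for the completion of `prefix` with the
    lowest total negative log-probability, i.e. the lowest-perplexity candidate secret."""
    frontier = [(0.0, prefix)]                 # (cost so far, partial sequence)
    while frontier:
        cost, seq = heapq.heappop(frontier)
        if len(seq) == len(prefix) + length:
            return seq, cost                   # first complete sequence popped is optimal
        log_probs = next_token_log_probs(seq)  # hypothetical hook: {token: log2 probability}
        for token in alphabet:
            heapq.heappush(frontier, (cost - log_probs[token], seq + token))
    return None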

A variety of methods were considered to mitigate unintended memorization, including differential privacy, dropout, quantization, sanitization, weight decay, and regularization. Differential privacy prevented the extraction of secrets in all cases, but it introduced meaningful error into the model. Sanitization is always a best practice, but it missed some secrets, making it the weakest link in the chain. Dropout, quantization, and regularization had no meaningful impact on the extraction of secrets.
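
For context, differentially private training of the kind evaluated in the paper follows the DP-SGD recipe: bound each example’s influence by clipping its gradient, then add calibrated Gaussian noise before updating. The sketch below shows only that core step in NumPy; the learning rate, clipping norm, and noise multiplier are illustrative assumptions, not values from the paper.

import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_mult=1.1):
    """One differentially private SGD step: clip each example's gradient to bound
    its contribution, add Gaussian noise, then average and descend."""
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    noise = np.random.normal(0.0, noise_mult * clip_norm, size=params.shape)
    private_grad = (np.sum(clipped, axis=0) + noise) / len(per_example_grads)
    return params - lr * private_grad

# Toy usage on a 3-parameter model with a batch of 4 per-example gradients.
params = np.zeros(3)
grads = np.random.randn(4, 3)
params = dp_sgd_step(params, grads)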

Carlini et al. conclude by saying: “To date, no good method exists for helping practitioners measure the degree to which a model may have memorized aspects of the training data”. Since we cannot prevent memorization – and, if Tishby and Shwartz-Ziv are to be believed, we would not want to – we must instead measure exposure and mitigate the risk of secrets being exposed or extracted from our models.


Original paper by Nicholas Carlini, Chang Liu, Ulfar Erlingsson, Jernej Kos, Dawn Song: https://arxiv.org/abs/1802.08232

Want quick summaries of the latest research & reporting in AI ethics delivered to your inbox? Subscribe to the AI Ethics Brief. We publish bi-weekly.
