Montreal AI Ethics Institute

Democratizing AI ethics literacy


The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks (Research Summary)

November 10, 2020

Summary contributed by our researcher Erick Galinkin (@ErickGalinkin), who’s also Principal AI Researcher at Rapid7.

*Link to original paper + authors at the bottom.


Overview: As neural networks, and especially generative models, are deployed, it is important to consider how they may inadvertently expose private information they have learned. In The Secret Sharer, Carlini et al. consider this question and evaluate whether neural networks memorize specific information, whether that information can be exposed, and how to prevent its exposure. They conclude that neural networks do in fact memorize, and that memorization may even be necessary for learning to occur. Beyond that, extraction of secrets is indeed possible, but it can be mitigated by sanitization and differential privacy.


Neural networks have proven extremely effective at a variety of tasks, including computer vision and natural language processing. Generative models such as Google’s predictive text are trained on large corpora of text harvested from many sources. This poses an important question – to quote the paper: “Is my model likely to memorize and potentially expose rarely-occurring, sensitive sequences in training data?”

Carlini et al. worked in partnership with Google to evaluate the risk of unintentional memorization of such training-data sequences in Google’s Smart Compose. In particular, they were concerned with rare or unique sequences of numbers and words. The implications of this are clear: valid Social Security numbers, credit card numbers, trade secrets, or other sensitive information encountered during training could be reproduced and exposed to individuals who did not provide that data. The paper assumes a threat model in which users can query a generative model an arbitrarily large number of times but have access only to the model’s output probabilities. This threat model corresponds to, for example, a user in Gmail trying to recover a 16-digit number by typing the first 8 digits and letting the model auto-complete the rest.

Carlini et al. use a metric called perplexity to measure how “confused” the model is by a particular sequence. Given a randomness space and a format sequence, a predetermined “canary” sequence is inserted into the training data, and its perplexity is compared with the perplexities of random sequences drawn from the randomness space, including sequences a small edit distance away from the canary. These comparisons yield the rank of the canary: its index in the list of sequences ordered by perplexity from lowest to highest (the lowest-perplexity sequence has rank 1, the second-lowest has rank 2, and so on). Given this rank, an exposure metric is approximated using sampling and distribution modeling. A Kolmogorov-Smirnov test fails to reject the hypothesis that a skew-normal distribution matches the discrete distribution of perplexities observed in the data, which justifies using the skew-normal approximation.
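
To make the metric concrete, here is a minimal Python sketch of the rank-based exposure calculation described above. The toy model, the digit-only randomness space, and all names here are illustrative stand-ins rather than the setup used in the paper; in practice the probabilities would come from the model being audited, and the paper refines the rank estimate with sampling and a skew-normal fit.

```python
import math
from itertools import product

DIGITS = "0123456789"

def log_perplexity(seq, next_probs):
    # Log-perplexity of a digit string: -sum_i log2 Pr(d_i | d_1 .. d_{i-1})
    return sum(-math.log2(next_probs(seq[:i])[ch]) for i, ch in enumerate(seq))

def exposure(canary, space, next_probs):
    # Exposure = log2 |R| - log2 rank(canary), with rank taken over the whole space R
    canary_px = log_perplexity(canary, next_probs)
    rank = 1 + sum(1 for s in space if log_perplexity(s, next_probs) < canary_px)
    return math.log2(len(space)) - math.log2(rank)

# Toy stand-in for the model under audit: it has effectively memorized "2817".
MEMORIZED = "2817"
def toy_probs(prefix):
    if MEMORIZED.startswith(prefix) and len(prefix) < len(MEMORIZED):
        probs = {d: 0.02 for d in DIGITS}
        probs[MEMORIZED[len(prefix)]] = 1 - 0.02 * 9
        return probs
    return {d: 0.1 for d in DIGITS}

space = ["".join(p) for p in product(DIGITS, repeat=4)]  # |R| = 10^4
print(exposure("2817", space, toy_probs))  # memorized canary: exposure near log2(10^4), about 13.3
print(exposure("5309", space, toy_probs))  # unmemorized sequence: much lower exposure
```

A canary the model has memorized sits near the top of the perplexity ranking and therefore has exposure close to log2 |R|, while an unmemorized sequence sits much lower.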

Testing their methods on Google Smart Compose, Carlini et al. find that memorization happens quite early in training and is not correlated with overfitting the dataset. Exposure peaks around the time that training loss begins to level off. Taken together, the results indicate that unintended memorization is not merely an artifact of training but appears to be a necessary component of training a neural network. This ties in with a result of Tishby and Shwartz-Ziv suggesting that neural networks first learn by memorizing and then generalize.

Carlini et al. also find that extraction is quite difficult when the randomness space is large or when the exposure of the canary is low. For the space of credit card numbers, brute-force extraction of a single targeted value would require 4,100 GPU-years. Among the search strategies they evaluate, a shortest-path search based on Dijkstra’s algorithm allowed a variety of secrets to be extracted in a relatively short amount of time when the secret in question was highly exposed.
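
The shortest-path idea can be sketched as follows: treat partial strings as nodes, let the cost of appending a digit be its negative log-probability under the model, and run Dijkstra’s algorithm so that the first full-length string popped off the priority queue is the model’s lowest-perplexity (most exposed) completion. This is a simplified, hypothetical rendering of the approach, not the optimized search described in the paper; the same toy memorizing model as in the previous sketch is redefined here so the snippet runs on its own.

```python
import heapq
import math

DIGITS = "0123456789"

def extract_secret(prefix, length, next_probs):
    # Dijkstra-style search over partial digit strings; edge cost is
    # -log2 Pr(next digit | prefix), so the first full-length string popped
    # from the heap is the model's lowest-perplexity completion of `prefix`.
    heap = [(0.0, prefix)]
    while heap:
        cost, partial = heapq.heappop(heap)
        if len(partial) == length:
            return partial, cost
        probs = next_probs(partial)
        for d in DIGITS:
            if probs[d] > 0:
                heapq.heappush(heap, (cost - math.log2(probs[d]), partial + d))
    return None, math.inf

# Same toy model as above: it has effectively memorized "2817".
MEMORIZED = "2817"
def toy_probs(prefix):
    if MEMORIZED.startswith(prefix) and len(prefix) < len(MEMORIZED):
        probs = {d: 0.02 for d in DIGITS}
        probs[MEMORIZED[len(prefix)]] = 1 - 0.02 * 9
        return probs
    return {d: 0.1 for d in DIGITS}

print(extract_secret("28", 4, toy_probs))  # recovers "2817" without enumerating all 10^4 strings
```

Because the per-digit costs are non-negative, the first completed string dequeued is guaranteed to be the minimum-perplexity one, which is why highly exposed secrets fall out of the search quickly.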

A variety of methods were considered to mitigate unintended memorization, including differential privacy, dropout, quantization, sanitization, weight decay, and regularization. Differential privacy prevented the extraction of secrets in all cases, but it introduced meaningful error into the model. Sanitization remains a best practice, but it missed some secrets, making it the weakest link in the chain. Dropout, quantization, and regularization had no meaningful impact on the extraction of secrets.
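
The differentially private training evaluated in the paper follows the DP-SGD recipe of per-example gradient clipping plus calibrated Gaussian noise. The sketch below shows only that core step with placeholder hyperparameters and random stand-in gradients; it is not the configuration used in the paper.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_mult=1.1):
    # One illustrative differentially private update:
    # 1) clip each example's gradient to an L2 norm of at most `clip_norm`,
    # 2) sum the clipped gradients and add Gaussian noise scaled to the clip bound,
    # 3) average and take an ordinary gradient step.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)
    noise = np.random.normal(0.0, noise_mult * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(per_example_grads)
    return params - lr * noisy_mean

# Toy usage: 32 examples, 10 parameters, random stand-in gradients.
np.random.seed(0)
params = np.zeros(10)
grads = np.random.normal(size=(32, 10))
params = dp_sgd_step(params, grads)
```

The clipping bounds how much any single training example, canary included, can shift the model, and the noise masks whatever per-example signal remains; that is what prevents extraction, at the cost of the added error noted above.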

Carlini et al. conclude by saying: “To date, no good method exists for helping practitioners measure the degree to which a model may have memorized aspects of the training data”. Since we cannot prevent memorization – and, if Tishby and Shwartz-Ziv are to be believed, we would not want to – we must instead measure exposure and mitigate the risk of secrets being exposed by, or extracted from, our models.


Original paper by Nicholas Carlini, Chang Liu, Ulfar Erlingsson, Jernej Kos, Dawn Song: https://arxiv.org/abs/1802.08232

Want quick summaries of the latest research & reporting in AI ethics delivered to your inbox? Subscribe to the AI Ethics Brief. We publish bi-weekly.

