Summary contributed by our researcher Erick Galinkin (@ErickGalinkin), who’s also Principal AI Researcher at Rapid7.
*Link to original paper + authors at the bottom.
Overview: As neural networks, and especially generative models, are deployed, it is important to consider how they may inadvertently expose private information they have learned. In The Secret Sharer, Carlini et al. consider this question and evaluate whether neural networks memorize specific information, whether that information can be exposed, and how to prevent its exposure. They conclude that neural networks do in fact memorize, and that memorization may even be necessary for learning to occur. Beyond that, extraction of secrets is indeed possible, but it can be mitigated by sanitization and differential privacy.
Neural networks have proven extremely effective at a variety of tasks, including computer vision and natural language processing. Generative models such as the one behind Google’s predictive text are trained on large corpora of text harvested from many sources. This poses an important question – to quote the paper: “Is my model likely to memorize and potentially expose rarely-occurring, sensitive sequences in training data?”
Working in partnership with Google, Carlini et al. used Smart Compose to evaluate the risk of unintentional memorization of these training data sequences. In particular, they were concerned with rare or unique sequences of numbers and words. The implications are clear – valid Social Security numbers, credit card numbers, trade secrets, or other sensitive information encountered during training could be reproduced and exposed to individuals who did not provide that data. The paper assumes a threat model of users who can query a generative model an arbitrarily large number of times but can observe only the model’s output probabilities. This threat model corresponds to, for example, a user in Gmail trying to generate 16-digit sequences of numbers by typing the first 8 digits and letting the model auto-complete the rest.
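To make that threat model concrete, here is a minimal sketch in Python of what such an attacker can do. The `model_next_token_probs` interface is hypothetical (it is not the paper’s code or any real API); it simply returns next-token probabilities for a text prefix, which is the only signal the threat model grants.

```python
# Minimal sketch of the black-box threat model (illustrative only).
# `model_next_token_probs(prefix)` is a hypothetical interface returning a dict
# that maps each candidate next token to its probability under the model.

def completion_probability(model_next_token_probs, prefix, completion):
    """Probability the model assigns to `completion` following `prefix`,
    obtained purely through repeated black-box queries."""
    probability, context = 1.0, prefix
    for token in completion:
        probability *= model_next_token_probs(context).get(token, 0.0)
        context += token
    return probability

# Example (hypothetical): rank candidate continuations of a partially typed secret.
# candidates = ["12345678", "87654321"]
# ranked = sorted(candidates,
#                 key=lambda c: completion_probability(model, "my secret code is ", c),
#                 reverse=True)
```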
Carlini et al. use a metric called perplexity to measure how “confused” the model is by a particular sequence. This perplexity measure is used together with a randomness space and a format sequence: a predetermined “canary” sequence is inserted into the training data, and its perplexity is compared with that of random sequences drawn from the randomness space, including several sequences a small edit distance away from the canary. These comparisons are used to compute the rank of the canary – that is, its index in the list of sequences ordered by perplexity from lowest to highest (e.g. the lowest perplexity has rank 1; the second-lowest has rank 2; and so on). Given this rank, an exposure metric is approximated using sampling and distribution modeling. Based on the Kolmogorov-Smirnov test, the use of a skew-normal distribution to approximate the discrete distribution seen in the data fails to reject the hypothesis that the distributions are the same.
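As a concrete reference, the sketch below shows how the paper’s log-perplexity and exposure metrics can be computed once a canary’s rank is known, assuming the same hypothetical `model_next_token_probs` interface as in the previous sketch. Note that the paper approximates the rank via sampling and distribution fitting rather than enumerating the full randomness space.

```python
import math

def log_perplexity(model_next_token_probs, sequence, prefix=""):
    """Log-perplexity: the negative log2-probability the model assigns to a
    sequence; low values mean the model is unsurprised by (and has likely
    memorized) the sequence."""
    total, context = 0.0, prefix
    for token in sequence:
        p = model_next_token_probs(context).get(token, 1e-30)  # guard against log(0)
        total -= math.log2(p)
        context += token
    return total

def exposure(canary_rank, randomness_space_size):
    """Exposure of a canary, given its rank among all candidate sequences in
    the randomness space ordered by log-perplexity (rank 1 = lowest).
    Maximal when the canary ranks first; near zero when it looks random."""
    return math.log2(randomness_space_size) - math.log2(canary_rank)
```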
Testing their methods on Google Smart Compose, Carlini et al. find that memorization happens quite early in training and has no correlation with overfitting the dataset. Exposure becomes maximal around the time that training loss begins to level off. Taken together, the results indicate that unintended memorization is not merely an artifact of training but seems to be a necessary component of training a neural network. This ties in with a result of Shwartz-Ziv and Tishby suggesting that neural networks first learn by memorizing and then generalize.
Carlini et al. also find that extraction is quite difficult unless the randomness space is small or the exposure of the canary is high. For the space of credit card numbers, extracting a single targeted value would require 4,100 GPU-years. Among the search mechanisms they evaluated, a shortest-path search based on Dijkstra’s algorithm allowed a variety of secrets to be extracted in a relatively short amount of time when the secret in question was highly exposed.
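The idea behind that search can be sketched as follows: treat partial completions as graph nodes, use each token’s negative log-probability as the edge cost, and the lowest-perplexity completion becomes a shortest path. This is an illustrative simplification rather than the paper’s implementation, and it reuses the hypothetical `model_next_token_probs` interface from the earlier sketches.

```python
import heapq
import math

def extract_secret(model_next_token_probs, prefix, length, alphabet="0123456789"):
    """Dijkstra-style (uniform-cost) search for the lowest-log-perplexity
    completion of `prefix`. Terminates quickly when one completion is far more
    likely than the rest, i.e. when the secret is highly exposed."""
    frontier = [(0.0, "")]  # priority queue of (cost so far, partial completion)
    while frontier:
        cost, partial = heapq.heappop(frontier)
        if len(partial) == length:
            return partial  # first full-length completion popped has minimal cost
        probs = model_next_token_probs(prefix + partial)
        for token in alphabet:
            p = probs.get(token, 0.0)
            if p > 0.0:
                heapq.heappush(frontier, (cost - math.log2(p), partial + token))
    return None
```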
A variety of methods were considered to mitigate the unintended memorization, including differential privacy, dropout, quantization, sanitization, weight decay, and regularization. Although differential privacy did prevent the extraction of secrets in all cases, it introduced meaningful error into the model. Sanitization is always a best practice, but it missed some secrets, and whatever it misses becomes the weakest link in the chain. Dropout, quantization, and regularization did not have any meaningful impact on the extraction of secrets.
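For readers unfamiliar with how differential privacy is applied during training, below is a minimal PyTorch sketch of the DP-SGD idea: clip each example’s gradient to a fixed norm, then add Gaussian noise before the parameter update. The function name and hyperparameters are illustrative assumptions rather than the paper’s code; a real deployment would use a library such as TensorFlow Privacy or Opacus and track the privacy budget properly.

```python
import torch

def dp_sgd_step(model, loss_fn, batch, lr=0.1, clip_norm=1.0, noise_multiplier=1.1):
    """One illustrative DP-SGD step. `batch` is an iterable of (x, y) tensor pairs."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in batch:  # microbatches of size 1 give per-example gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad.detach().clone() for p in params]
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, clip_norm / (float(total_norm) + 1e-12))  # clip to clip_norm
        for s, g in zip(summed, grads):
            s.add_(g, alpha=scale)

    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = noise_multiplier * clip_norm * torch.randn_like(p)  # Gaussian noise
            p.add_(s + noise, alpha=-lr / len(batch))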
Carlini et al. conclude by saying: “To date, no good method exists for helping practitioners measure the degree to which a model may have memorized aspects of the training data”. Since we cannot prevent memorization – and if Shwartz-Ziv and Tishby are to be believed, we would not want to – we must instead measure exposure and mitigate the risk of secrets being exposed or extracted from our models.
Original paper by Nicholas Carlini, Chang Liu, Ulfar Erlingsson, Jernej Kos, Dawn Song: https://arxiv.org/abs/1802.08232