Summary contributed by our researcher Erick Galinkin (@ErickGalinkin), who’s also Principal AI Researcher at Rapid7.
*Link to original paper + authors at the bottom.
Overview: Neural networks have shown amazing ability to learn on a variety of tasks, and this sometimes leads to unintended memorization. This paper explores how generative adversarial networks may be used to recover some of these memorized examples.
Model inversion attacks abuse access to a model to infer information about its training data set. Effective model inversion attacks have so far been largely limited to very simple models such as linear regression and logistic regression, and have shown little promise against deep neural networks. However, generative adversarial networks (GANs) provide the ability to approximate these data sets.
Using techniques similar to image inpainting for obscured or damaged images, the GAN creates semantically plausible pixels based on what has been inferred about the sensitive features in the training data. A Wasserstein-GAN is used to set up a min-max problem as the loss function, and some auxiliary knowledge about the private images, provided to the attacker, serves as an additional input to the generator. The generator then passes the recovered images to both the target network and a discriminator. The losses from these two inferences are combined to optimize the generator.
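To make the combined objective concrete, here is a minimal sketch of how a discriminator term and a target-network term might be added together. It assumes the usual Wasserstein generator loss (the negated mean critic score) and a cross-entropy style term on the target class; the function names and the `identity_weight` parameter are hypothetical and not taken from the paper or this summary.

```python
import math


def wasserstein_generator_loss(critic_scores):
    """Generator-side Wasserstein loss: the negated mean critic score.

    Higher critic scores mean the images look more realistic, so the
    loss falls as realism rises.
    """
    return -sum(critic_scores) / len(critic_scores)


def identity_loss(target_probs, target_label):
    """Mean negative log-probability that the target network assigns
    to the label being attacked (assumed form, for illustration)."""
    eps = 1e-12  # guards against log(0)
    total = sum(math.log(p[target_label] + eps) for p in target_probs)
    return -total / len(target_probs)


def combined_loss(critic_scores, target_probs, target_label, identity_weight=1.0):
    """Weighted sum of the realism term and the target-network term.

    identity_weight is a hypothetical trade-off parameter.
    """
    realism = wasserstein_generator_loss(critic_scores)
    identity = identity_loss(target_probs, target_label)
    return realism + identity_weight * identity


if __name__ == "__main__":
    # Two generated images: critic scores and target-network class probabilities.
    scores = [0.5, 1.5]
    probs = [[0.5, 0.5], [0.5, 0.5]]
    print(combined_loss(scores, probs, target_label=0))
```

Minimizing this quantity pushes the generated images toward looking realistic to the discriminator while also being classified as the targeted identity by the target network.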
Using facial recognition classifiers as a model, Zhang et al. find that generative model inversion is significantly more effective than existing model inversion methods. Notably, more powerful models which have more layers and parameters are more susceptible to the attack.
Zhang et al. also find that pre-training the GAN on auxiliary data from the training distribution helps recovery of private data significantly. However, even training on similar data with a different distribution, such as pre-training on the PubFig83 dataset and attacking a model trained on the CelebA dataset, still outperforms existing model inversion attacks by a large margin. Some image pre-processing can further improve the accuracy of the GAN in generating target data.
Finally, Zhang et al. investigated the implications of differential privacy in recovering images. They note that differentially private facial recognition models are very difficult to produce with acceptable accuracy in the first place, due to the complexity of the task. They therefore use MNIST as a reference dataset, and find that generative model inversion can expose private information from differentially private models even with strong privacy guarantees, and that the strictness of the guarantee does not impact the ability to recover data. They suggest that this is likely because “DP, in its canonical form, only hides the presence of a single instance in the training set; it does not explicitly aim to protect attribute privacy.”
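For reference, the canonical guarantee mentioned in the quote is usually stated as follows. This is the standard textbook definition of (ε, δ)-differential privacy, and the notation is an assumption of this note rather than something taken from the summary: a randomized training mechanism M is (ε, δ)-differentially private if, for every pair of data sets D and D' that differ in a single record and every set of outputs S,

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta
```

The bound only limits how much one individual record can change the distribution over trained models. It says nothing about features shared across many records of a class, which is consistent with the authors' explanation of why the attack still succeeds.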
Original paper by Yuheng Zhang, Ruoxi Jia, Hengzhi Pei, Wenxiao Wang, Dawn Song: https://arxiv.org/abs/1911.07135