Research Summary: Towards Evaluating the Robustness of Neural Networks

Summary contributed by Sundar Narayanan, Director at Nexdigm. Ethics and compliance professional with experience in fraud investigation, forensic accounting, anti-corruption reviews, ethics advisory and litigation support.

*Author & link to original paper at the bottom.

Defensive distillation is a defense proposed for hardening neural networks against adversarial examples. It defeats the attack algorithms that existed when it was proposed, reducing their success rate from 95% to 0.5%.

The paper starts from the broad question of how robust a neural network is against adversarial attacks. It lays out two ways to evaluate robustness: (a) constructing proofs of a lower bound on robustness, and (b) demonstrating attacks that establish an upper bound on it. The paper pursues the second approach, and in doing so it exposes the weakness of distilled networks.

Defensive distillation works in four steps: (1) train the teacher network on the standard training set, (2) use the teacher network to create soft labels for the training set, (3) train the distilled network on the soft labels, and (4) test the distilled network.
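Steps (2) and (3) rely on a softmax evaluated at a temperature T, a detail from the original defensive distillation work that the steps above leave implicit. The sketch below shows how a higher temperature turns the teacher's logits into flatter "soft" labels. It assumes the teacher's logits are already available as plain lists; the function names, the example logits and the value T = 20 are illustrative choices, not taken from the paper.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_labels(teacher_logits_batch, temperature):
    """Step (2): turn the teacher's logits for each training input into soft labels."""
    return [softmax_with_temperature(z, temperature) for z in teacher_logits_batch]

# Hypothetical teacher logits for two training inputs over three classes.
teacher_logits = [[8.0, 2.0, -1.0], [0.5, 4.0, 3.5]]

hard = soft_labels(teacher_logits, temperature=1.0)   # close to one-hot
soft = soft_labels(teacher_logits, temperature=20.0)  # much flatter distribution
print([round(p, 3) for p in hard[0]])
print([round(p, 3) for p in soft[0]])
```

In the original scheme, the distilled network in step (3) is trained on these soft labels at the same temperature T, and the temperature is set back to 1 for testing in step (4).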

Although defensive distillation is robust against the attacks available at the time, it fails against stronger ones. Existing attacks fail on a distilled network because the gradients they follow are almost always zero, so both L-BFGS and FGSM (Fast Gradient Sign Method) fail to make progress and terminate.
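Why do the gradients vanish? Training at temperature T and then testing at temperature 1 leaves the distilled network with logits that are roughly T times larger than usual, which pushes the softmax into saturation. The toy illustration below (not the paper's experiment) evaluates the derivative of the top-class probability with respect to its own logit, p(1 − p), for a small logit vector before and after a hypothetical scaling by T = 100.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_class_self_gradient(logits):
    """d p_k / d z_k = p_k * (1 - p_k) for the most likely class k."""
    probs = softmax(logits)
    p = max(probs)
    return p * (1.0 - p)

logits = [1.0, 0.5, -0.3]               # hypothetical logits of an ordinary network
scaled = [100.0 * z for z in logits]    # the same logits scaled by an assumed T = 100

print(top_class_self_gradient(logits))  # clearly non-zero
print(top_class_self_gradient(scaled))  # 0.0: p rounds to exactly 1 in float64
```

Once the top probability rounds to exactly 1, any attack that follows the gradient of the softmax output receives no signal, which matches the stalling behavior described above.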

The authors instead construct three attacks, one for each of the distance metrics L0, L2 and L∞, and find them effective against distilled networks. To solve the resulting optimization problems, they compare three solvers: gradient descent, gradient descent with momentum, and Adam.
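For reference, the three metrics measure the size of the perturbation δ = x′ − x between an original input x and its adversarial version x′. L0 counts how many components change, L2 is the Euclidean length of δ, and L∞ is the largest single change. The sketch below works over flat lists of grayscale pixel values; the example images are made up.

```python
import math

def perturbation(original, adversarial):
    """Component-wise difference delta = x' - x between two flattened images."""
    return [a - o for o, a in zip(original, adversarial)]

def l0_distance(original, adversarial):
    """Number of components (pixels) that were changed at all."""
    return sum(1 for d in perturbation(original, adversarial) if d != 0)

def l2_distance(original, adversarial):
    """Euclidean length of the perturbation."""
    return math.sqrt(sum(d * d for d in perturbation(original, adversarial)))

def linf_distance(original, adversarial):
    """Largest absolute change applied to any single component."""
    return max(abs(d) for d in perturbation(original, adversarial))

x = [0.0, 0.5, 1.0, 0.25]       # hypothetical original pixel values
x_adv = [0.0, 0.8, 0.6, 0.25]   # hypothetical adversarial version

print(l0_distance(x, x_adv), l2_distance(x, x_adv), linf_distance(x, x_adv))
```

The same perturbation can look small under one metric and large under another, which is why the paper builds a separate attack for each.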

The L2 attack is effective on its own, and it also serves as the building block for the L0 attack. Because the L0 distance metric is non-differentiable, the L0 attack runs the L2 attack iteratively: in each iteration it identifies pixels that have little effect on the classifier output and fixes them so they are never changed again. This gradually narrows the search to the important pixels whose perturbation changes the classification.

The L∞ attack replaces the L2 term in the objective function with a penalty on any perturbation component that exceeds a threshold τ (initially 1, and lowered as the attack proceeds). Penalizing every component above τ, rather than only the single largest one, prevents the oscillation that plain gradient descent on the L∞ norm suffers from, which makes the attack effective.
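The L∞ penalty and its τ schedule can be sketched as below. Following the original paper (a detail beyond this summary), τ is multiplied by 0.9 whenever every perturbation component is below it, and the search stops otherwise. The use of absolute values and the function names are assumptions of this sketch, and the outer optimization loop that would minimize the full objective is omitted.

```python
def linf_penalty(delta, tau):
    """Penalty replacing ||delta||_inf: total amount by which each |delta_i| exceeds tau.

    Every component above tau contributes, so minimizing this pushes down all
    offending components at once instead of only the single largest one.
    """
    return sum(max(abs(d) - tau, 0.0) for d in delta)

def update_tau(delta, tau, factor=0.9):
    """Shrink tau if every component is already below it; otherwise signal to stop.

    Returns the new tau, or None when the search should terminate.
    """
    if all(abs(d) < tau for d in delta):
        return tau * factor
    return None

delta = [0.05, -0.3, 0.12]   # hypothetical perturbation after one round of optimization
print(linf_penalty(delta, 1.0))   # no component exceeds the initial tau of 1
print(update_tau(delta, 1.0))     # all components are below 1, so tau shrinks
print(linf_penalty(delta, 0.1))   # only -0.3 and 0.12 exceed 0.1 and are penalized
```

Because the penalty is zero for components already under τ, the optimizer is free to concentrate on the components that still stick out.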

These attacks help evaluate robustness and can generate high-confidence adversarial examples: examples that the attacked model misclassifies by a large margin, rather than ones that only barely change the classification. The misclassification can be of any type (general misclassification, targeted misclassification, or source/target misclassification). The paper also shows that high-confidence adversarial examples transfer well: examples constructed with a large margin on one model often fool a different model, including a defensively distilled one, which offers another way to break defensive distillation.
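In the paper, confidence is controlled by a parameter κ in the term the optimizer minimizes, f(x′) = max(max_{i≠t} Z(x′)_i − Z(x′)_t, −κ), where Z(x′) are the logits and t is the target class. The term keeps rewarding the attack until the target logit beats every other logit by at least κ, so a larger κ yields higher-confidence examples. A minimal sketch of that term on a plain list of logits (the numbers and function name are illustrative):

```python
def confidence_loss(logits, target, kappa=0.0):
    """max(max_{i != t} Z_i - Z_t, -kappa): flat once the target wins by at least kappa."""
    best_other = max(z for i, z in enumerate(logits) if i != target)
    return max(best_other - logits[target], -kappa)

logits = [2.0, 5.0, 1.0]   # hypothetical logits; class 1 currently wins by a margin of 3
print(confidence_loss(logits, target=1, kappa=1.0))   # -1.0: margin of 3 already exceeds kappa
print(confidence_loss(logits, target=1, kappa=10.0))  # -3.0: still rewarded for widening the margin
print(confidence_loss(logits, target=0, kappa=1.0))   # 3.0: target class 0 is behind by 3
```

With κ = 0 the attack stops as soon as the example is misclassified, while a large κ forces it past the decision boundary by a wide margin.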

The paper closes with the following key recommendations for defenders, as a step forward from the defensive distillation approach:

Establish robustness against the L2 distance metric.
Demonstrate that transferability fails by constructing high-confidence adversarial examples on an unsecured model and showing that they do not transfer to the secured model.

Original paper by Nicholas Carlini, David Wagner: https://arxiv.org/abs/1608.04644
