Montreal AI Ethics Institute

Democratizing AI ethics literacy


Research Summary: Towards Evaluating the Robustness of Neural Networks

June 29, 2020

Summary contributed by Sundar Narayanan, Director at Nexdigm, an ethics and compliance professional with experience in fraud investigation, forensic accounting, anti-corruption reviews, ethics advisory, and litigation support.

*Author & link to original paper at the bottom.


Defensive distillation is a defense proposed for hardening neural networks against adversarial examples; it was reported to defeat existing attack algorithms, reducing their success probability from 95% to 0.5%.

The paper is set on the broad premise of neural network robustness against adversarial attacks. It lays out two complementary approaches: (a) constructing proofs that lower-bound robustness and (b) demonstrating attacks that upper-bound robustness. The paper pursues the second while explaining the gaps in the first (essentially the weaknesses of distilled networks).

The distilled network is built in four steps: (1) train the teacher network on the standard (hard) labels, (2) use the teacher network to produce soft labels on the training set, (3) train the distilled network on those soft labels, and (4) test the distilled network.
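To make the four steps concrete, here is a minimal PyTorch-style sketch of defensive distillation. This is a sketch under assumptions, not the paper's code: the classifier class `Net`, the data `loader`, the epoch count, and the temperature `T` are all hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

T = 20.0  # distillation temperature (illustrative value; the defense uses a high T)

def train_defensive_distillation(Net, loader, epochs=10, lr=1e-3):
    # (1) Train the teacher network on the standard hard labels (temperature-scaled logits).
    teacher = Net()
    opt = torch.optim.Adam(teacher.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            loss = F.cross_entropy(teacher(x) / T, y)
            opt.zero_grad()
            loss.backward()
            opt.step()

    # (2) Use the teacher to produce soft labels on the training set, and
    # (3) train the distilled (student) network on those soft labels at the same temperature.
    student = Net()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in loader:
            with torch.no_grad():
                soft = F.softmax(teacher(x) / T, dim=1)        # soft labels
            log_probs = F.log_softmax(student(x) / T, dim=1)
            loss = -(soft * log_probs).sum(dim=1).mean()       # cross-entropy with soft targets
            opt.zero_grad()
            loss.backward()
            opt.step()

    # (4) The distilled network is then tested; at test time it runs at temperature 1,
    # which is what flattens its softmax gradients.
    return student
```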

While defensive distillation is robust to the current generation of attacks, it fails against stronger ones. Existing attacks fail against the distilled network because the optimization gradients are almost always zero, so both L-BFGS and FGSM (Fast Gradient Sign Method) stop making progress and terminate.
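As a rough illustration of why the single-step attack stalls, here is a minimal FGSM sketch, assuming a trained PyTorch `model` and a cross-entropy loss (both placeholders): when the distilled network's input gradients are numerically zero, the sign term is zero and the input comes back unchanged.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """One-step FGSM: x_adv = x + eps * sign(grad_x loss)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    # Against a distilled network the softmax saturates, grad is ~0 everywhere,
    # so sign(grad) is 0 and x_adv == x: the attack terminates without progress.
    return torch.clamp(x + eps * grad.sign(), 0.0, 1.0).detach()
```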

In response, the authors construct three attacks based on the distance metrics L0, L2, and L∞, and find them effective against the distilled network. They evaluate three solvers for the underlying optimization: gradient descent, gradient descent with momentum, and Adam.
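Each attack is cast as an optimization problem over the perturbation. The sketch below is a hedged, minimal rendering of the L2 variant in the spirit of the paper's formulation, assuming `model(x)` returns pre-softmax logits; the constant `c`, step count, and learning rate are illustrative values, and `kappa` is the confidence margin discussed further below.

```python
import torch

def l2_attack(model, x, target, c=1.0, kappa=0.0, steps=1000, lr=0.01):
    """Targeted L2 attack sketch: minimize ||x' - x||_2^2 + c * f(x'),
    where f penalizes the target logit trailing the best other logit by kappa."""
    # Change of variables: x' = 0.5 * (tanh(w) + 1) always lies in [0, 1],
    # so no box constraint is needed during optimization.
    w = torch.atanh((2 * x - 1).clamp(-0.999999, 0.999999)).detach().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    mask = torch.zeros(x.size(0), dtype=torch.bool)  # placeholder, rebuilt per batch below
    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        logits = model(x_adv)                          # pre-softmax scores Z(x')
        target_logit = logits[:, target]
        mask = torch.zeros_like(logits, dtype=torch.bool)
        mask[:, target] = True
        best_other = logits.masked_fill(mask, float("-inf")).max(dim=1).values
        f = torch.clamp(best_other - target_logit + kappa, min=0.0)
        l2 = ((x_adv - x) ** 2).flatten(1).sum(dim=1)
        loss = (l2 + c * f).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (0.5 * (torch.tanh(w) + 1)).detach()
```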

Because the L0 distance metric is non-differentiable, the L0 attack is built on top of the L2 attack, which proves effective: in each iteration it identifies pixels that have little effect on the classifier output and eliminates them, inherently focusing the perturbation on the important pixels whose modification changes the classification. The L∞ attack replaces the L2 term in the objective function with a penalty for any terms that exceed τ (initially 1, decreasing in each iteration), which prevents oscillation and yields effective results.
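For the L∞ variant, the squared-L2 term in the sketch above would be swapped for a penalty on coordinates whose perturbation exceeds the shrinking threshold τ. A minimal sketch of that penalty term follows; the 0.9 reduction factor in the comment is only indicative.

```python
import torch

def linf_penalty(x_adv, x, tau):
    """Penalize only the perturbation coordinates whose magnitude exceeds tau.
    tau starts at 1 and is reduced (e.g. tau *= 0.9) whenever every |delta_i| < tau,
    which avoids the oscillation a plain max-based L_inf objective can produce."""
    delta = x_adv - x
    return torch.clamp(delta.abs() - tau, min=0.0).flatten(1).sum(dim=1)
```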

This approach helps in establishing robustness and in developing high-confidence adversarial examples: examples that the original model strongly misclassifies, rather than examples that barely change the classification. The misclassification can be of any type (general misclassification, targeted misclassification, or source/target misclassification). The paper also shows that increasing the confidence of an adversarial example makes it more likely to transfer to other models, which is why the authors recommend testing a defense by checking whether high-confidence examples fail to transfer to it.
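A brief sketch of how such a transferability check could look, reusing the hypothetical `l2_attack` above with an undefended substitute `source_model` and a defended `target_model` (all placeholder names): the example is crafted with a large confidence margin on the substitute and then simply evaluated on the defended model.

```python
# Craft a high-confidence adversarial example on the undefended substitute model
# (kappa > 0 forces the target-class logit to win by a wide margin) ...
x_adv = l2_attack(source_model, x, target=target_class, kappa=20.0)

# ... then check whether it transfers to the (defensively distilled) target model.
transferred = target_model(x_adv).argmax(dim=1).eq(target_class)
print("transferred:", bool(transferred.all()))
```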

The paper explores the following key takeaways as defenses against adversarial attacks and as steps forward from the distilled-network approach:

  • Defenders should make sure to establish robustness against the L2 distance metric
  • Demonstrate that transferability fails by constructing high-confidence adversarial examples

Original paper by Nicholas Carlini, David Wagner: https://arxiv.org/abs/1608.04644 

Want quick summaries of the latest research & reporting in AI ethics delivered to your inbox? Subscribe to the AI Ethics Brief. We publish bi-weekly.


