Montreal AI Ethics Institute

Democratizing AI ethics literacy

Research Summary: Explaining and Harnessing Adversarial Examples

June 28, 2020

Summary contributed by Shannon Egan, Research Fellow at Building 21, pursuing a master’s in physics at UBC.

*Author & link to original paper at the bottom.



A puzzling weakness of many supervised machine learning (ML) models, including neural networks (NNs), is their vulnerability to adversarial examples (AEs). AEs are inputs generated by adding a small perturbation to a correctly classified input, causing the model to misclassify the resulting AE with high confidence. Goodfellow et al. propose a linear explanation of AEs, in which this vulnerability is a by-product of the models’ linear behaviour and high-dimensional feature space. In other words, a small perturbation to an input can alter its classification because the resulting change in NN activation scales with the dimensionality of the input vector.
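The scaling argument can be made concrete with the paper’s own back-of-the-envelope calculation. For a linear model with weight vector $w$ and a perturbation $\eta$ bounded by $\epsilon$ in the max norm (notation follows the paper):

```latex
\[
  w^{\top}\tilde{x} = w^{\top}(x + \eta) = w^{\top}x + w^{\top}\eta,
  \qquad \|\eta\|_{\infty} \le \epsilon .
\]
% The change in activation is maximized by choosing \eta = \epsilon\,\mathrm{sign}(w).
% For an n-dimensional w with average element magnitude m, the activation shifts by
\[
  w^{\top}\eta = \epsilon\, w^{\top}\mathrm{sign}(w)
             = \epsilon \sum_{i=1}^{n} |w_i|
             = \epsilon m n ,
\]
% which is imperceptible per feature (\epsilon small) but grows linearly with
% the input dimension n.
```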

Identifying ways to handle AEs effectively is of particular interest for problems like image classification, where the input consists of intensity data for many thousands of pixels. A method of generating AEs called the ā€œfast gradient sign methodā€ (FGSM) badly fools a maxout network, producing an 89.4% error rate on a perturbed MNIST test set. The authors propose an ā€œadversarial trainingā€ scheme for NNs, in which an adversarial term is added to the loss function during training; a sketch of both follows below.
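A minimal sketch of the fast gradient sign method, written here in PyTorch purely for illustration (the paper predates PyTorch; `model`, `x`, and `y` stand for any differentiable classifier, an input batch, and its integer labels). The default ε = 0.25 matches the value the paper reports for the MNIST maxout experiment:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=0.25):
    """Fast gradient sign method: x_adv = x + epsilon * sign(grad_x J(theta, x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)       # J(theta, x, y)
    grad_x, = torch.autograd.grad(loss, x)    # gradient w.r.t. the input only
    x_adv = x + epsilon * grad_x.sign()       # +/- epsilon step on every pixel
    return x_adv.clamp(0.0, 1.0).detach()     # keep intensities in the valid range
```

Because only the sign of the gradient is kept, every pixel moves by exactly ±ε, the largest step allowed under the max-norm constraint from the linear argument above.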

Adversarial training dramatically reduces the error rate of the same maxout network on FGSM-generated AEs, from 89.4% to 17.4%. The linear interpretation of adversarial examples thus suggests a training scheme that improves a model’s ability to classify AEs, and it explains properties of AE classification that the previously proposed nonlinearity and overfitting hypotheses do not.
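A sketch of the adversarial training objective, reusing the hypothetical `fgsm` helper above. Per the paper, the loss is a convex combination of the clean and adversarial cross-entropies, J̃(θ, x, y) = αJ(θ, x, y) + (1 āˆ’ α)J(θ, x + ε·sign(∇ₓJ), y), with α = 0.5:

```python
def adversarial_loss(model, x, y, epsilon=0.25, alpha=0.5):
    """Convex combination of the clean loss and the loss on FGSM examples."""
    clean = F.cross_entropy(model(x), y)
    x_adv = fgsm(model, x, y, epsilon)          # AEs from the current parameters
    adv = F.cross_entropy(model(x_adv), y)
    return alpha * clean + (1.0 - alpha) * adv
```

Because the AEs are regenerated from the current parameters at every training step, the adversarial term acts as a continually updated regularizer rather than a fixed augmented dataset.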



Original paper by Ian J. Goodfellow, Jonathan Shlens and Christian Szegedy: https://arxiv.org/abs/1412.6572


