A survey on adversarial attacks and defences

May 26, 2022

🔬 Research Summary by Max Krueger, a consultant at Accenture with an interest in both the long- and short-term implications of AI for society.

[Original paper by Anirban Chakraborty, Manaar Alam, Vishal Dey, Anupam Chattopadhyay, Debdeep Mukhopadhyay]


Overview: Deep learning systems are increasingly susceptible to adversarial threats. As a result, it is imperative to evaluate methods for adversarial robustness.


Introduction

Deep learning systems are increasingly susceptible to adversarial threats. With the widespread adoption of these systems in society, security poses a significant concern. Attacks on these systems can produce catastrophic failures as communities increasingly rely on input from machine intelligence to drive decision-making. This paper provides a detailed discussion of the types of attacks deep learning systems face and potential defenses against such attacks. “Adversaries can craftily manipulate legitimate inputs, which may be imperceptible to the human eye,” the report states. In other words, these attacks can be imperceptible yet highly consequential.

Key Insights

Attack Surface

A machine learning system can be broadly decomposed into four principal components: data collection, data transfer, data processing by a machine learning model, and action taken based on an output. These components constitute the attack surface on which an adversary may mount an attack. The authors identify three primary attack vectors:

  1. Evasion attack – The adversary tries to evade the system by adjusting malicious samples during the testing phase. Evasion attacks are the most common type of attack.
  2. Poisoning attack – The adversary attempts to inject crafted data samples to poison the system, compromising the entire learning process. Poisoning attacks take place during model training (a toy illustration follows this list).
  3. Exploratory attack – The adversary attacks a black-box model to learn as much about the underlying design as possible.
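
To make the poisoning vector concrete, here is a toy, self-contained sketch in which an adversary flips a fraction of training labels before a model is fit. The dataset, model, and 10% poisoning rate are illustrative assumptions, not details from the paper.

```python
# Toy poisoning attack: flip a fraction of training labels before fitting.
# Dataset, model, and poisoning rate are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)
rng = np.random.default_rng(0)

# The adversary flips the labels of 10% of the training samples.
poison_idx = rng.choice(len(y), size=int(0.1 * len(y)), replace=False)
y_poisoned = y.copy()
y_poisoned[poison_idx] = 1 - y_poisoned[poison_idx]

clean_model = LogisticRegression().fit(X, y)
poisoned_model = LogisticRegression().fit(X, y_poisoned)

# Accuracy on clean labels degrades for the model trained on poisoned data.
print(clean_model.score(X, y), poisoned_model.score(X, y))
```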

Adversary Capabilities

The capabilities of an adversary are defined by the amount of knowledge available at the time of the attack. For example, if the attacker has access to the underlying dataset, the adversary can execute training-phase attacks such as injecting corrupt samples into the training dataset, modifying the training data, or logic corruption – tampering with the learning algorithm itself.

Adversaries may not have access to the underlying dataset but may instead have access to the model during the testing phase. Attacks during this phase are determined by the adversary’s knowledge of the underlying model and its parameters. In a white-box attack, “an adversary has total knowledge about the model,” including the training data distribution, complete model architecture, and hyperparameters. The adversary then alters an input to produce a specific output. White-box attacks are highly effective because inputs are specifically crafted based on the known model architecture.
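
To illustrate how a white-box adversary can craft such inputs, below is a minimal sketch of the Fast Gradient Sign Method (FGSM), one of the gradient-based attacks covered in the survey. The model, labels, and epsilon budget are assumptions for illustration, not a definitive implementation.

```python
# Minimal white-box FGSM sketch (PyTorch). The epsilon budget is an assumed value.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Craft adversarial examples assuming full (white-box) access to the model's gradients."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step each input feature in the direction that increases the loss on the true label.
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1)
    return x_adv.detach()
```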

Black-box Attacks

A black-box attack assumes no knowledge of the target model; the primary objective is to train a local substitute model that can be used to craft malicious inputs for the target. The authors classify black-box attacks into three categories:

  1. Non-adaptive black-box attack – The adversary can only access the target model’s training data distribution. The adversary trains a local model on samples drawn from that distribution, attacks the local model with white-box methods, and then sends the crafted inputs to the target model for exploitation.
  2. Adaptive black-box attack – Like a non-adaptive attack, but the adversary does not know the training data distribution. The authors state, “The adversary issues adaptive oracle queries to the target model and labels a carefully selected dataset.” This dataset is then used to train a local model and craft malicious inputs (see the sketch after this list).
  3. Strict black-box attack – Like the adaptive black-box attack, but the adversary cannot alter the inputs and, therefore, cannot observe changes to the output.
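
A rough sketch of the substitute-model idea behind the adaptive black-box attack is shown below: the adversary queries the target as an oracle, trains a local model on the returned labels, and then crafts inputs white-box against that substitute (e.g., with the FGSM sketch above), relying on transferability. The `target_query` oracle, the substitute architecture, and the class count are hypothetical stand-ins, not details from the paper.

```python
# Adaptive black-box sketch: train a local substitute on oracle labels from the target.
# target_query, the substitute architecture, and num_classes are hypothetical assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_substitute(target_query, x_pool, num_classes=10, epochs=20):
    with torch.no_grad():
        y_oracle = target_query(x_pool).argmax(dim=1)  # labels obtained by querying the target
    substitute = nn.Sequential(
        nn.Flatten(),
        nn.Linear(x_pool[0].numel(), 128),
        nn.ReLU(),
        nn.Linear(128, num_classes),
    )
    opt = torch.optim.Adam(substitute.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(substitute(x_pool), y_oracle).backward()
        opt.step()
    # Inputs crafted white-box against this substitute are then sent to the target model.
    return substitute
```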

Adversarial Goals

An adversary’s goals are motivated by what they may gain from an attack. The authors illustrate four primary attack goals an adversary may have when attacking a model: reducing prediction confidence, causing untargeted misclassification, forcing outputs into a specific target class, and mapping a particular input to a chosen incorrect output. The adversary’s end goal will influence the type of attack used during exploitation.
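
As a rough illustration of how the untargeted and targeted goals differ in practice, the sketch below extends the FGSM example: an untargeted attack steps away from the true label, while a targeted attack steps toward a chosen target class. The model, labels, and epsilon remain illustrative assumptions.

```python
# Untargeted vs. targeted perturbation objectives (illustrative; builds on the FGSM sketch).
import torch
import torch.nn.functional as F

def craft(model, x, y_true, y_target=None, epsilon=0.03):
    x_adv = x.clone().detach().requires_grad_(True)
    if y_target is None:
        # Untargeted: increase the loss on the true label (any misclassification will do).
        loss, step = F.cross_entropy(model(x_adv), y_true), epsilon
    else:
        # Targeted: decrease the loss on the chosen target label.
        loss, step = F.cross_entropy(model(x_adv), y_target), -epsilon
    loss.backward()
    return (x_adv + step * x_adv.grad.sign()).clamp(0, 1).detach()
```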

Defense Strategies

Defending against such attacks is extremely difficult, and most defenses lack robustness against multiple attack types. There are several defense mechanisms available to the security practitioner, such as adversarial training, gradient hiding, and defensive distillation, to name a few. The paper presents several defense mechanisms as well as the logic behind them. The primary takeaway is that no single defense mechanism can stop all attacks, and many current defenses are easily fooled. Given the relative newness of modern machine learning and its rapid deployment in new environments, stopping these attacks will remain challenging for years to come.
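
As one example of the defenses surveyed, the sketch below shows a single adversarial-training step in which each batch is augmented with FGSM examples before the parameter update. The 50/50 clean-to-adversarial mix and the epsilon value are assumptions for illustration, not the paper's prescription.

```python
# Minimal adversarial-training step: augment each batch with FGSM examples (illustrative).
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    # Craft adversarial examples against the current model parameters.
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

    # Update on a mix of clean and adversarial inputs.
    optimizer.zero_grad()
    loss = 0.5 * (F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```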

Between the lines

We all know that cybersecurity is a big issue and an even bigger business. This paper demonstrates that the security of machine intelligence is an open and pressing question. As deep learning becomes increasingly embedded in our everyday lives, we should be genuinely concerned about protecting these systems. It doesn’t take a big imagination to envision how one might exploit these systems to cause extreme harm. A focus on developing algorithms with built-in adversarial robustness will help mitigate the consequences of such attacks. It will also be wise to create AI red teams to test the robustness of algorithms pre- and post-deployment. Deep learning is an educational process for us all, and security should be on our radar moving forward.

