
A survey on adversarial attacks and defences

May 26, 2022

🔬 Research Summary by Max Krueger, a consultant at Accenture with an interest in both the long- and short-term implications of AI on society.

[Original paper by Anirban Chakraborty, Manaar Alam, Vishal Dey, Anupam Chattopadhyay, Debdeep Mukhopadhyay]


Overview: Deep learning systems are increasingly susceptible to adversarial threats. As a result, it is imperative to evaluate methods for adversarial robustness.


Introduction

Deep learning systems are increasingly susceptible to adversarial threats. With the widespread adoption of these systems in society, security poses a significant concern. Attacks on these systems can produce catastrophic failures as communities increasingly rely on input from machine intelligence to drive decision-making. This paper provides a detailed discussion of the types of attacks deep learning systems face and potential defences against them. “Adversaries can craftily manipulate legitimate inputs, which may be imperceptible to the human eye,” the report states. In other words, these attacks can be imperceptible to people yet highly consequential.

Key Insights

Attack Surface

A machine learning system can be broadly decomposed into four principal components: data collection, data transfer, data processing by a machine learning model, and action taken based on the output. This pipeline constitutes the attack surface on which an adversary may mount an attack. The authors identify three primary attack vectors:

  1. Evasion attack – The adversary tries to evade the system by adjusting malicious samples during the testing phase. Evasion attacks are the most common type of attack (a minimal sketch follows this list).
  2. Poisoning attack – The adversary attempts to inject crafted data samples to poison the system, compromising the entire learning process. Poisoning attacks take place during model training.
  3. Exploratory attack – The adversary attacks a black-box model to learn as much about the underlying design as possible.
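
To make the evasion category concrete, here is a minimal sketch of a single-step, test-time perturbation in the style of the fast gradient sign method (FGSM), one of the classic attacks covered by the survey. The model, labels, and epsilon budget are illustrative placeholders, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def fgsm_evasion(model, x, y, epsilon=0.03):
    """Test-time evasion: nudge the input in the direction that most
    increases the classification loss, keeping the change small."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # One signed-gradient step, then clip back to a valid pixel range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return torch.clamp(x_adv, 0.0, 1.0).detach()
```

A budget of 0.03 on inputs scaled to [0, 1] is a common illustrative choice; the paper surveys far more elaborate evasion strategies.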

Adversary Capabilities

The capabilities of an adversary depend on how much knowledge is available at the time of the attack. For example, if the attacker has access to the underlying dataset, the adversary can execute training-phase attacks such as injecting corrupt data into the training dataset, modifying the training data, or logic corruption, i.e., tampering with the learning algorithm itself.
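
As a deliberately simple illustration of a training-phase (poisoning) capability, the sketch below flips the labels of a random fraction of the training set before learning begins; the function and parameter names are hypothetical, and the poisoning attacks described in the survey can be far subtler.

```python
import torch

def flip_labels(y_train, num_classes, poison_fraction=0.05):
    """Label-flipping poisoning: relabel a random subset of the training
    data so the learned decision boundary is corrupted."""
    y_poisoned = y_train.clone()
    n_poison = int(poison_fraction * len(y_train))
    idx = torch.randperm(len(y_train))[:n_poison]
    # Shift each poisoned label by a random non-zero offset so the new
    # class always differs from the original one.
    offsets = torch.randint(1, num_classes, (n_poison,))
    y_poisoned[idx] = (y_poisoned[idx] + offsets) % num_classes
    return y_poisoned
```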

Adversaries may not have access to the underlying dataset but may have access to the model during the testing phase. Attacks in this phase are determined by the adversary’s knowledge of the underlying model and its parameters. In a white-box attack, “an adversary has total knowledge about the model,” including the training data distribution, the complete model architecture, and its hyperparameters. The adversary then alters an input to produce a specific output. White-box attacks are highly effective because inputs are crafted specifically for the known model architecture.
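
With full knowledge of the model, an attacker can iterate on a perturbation using exact gradients. The projected gradient descent (PGD) style loop below is one standard way to do this; the step size, budget, and iteration count are illustrative assumptions rather than values from the paper.

```python
import torch
import torch.nn.functional as F

def pgd_whitebox(model, x, y, epsilon=0.03, step=0.007, iters=10):
    """Iterative white-box attack: repeatedly step along the sign of the
    loss gradient and project back into an epsilon-ball around the input."""
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step * grad.sign()
        # Stay within the allowed perturbation set and valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv
```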

Black-box Attacks

The primary objective of a black-box attack is to train a local model to help craft malicious attacks on the target model. A black-box attack assumes no knowledge of the target. The authors classify black-box attacks into three categories:

  1. Non-adaptive black-box attack – An adversary can only access the model’s training data distribution. The adversary then trains a local model on data from that distribution, labeled by the black-box model’s outputs. The local model can be attacked with white-box methods, and the crafted inputs are then sent to the target model for exploitation (see the sketch after this list).
  2. Adaptive black-box attack – Like a non-adaptive attack, but the adversary does not know the training data distribution. The authors state, “The adversary issues adaptive oracle queries to the target model and labels a carefully selected dataset.” This dataset is then used to train a local model and craft malicious inputs.
  3. Strict black-box attack – Like the adaptive black-box attack, but the adversary cannot alter the inputs and, therefore, cannot observe changes to the output.
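
The non-adaptive and adaptive strategies above both hinge on training a local substitute model from the target’s outputs. The sketch below assumes a `query_target` oracle, a `substitute` network, and a pool of inputs `x_pool`; all of these names, and the training hyperparameters, are placeholders for illustration.

```python
import torch
import torch.nn.functional as F

def train_substitute(substitute, query_target, x_pool, epochs=5, lr=1e-3):
    """Label a pool of inputs with the target's predictions (oracle queries),
    then fit a local substitute model to those labels."""
    with torch.no_grad():
        y_oracle = query_target(x_pool).argmax(dim=1)  # hard labels from the target
    opt = torch.optim.Adam(substitute.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(substitute(x_pool), y_oracle)
        loss.backward()
        opt.step()
    return substitute
```

Adversarial examples crafted against the substitute with a white-box method (for instance, the earlier FGSM sketch) are then submitted to the black-box target, relying on their transferability.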

Adversarial Goals

An adversary’s goals are motivated by what they may gain from an attack. The authors identify four primary goals an adversary may have when attacking a model: reducing prediction confidence, causing a misclassification, forcing arbitrary inputs into a particular target class, and mapping a specific input to a specific incorrect output. The adversary’s end goal influences which type of attack is used during exploitation.
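
The distinction between these goals can be expressed as a choice of attack objective: an untargeted attack pushes the prediction away from the true class (or merely lowers its confidence), while a targeted attack pulls the prediction toward an attacker-chosen class. The helper below is a hypothetical illustration of that split, not notation from the paper.

```python
import torch.nn.functional as F

def attack_objective(logits, y_true, y_target=None):
    """Quantity the attacker tries to increase when crafting a perturbation.
    Untargeted: raise the loss on the true label.
    Targeted: lower the loss on a chosen incorrect label."""
    if y_target is None:
        return F.cross_entropy(logits, y_true)
    return -F.cross_entropy(logits, y_target)
```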

Defense Strategies

Defending against such attacks is extremely difficult, and current defenses generally lack robustness against multiple attack types. There are several defense mechanisms available to the security practitioner, such as adversarial training, gradient hiding, and defensive distillation, to name a few. The paper presents several defense mechanisms along with the logic behind them. The primary takeaway is that no single defense mechanism can stop all attacks, and many current defenses are easily fooled. Given the relative newness of modern machine learning and its rapid deployment in new environments, stopping these attacks will remain challenging for years to come.
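
Of the defenses named above, adversarial training is the most widely adopted. The sketch below augments each training step with adversarially perturbed inputs, reusing the earlier FGSM-style helper; the model, optimizer, and equal loss weighting are illustrative assumptions, and, as the authors stress, no defense of this kind stops every attack.

```python
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """One training step that mixes clean and adversarially perturbed
    examples, a common form of adversarial training."""
    x_adv = fgsm_evasion(model, x, y, epsilon)  # reuse the earlier sketch
    optimizer.zero_grad()  # discard gradients accumulated while crafting x_adv
    loss = 0.5 * F.cross_entropy(model(x), y) + 0.5 * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```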

Between the lines

We all know that cybersecurity is a big issue and an even bigger business. This paper demonstrates that the security of machine intelligence is an open and pressing question. As deep learning becomes increasingly embedded in our everyday lives, we should be genuinely concerned about protecting these systems. It does not take much imagination to envision how one might exploit these systems to cause extreme harm. A focus on developing algorithms with built-in adversarial robustness will help mitigate the consequences of such attacks. It would also be wise to create AI red teams that test the robustness of algorithms pre- and post-deployment. Deep learning is an educational process for us all, and security should be on our radar moving forward.
