Montreal AI Ethics Institute

Democratizing AI ethics literacy


A survey on adversarial attacks and defences

May 26, 2022

🔬 Research Summary by Max Krueger, a consultant at Accenture with an interest in both the long- and short-term implications of AI for society.

[Original paper by Anirban Chakraborty, Manaar Alam, Vishal Dey, Anupam Chattopadhyay, Debdeep Mukhopadhyay]


Overview: Deep learning systems are increasingly susceptible to adversarial threats. As a result, it is imperative to evaluate methods for adversarial robustness.


Introduction

Deep learning systems are increasingly susceptible to adversarial threats. With the widespread adoption of these systems in society, security poses a significant concern. Attacks on these systems can produce catastrophic failures as communities increasingly rely on input from machine intelligence to drive decision-making. This paper provides a detailed discussion of the types of attacks deep learning systems face and potential defenses against them. “Adversaries can craftily manipulate legitimate inputs, which may be imperceptible to the human eye,” the report states. In other words, these attacks are hard to detect and carry high consequences.

Key Insights

Attack Surface

A machine learning system can be broadly divided into four principal components: data collection, data transfer, data processing by a machine learning model, and action taken based on the output. This pipeline represents the attack surface an adversary may exploit. The authors identify three primary attack vectors:

  1. Evasion attack – The adversary tries to evade the system by adjusting malicious samples during the testing phase. Evasion attacks are the most common type of attack.
  2. Poisoning attack – The adversary attempts to inject crafted data samples to poison the system, compromising the entire learning process. Poisoning attacks take place during model training.
  3. Exploratory attack – The adversary attacks a black-box model to learn as much about the underlying design as possible.
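To make the poisoning vector concrete, here is a toy sketch (ours, not the paper's): a label-flip poisoning attack against a simple nearest-centroid classifier, where mislabeled points injected during training drag a class centroid across the feature space and flip a test-time prediction. All data and names are illustrative.

```python
import numpy as np

def train_centroids(X, y):
    # One centroid per class; prediction picks the nearest one.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, x):
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

# Clean, well-separated training data: class 0 near -2, class 1 near +2.
X_clean = np.array([[-2.0]] * 50 + [[2.0]] * 50)
y_clean = np.array([0] * 50 + [1] * 50)
x_test = np.array([1.5])  # clearly class 1 under clean training

clean_model = train_centroids(X_clean, y_clean)
print(predict(clean_model, x_test))  # 1

# Poisoning: inject points near the class-1 region labeled as class 0,
# dragging the class-0 centroid toward x_test during training.
X_poison = np.array([[1.9]] * 200)
y_poison = np.array([0] * 200)
poisoned_model = train_centroids(np.vstack([X_clean, X_poison]),
                                 np.concatenate([y_clean, y_poison]))
print(predict(poisoned_model, x_test))  # 0: the same input is now misclassified
```

On real systems, poisoned points are crafted to look legitimate, which is what makes this vector so difficult to audit.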

Adversary Capabilities

The capabilities of an adversary depend on the amount of knowledge available at the time of the attack. For example, if the attacker has access to the underlying dataset, the adversary can execute training-phase attacks such as injecting corrupt data into the training dataset, modifying the training data, or logic corruption – tampering with the learning algorithm itself.

Adversaries may not have access to the underlying dataset but may have access to the model during the testing phase. Attacks during this phase are determined by the adversary’s knowledge of the underlying model and its parameters. In a white-box attack, “an adversary has total knowledge about the model,” including the training data distribution, the complete model architecture, and its hyperparameters. The adversary then alters an input to obtain a specific output. White-box attacks are highly effective because inputs are crafted specifically for the known model architecture.
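As a concrete white-box illustration, here is a minimal sketch (our own toy model, not the paper's) of the Fast Gradient Sign Method, a classic evasion attack: because the attacker knows the weights, the loss gradient with respect to the input is exact, and a single signed step flips the prediction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The white-box adversary sees these weights directly (illustrative model).
w = np.array([1.0, -2.0, 0.5])

def predict_proba(x):
    return sigmoid(w @ x)

x = np.array([0.9, 0.1, 0.4])    # clean input with true label 1
print(predict_proba(x) > 0.5)    # True: classified correctly

# Cross-entropy loss gradient w.r.t. the input is (p - y) * w, so the
# attacker needs only one forward pass plus the known weights.
grad_x = (predict_proba(x) - 1.0) * w
eps = 0.6
x_adv = x + eps * np.sign(grad_x)   # FGSM: signed step of size eps per feature
print(predict_proba(x_adv) > 0.5)   # False: the perturbed input is misclassified
```

On image models the same step, with a much smaller eps spread across thousands of pixels, is what makes the perturbation imperceptible to the human eye.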

Black-box Attacks

The primary objective of a black-box attack is to train a local model to help craft malicious attacks on the target model. A black-box attack assumes no knowledge of the target. The authors classify black-box attacks into three categories:

  1. Non-adaptive black-box attack – An adversary can only access the model’s training data distribution. The adversary then trains a local model based on the outputs from the black-box model. The local model allows the adversary to attack it using white-box methods and then send the crafted inputs to the target model for exploitation.
  2. Adaptive black-box attack – Like a non-adaptive attack, but the adversary does not know the training data distribution. The authors state, “The adversary issues adaptive oracle queries to the target model and labels a carefully selected dataset.” This dataset is then used to train a local model and craft malicious inputs.
  3. Strict black-box attack – Like the adaptive black-box attack, but the adversary cannot alter the inputs and, therefore, cannot observe changes to the output.

Adversarial Goals

An adversary’s goals are motivated by what they may gain from an attack. The authors illustrate four primary goals an adversary may have when attacking a model: reducing prediction confidence, causing misclassification, forcing a targeted misclassification, and mapping a specific source input to a specific target output. The adversary’s end goal influences the type of attack used during exploitation.

Defense Strategies

Defending against such attacks is extremely difficult, and most existing defenses lack robustness against multiple attack types. Several defense mechanisms are available to the security practitioner, such as adversarial training, gradient hiding, and defensive distillation, to name a few. The paper presents these mechanisms along with the logic behind them. The primary takeaway is that no single defense can stop all attacks, and many current defenses are easily fooled. Given the relative newness of modern machine learning and its rapid deployment in new environments, stopping these attacks will remain challenging for years to come.
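As one example of the defenses surveyed, here is a hedged sketch of adversarial training on a toy logistic model (our construction, not the paper's code): each training step also fits FGSM-perturbed copies of the data, so the model learns to classify inputs within an eps-ball correctly.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative two-class data; all parameters here are our own choices.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1.5, 0.5, (100, 2)),
               rng.normal(+1.5, 0.5, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])
eps = 0.5  # attack budget assumed at training time

w = np.zeros(2)
for _ in range(500):
    # Craft FGSM examples against the current weights ...
    grad_X = (sigmoid(X @ w) - y)[:, None] * w[None, :]
    X_adv = X + eps * np.sign(grad_X)
    # ... and fit clean and adversarial copies together.
    Xb, yb = np.vstack([X, X_adv]), np.concatenate([y, y])
    w -= 0.1 * Xb.T @ (sigmoid(Xb @ w) - yb) / len(Xb)

# Evaluate under a fresh FGSM attack on the trained model.
grad_X = (sigmoid(X @ w) - y)[:, None] * w[None, :]
robust_acc = np.mean((sigmoid((X + eps * np.sign(grad_X)) @ w) > 0.5) == y)
print(robust_acc)
```

Even here, the robustness gained only covers the attack trained against; the paper's broader point stands that no single defense covers all attack types.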

Between the lines

We all know that cybersecurity is a big issue and an even bigger business. This paper demonstrates that the security of machine intelligence is an open and pressing question. As deep learning becomes increasingly embedded in our everyday lives, we should be genuinely concerned about protecting these systems. It doesn’t take a big imagination to envision how one might exploit these systems to cause extreme harm. A focus on developing algorithms with built-in adversarial robustness would mitigate the consequences of such attacks. It would also be wise to create AI red teams to test the robustness of algorithms pre- and post-deployment. Deep learning is an educational process for us all, and security should be on our radar moving forward.

Want quick summaries of the latest research & reporting in AI ethics delivered to your inbox? Subscribe to the AI Ethics Brief. We publish bi-weekly.


  • © 2025 MONTREAL AI ETHICS INSTITUTE.
  • This work is licensed under a Creative Commons Attribution 4.0 International License.