Deployment corrections: An incident response framework for frontier AI models

January 25, 2024

🔬 Research Summary by Joe O’Brien, an Associate Researcher at the Institute for AI Policy and Strategy, focusing on corporate governance and accountability surrounding the development and deployment of frontier AI models.

[Original paper by Joe O’Brien, Shaun Ee, and Zoe Williams]


Overview: This report describes a toolkit that frontier AI developers can use to respond to risks discovered after the deployment of a model. It also provides a framework for AI developers to prepare and implement this toolkit.


Introduction

Recent history features plenty of cases in which AI models have behaved, or been used, in unintended ways after deployment. As AI capabilities progress and the scale of adoption of AI systems grows, the impacts of model deployments may become increasingly significant, and this may especially be the case for leading AI developers such as OpenAI, Google DeepMind, Anthropic, Microsoft, Google, Amazon, and Meta. While AI developers can adopt several safety practices before deployment (such as red-teaming, risk assessment, and fine-tuning) to reduce the likelihood of incidents, these practices are unlikely to pre-empt all potential issues.

To manage this gap, this paper recommends that leading AI developers establish the capacity for “deployment corrections”: a set of tools for rapidly restricting access to a deployed model, covering some or all of its functionality, users, or both. This would facilitate fast, appropriate responses to a) dangerous capabilities or behaviors identified through post-deployment risk assessment and monitoring and b) serious incidents. The paper also describes practices that can lower the barrier to making timely, well-calibrated decisions about deployment corrections.

Key Insights

As a toolkit

Frontier AI developers that make their models available to downstream users via an interface (e.g., an API) rather than via open-sourcing have many tools at their disposal to limit access to those models. At a high level, this toolkit includes:

  • User-based restrictions (such as blocklisting or allowlisting)
  • Access frequency restrictions (such as throttling the number of prompts that can be submitted to a model in a time period)
  • Capability restrictions (such as filtering harmful model outputs)
  • Use case restrictions (such as prohibiting a model’s use in high-stakes applications)
  • Full shutdown (such as decommissioning a model)

These tools can be used in a broad range of scenarios, from cases where risks from the model are fairly limited to scenarios where the harms are potentially severe and can arise even from proper use by a trusted user.

Restricting model access may be difficult in practice, as downstream users may become dependent on the capabilities of newly deployed models. To minimize these harms, and to lower the barrier for developers to institute deployment corrections as a precaution, we outline a space of deployment corrections that allows for a scalable and targeted approach: AI developers can opt for combinations of restrictions and tailor these choices to respond effectively to specific incidents while minimizing downstream harms.
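
To make the shape of this toolkit concrete, here is a minimal sketch, in Python, of how such restrictions might be layered at an API gateway. All names (`RestrictionPolicy`, `Gateway`, and so on) are hypothetical and are not drawn from the paper; a real deployment would use the developer's own serving infrastructure and a proper harm classifier rather than a keyword filter.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RestrictionPolicy:
    """Hypothetical policy combining the corrections from the toolkit above."""
    blocklist: set = field(default_factory=set)              # user-based restriction
    max_requests_per_minute: int = 60                        # access frequency restriction
    banned_output_terms: set = field(default_factory=set)    # capability restriction
    prohibited_use_cases: set = field(default_factory=set)   # use case restriction
    shutdown: bool = False                                   # full shutdown


class Gateway:
    """Illustrative gateway enforcing a RestrictionPolicy around each model call."""

    def __init__(self, policy, model_fn):
        self.policy = policy
        self.model_fn = model_fn          # callable(prompt) -> str that queries the model
        self._request_log = {}            # user_id -> list of request timestamps

    def handle(self, user_id, use_case, prompt):
        p = self.policy
        if p.shutdown:                                         # full shutdown
            raise RuntimeError("model decommissioned")
        if user_id in p.blocklist:                             # user-based restriction
            raise PermissionError("user access revoked")
        if use_case in p.prohibited_use_cases:                 # use case restriction
            raise PermissionError(f"use case '{use_case}' not permitted")
        now = time.time()                                      # access frequency restriction
        recent = [t for t in self._request_log.get(user_id, []) if now - t < 60]
        if len(recent) >= p.max_requests_per_minute:
            raise RuntimeError("rate limit exceeded")
        self._request_log[user_id] = recent + [now]
        output = self.model_fn(prompt)
        # Crude keyword filter standing in for a real harmful-output classifier.
        if any(term in output.lower() for term in p.banned_output_terms):
            return "[output withheld by capability restriction]"
        return output
```

In this sketch each field of the policy can be tightened independently, which mirrors the point above that corrections can be combined and targeted rather than applied as an all-or-nothing shutdown.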

Building organizational capacity

Tools alone are insufficient for action. To respond most effectively to incidents involving their deployed models, AI developers will need to develop procedures, roles, and responsibilities for managing decisions around deployment corrections. The paper recommends that AI developers focus on four stages of implementation: preparation, monitoring, execution, and post-incident follow-up.

Preparation refers to the act of building and adopting the tools and procedures that will allow an AI developer to act swiftly and effectively in response to an incident. It includes identifying and understanding possible threats, establishing triggers for deployment corrections, developing tools and procedures for incident response, and establishing decision-making authorities. Externally, it includes sharing insights on best practices with regulators and industry partners and defining fallback options for downstream users in the case of service interruption.
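
One way to make "establishing triggers for deployment corrections" concrete is to record them as data before any incident occurs, so that the response and the decision-making authority are agreed on in advance. The sketch below is purely illustrative; the severity tiers, example conditions, and role names are hypothetical stand-ins, not recommendations from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrectionTrigger:
    """A pre-agreed condition-to-response mapping (hypothetical example values)."""
    condition: str           # what monitoring must observe
    severity: str            # e.g. "low", "high", "critical"
    correction: str          # which tool from the toolkit to apply
    decision_authority: str  # role empowered to invoke the correction


# Example playbook entries, drafted during the preparation stage.
PLAYBOOK = [
    CorrectionTrigger(
        condition="single user repeatedly probing for disallowed capabilities",
        severity="low",
        correction="user-based restriction (blocklist)",
        decision_authority="trust-and-safety on-call lead",
    ),
    CorrectionTrigger(
        condition="harmful capability reproducible by ordinary users",
        severity="high",
        correction="capability restriction + use case restriction",
        decision_authority="incident commander",
    ),
    CorrectionTrigger(
        condition="severe harm possible even under proper use by trusted users",
        severity="critical",
        correction="full shutdown",
        decision_authority="executive incident response team",
    ),
]


def corrections_for(severity):
    """Look up the pre-approved responses for a given severity tier."""
    return [t for t in PLAYBOOK if t.severity == severity]


# Usage: corrections_for("high") returns the pre-agreed high-severity response.
```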

Monitoring refers to the process of continuously gathering data on a model’s capabilities, behavior, and use (via a diverse range of sources), analyzing this data for anomalies, and escalating cases of concern to relevant decision-makers. AI developers should also feed relevant data back into the threat modeling process.
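 
A hedged sketch of that monitoring-and-escalation loop might look like the following, assuming hypothetical data sources and a placeholder anomaly score rather than any particular detection method.

```python
from collections.abc import Callable, Iterable

# Hypothetical record of one observation about a deployed model:
# (source, description, anomaly score in [0, 1]).
Observation = tuple[str, str, float]


def monitor(
    sources: Iterable[Callable[[], list]],
    escalate: Callable[[Observation], None],
    threshold: float = 0.8,
) -> list:
    """Pull observations from each source, escalate anomalies above the threshold,
    and return everything so it can be fed back into threat modeling."""
    collected = []
    for fetch in sources:
        for obs in fetch():
            collected.append(obs)
            if obs[2] >= threshold:   # anomaly judged serious enough to escalate
                escalate(obs)
    return collected


# Usage with stand-in sources and a print-based escalation path.
usage_logs = lambda: [("api-logs", "spike in refused prompts from one org", 0.9)]
user_reports = lambda: [("support-tickets", "routine question about billing", 0.1)]
monitor([usage_logs, user_reports], escalate=lambda obs: print("ESCALATE:", obs))
```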

Execution refers to the decision to apply a deployment correction to a model and the procedures that follow this decision. This stage also includes alerting and coordinating with relevant regulatory authorities, implementing fallback systems for downstream users, and notifying customers of the situation.
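
The execution stage could be sketched as a single procedure that applies the chosen correction and then handles the coordination duties listed above. The hooks below (`apply_correction`, `enable_fallback`, `notify`) are hypothetical stand-ins for a developer's real serving, fallback, and communication channels.

```python
def execute_correction(correction, apply_correction, enable_fallback, notify):
    """Illustrative execution-stage procedure: apply the chosen correction,
    switch downstream users to a fallback, and send out notifications."""
    apply_correction(correction)   # e.g. update the gateway policy or flip a shutdown flag
    enable_fallback()              # keep dependent downstream users running on a fallback
    notify("regulators", f"Deployment correction applied: {correction}")
    notify("customers", f"Service change in effect: {correction}")


# Usage with print-based stand-ins for the real hooks.
execute_correction(
    "full shutdown",
    apply_correction=lambda c: print("applying:", c),
    enable_fallback=lambda: print("fallback system enabled"),
    notify=lambda who, msg: print(f"notify {who}: {msg}"),
)
```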

Post-incident follow-up refers to the set of actions relevant to recovery, restoration, learning, and ongoing risk management in the wake of an incident. This stage involves the process of repairing a model and restoring service, after-action reviews, and feeding lessons back into the previous stages. In some cases, this stage may require significant involvement from external parties (such as when the incident is particularly severe and likely to occur in models developed by other companies). 

Between the lines

While some recently published standards and guidance have called out the need for AI developers to monitor deployed models for risks, and to be prepared to withdraw them when necessary, there is more work to be done. Policymakers and AI companies will need to coordinate on several capacity-building measures, including (but not limited to):

  • Defining and sharing threat models and developing tools to parse data for signs of misuse or undesired model behavior.
  • Developing a standardized framework for frontier AI incident response and sharing best practices.
  • Establishing secure reporting lines for quickly communicating across industry and government in the case of an incident or discovered vulnerability.

Policymakers could also consider requiring frontier AI developers to take certain critical steps, such as maintaining control over model access or maintaining incident response plans and making such plans available to relevant agencies.

Finally, it is worth noting that the deployment corrections framework is not a silver bullet for managing AI risks. It is one small part of a larger conversation about building stronger governance mechanisms around frontier AI model development and deployment. This conversation has recently seen major advancements in the form of a US Executive Order and a flurry of newly published safety policies from AI firms. While we look forward to seeing work that expands on our framework, we also look forward to work that fills important gaps in the broader project of governing frontier AI development and deployment.

