
✍️ By Tejasvi Nallagundla.
Tejasvi is an Undergraduate Student in Computer Science, Artificial Intelligence, and Global Studies and an Undergraduate Affiliate at the Governance and Responsible AI Lab (GRAIL), Purdue University.
📌 Editor’s Note: This article is part of our AI Policy Corner series, a collaboration between the Montreal AI Ethics Institute (MAIEI) and the Governance and Responsible AI Lab (GRAIL) at Purdue University. The series provides concise insights into critical AI policy developments from the local to international levels, helping our readers stay informed about the evolving landscape of AI governance. This piece uses Anthropic’s Claude Constitution, Responsible Scaling Policy, and Claude Sonnet 4.6 System Card as a representative example to examine how layered corporate AI policy documents come together to define and shape the boundaries of model behavior across a governance stack.
AI Governance as a Stack
We often think of AI governance as a set of rules or clear guidelines: a “yes” or “no” on what a given implementation or tool can and cannot do. In practice, however, the companies building these tools must “govern” AI not only by complying with rules set out by governing bodies, but also in a corporate sense. This corporate governance comes not as a single “AI policy document,” but as a range of materials, from high-level principles to in-depth technical assessments. We can think about these materials as a stack: a layered set of documents, each serving a different purpose within the broader governance goals of a corporate AI company.
One way to unpack this stack and see it in practice is to trace a single question across several of a company’s documents. Here, I do so through the AI lab Anthropic and the question of how boundaries for model behavior are defined. Looking across three layers of Anthropic’s governance stack, we see that the boundary is not defined in any one place but constructed across all of them, with each layer shaping and presenting it for a different purpose, albeit toward the same overarching goal.
The Value Layer
Starting at the top of the stack is Claude’s Constitution, which Anthropic defines as “a detailed description of Anthropic’s intentions for Claude’s values and behavior.” From the perspective of our policy stack, we can think of this as the value layer. In this document, the company presents the boundaries of model behavior as anything but fixed. They explain their approach of favoring the cultivation of “good values and judgment over strict rules and decision procedures,” with Claude playing a role in determining its own behavior through holistic prioritization, balancing considerations such as helpfulness against guidelines and safety. At the value layer, then, the boundary definition is normative, describing what the model ought to do rather than fixing or operationalizing it.
The Risk Layer
Moving down the stack, the next layer is Anthropic’s Responsible Scaling Policy (RSP), which they define as a framework establishing how they “identify and evaluate risks” and make “decisions about AI development and deployment.” We can think of this as the risk layer. Returning to our question, the RSP presents the boundaries of model behavior in a more structured way, tying them to capability thresholds, risk analyses, and internal governance decisions, while still noting that there is “flexibility in how risk thresholds are evaluated.” At the risk layer, then, the boundary is framed in terms of evaluation rather than as a completely fixed line, echoing its treatment in the Constitution.
The Evaluation Layer
Now we turn to Anthropic’s System Cards, which we can think of as the evaluation layer of the governance stack. In their recent System Card for Claude Sonnet 4.6, they describe the document as outlining the model’s “characteristics, capabilities, and safety profile that [they] carried out before its public deployment.” At this layer, boundaries are presented through testing against thresholds and the associated evaluation results. One notable observation is that “confidently ruling out these thresholds is becoming increasingly difficult” as model capabilities evolve, highlighting that here, too, the boundary is not fixed but shaped through testing and acknowledged uncertainty.
Beyond the evaluation layer, a whole range of other materials add to Anthropic’s governance stack, from its Transparency Hub to government consultations, usage policies, and research publications. Taken together, the focus, content, and framing of these materials, on boundaries of model behavior and other topics, shape how corporate governance of these issues is constructed and distributed across the system.
Further Reading
- In Claude We Trust? Evaluating the New Constitution
- Claude’s New Constitution: AI Alignment, Ethics, and the Future of Model Governance
- Exclusive: Anthropic Drops Flagship Safety Pledge
- Responsible Scaling: Comparing Government Guidance and Company Policy
Image credit: Distant Writing by Fabrizio Matarese / Better Images of AI / CC BY 4.0
