The White House Commitments
On July 21, 2023, President Biden convened seven leading AI companies at the White House. The event was framed as securing "voluntary commitments" to manage AI risks. The seven initial signatories were Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI. In September 2023, a second round expanded the commitments to include Adobe, Cohere, IBM, Nvidia, Palantir, Salesforce, Scale AI, and Stability AI.
The commitments fell into three categories. Safety: the companies pledged to conduct internal and external security testing of AI systems before release, to share information on managing AI risks across the industry and with governments, and to invest in cybersecurity safeguards. Transparency: they committed to developing watermarking mechanisms to identify AI-generated content and to publicly reporting their AI systems' capabilities and limitations. Societal benefit: they pledged to prioritize research on societal risks including bias and discrimination.
The commitments were covered extensively. The White House presented them as a significant governance milestone. The companies presented them as evidence of their commitment to responsible AI development. What was less widely noted was the structural character of the commitments: they were not legally binding, they contained no specific compliance metrics, they established no independent verification mechanism, and the consequences of non-compliance were undefined.
The Harvard Law Review noted that the voluntary commitments are "not backed by the force of law" and have "no accompanying enforcement mechanism." The lack of accountability metrics, the Review observed, "takes the pressure off companies to solve difficult technical challenges." The only potential enforcement lever is the FTC's general Section 5 authority over unfair and deceptive practices — which would apply only if a company made a public commitment it demonstrably failed to honor.
Commitment Versus Deployment
The question that the voluntary commitments framework raises is not whether the companies meant what they pledged. The question is what happened after they pledged it. The commitments were made in July 2023. The period from July 2023 through March 2026 saw the most aggressive deployment cycle in AI history.
OpenAI released GPT-4 Turbo (November 2023), GPT-4o (May 2024), and subsequent model generations. Google released Gemini 1.0 (December 2023), Gemini 1.5 (February 2024), and Gemini 2.0. Anthropic released Claude 3 (March 2024), Claude 3.5 (June 2024), and Claude 4 (2025). Meta released Llama 3 (April 2024) and Llama 4 (2025). Each release expanded the capability frontier. Each was a deployment decision that was not subject to external review, not conditioned on independent safety evaluation, and not accountable to any binding framework.
The companies were not violating their commitments. The commitments did not require them to submit to external safety review before deployment. They committed to conducting internal testing — and they may well have done so. The structural point is that the governance mechanism left the most consequential decisions — when to deploy, how to test, what safety thresholds to apply — in the hands of the companies making them, with no external check.
SB 1047: The Binding Alternative
In the absence of federal legislation, California State Senator Scott Wiener introduced SB 1047 — the Safe and Secure Innovation for Frontier Artificial Intelligence Models Act. The bill represented the most significant attempt to establish binding AI safety regulation in the United States. Its provisions targeted frontier AI models specifically: systems trained at a computing cost exceeding $100 million or fine-tuned at a cost exceeding $10 million.
SB 1047 would have required covered model developers to establish safety and security protocols before beginning training, to conduct pre-deployment testing, to maintain the ability to shut down deployed models, and to exercise reasonable care in preventing catastrophic outcomes — defined as mass casualties from weapons of mass destruction or damages exceeding $500 million from cyberattacks on critical infrastructure. The bill focused on the highest-stakes failure modes of the most powerful systems.
The bill attracted substantial industry opposition. Google, Meta, and OpenAI opposed it. Members of Congress wrote to Governor Newsom opposing the bill. The opposition arguments focused on the bill's reliance on a computational-cost threshold rather than deployment context, the potential burden on California's AI industry, and the risk that state-level regulation would fragment the regulatory landscape.
"Well-intentioned" but imposing "stringent" regulations that could burden California's leading artificial intelligence companies. — Governor Newsom's veto rationale
Governor Newsom vetoed SB 1047 on September 29, 2024. The veto was accompanied by a message that described the bill as well-intentioned but poorly targeted. The same week, Newsom signed 17 other AI-related bills — none of which imposed the kind of binding pre-deployment requirements that SB 1047 contained.
The SB 1047 episode is structurally instructive. It demonstrates the complete pattern: voluntary commitments occupy the governance space, binding legislation is proposed, industry opposition mobilizes, the binding alternative is blocked, and the voluntary framework remains as the primary governance mechanism. The companies that signed voluntary commitments at the White House opposed the binding legislation that would have made those commitments enforceable.
The Internal Dissent
One element of the SB 1047 episode deserves separate attention. At least 113 current and former employees of OpenAI, Google DeepMind, Anthropic, Meta, and xAI signed a letter to Governor Newsom supporting SB 1047. The people who build frontier AI systems — the engineers, researchers, and safety team members — supported the binding regulation that their employers opposed.
This divergence between company leadership and technical staff is a data point in the governance capture analysis. The individuals with the most direct knowledge of frontier model capabilities, failure modes, and safety challenges supported binding external oversight. The corporate entities that employ them opposed it. The divergence suggests that the expertise asymmetry operates in both directions: the companies possess the expertise to evaluate AI risks, and that expertise, when exercised by technical staff rather than corporate leadership, points toward binding regulation rather than voluntary commitments.
The employee letter did not change the outcome. Governor Newsom vetoed the bill. The structural point is that the governance system weighted corporate opposition more heavily than technical-staff support — even when the technical staff had direct knowledge of the safety concerns the bill addressed.
The Responsible AI Apparatus
Each major AI company maintains a "responsible AI" infrastructure — teams, publications, principles documents, and public communications dedicated to demonstrating safety commitment. These are not trivial investments. Anthropic's Responsible Scaling Policy, OpenAI's Preparedness Framework, Google's AI Principles — each represents genuine work by genuine safety researchers operating within the structural constraints of a for-profit entity.
The responsible AI apparatus serves multiple functions simultaneously. It provides a genuine venue for safety research and red-teaming. It provides a public-facing demonstration of safety commitment that shapes regulatory expectations. And it provides the institutional knowledge and vocabulary that company representatives bring to governance discussions, advisory bodies, and legislative hearings.
The structural tension is not between safety and profit. It is between internal safety work and external accountability. Internal safety teams operate within corporate authority structures. Their work is subject to deployment timelines, competitive pressures, and executive decisions about acceptable risk. External accountability would introduce a check on those internal decisions — a check that the voluntary commitment framework does not provide.
Internal safety work: Genuine technical effort by safety researchers, including red-teaming, alignment research, and capability evaluation. Subject to internal authority, deployment timelines, and corporate priorities.
Public commitment apparatus: Principles documents, responsible AI reports, safety pledges. Demonstrates commitment, shapes regulatory expectations, and provides the vocabulary and framing for governance discussions.
External accountability: Absent. No binding pre-deployment review, no independent testing requirement, no enforcement mechanism for voluntary commitments. The most consequential safety decisions remain internal.
The Safety Commitment Timeline
The relationship between safety commitments and deployment decisions can be traced through the documented record. Anthropic published its Responsible Scaling Policy in September 2023 — two months after the White House voluntary commitments. The RSP establishes capability thresholds (ASL levels) that trigger enhanced safety measures. It is the most technically specific internal safety framework published by any major AI company.
Anthropic has also been the company most willing to accept competitive disadvantage for safety reasons — it lost a Pentagon contract after refusing to remove safety guardrails from its models. This willingness distinguishes Anthropic from competitors and demonstrates that meaningful safety commitment within a corporate structure is possible. But the commitment is unilateral and internal. No external mechanism compels Anthropic to maintain its RSP, and no governance framework prevents competitors from deploying without equivalent safeguards.
The competitive dynamic is structural. A company that voluntarily constrains deployment based on internal safety standards operates at a competitive disadvantage relative to companies that do not. In the absence of binding regulation that applies equally to all competitors, the market incentive is to deploy faster and constrain less. Voluntary commitments do not change the incentive structure. They operate against it — which is why binding regulation, not voluntary commitment, is the governance mechanism that has proven effective in every comparable domain.
The Voluntary Commitment — Named
The governance pattern in which non-binding industry pledges substitute for binding regulation, creating the appearance of oversight without the substance of accountability. The Voluntary Commitment operates through a four-stage cycle: (1) public concern about AI risks creates demand for governance, (2) industry offers voluntary commitments that satisfy the immediate political demand, (3) binding legislative alternatives are opposed by industry and blocked or vetoed, (4) the voluntary framework becomes the primary governance mechanism by default. The Voluntary Commitment is not a temporary measure pending legislation. It is a structural equilibrium in which the act of volunteering commitments reduces the political pressure for binding regulation, while the absence of binding regulation ensures the voluntary framework remains the governance mechanism. The companies being governed have designed the terms of their own governance and retain the unilateral ability to modify or abandon those terms.
The Voluntary Commitment pattern is not unique to AI. It is documented in environmental regulation (voluntary emissions pledges preceding the Clean Air Act), financial services (self-regulatory proposals preceding Dodd-Frank), and food safety (industry standards preceding the Food Safety Modernization Act). In each case, voluntary commitments delayed binding regulation. In each case, binding regulation eventually replaced voluntary commitments. The question for AI governance is whether binding regulation will arrive before the capability trajectory outpaces any governance mechanism — voluntary or binding.
What Safety Theater Produces
The term "safety theater" is not a dismissal of genuine safety work. It is a structural diagnosis. Security theater in aviation — the TSA's documented failures in red-team testing despite visible screening procedures — does not mean airports have no security. It means the visible apparatus performs a function distinct from the function it appears to perform. The performance of security is not the same as the provision of security.
In AI governance, the safety theater produces a specific structural outcome: the political demand for governance is satisfied by the visible apparatus of voluntary commitments, responsible AI teams, and safety pledges, while the most consequential decisions — what to deploy, when, with what safeguards, evaluated by whom — remain internal to the companies making them. The governance mechanism governs the appearance of safety, not the substance of deployment.
The next paper examines a specific instance of how the ambiguity between genuine safety practice and strategic positioning operates: the deployment of "open source" as simultaneously a safety mechanism, a competitive strategy, and a regulatory shield.
The Safety Theater documents the voluntary commitment architecture in AI governance. The Compliance Theater (CT series, Saga VI) documents why voluntary commitments fail as accountability mechanisms across all regulated industries. The pattern is the same: the artifact of compliance substitutes for the substance of accountability. GC-003 names the pattern in AI; CT names it structurally.