“Anthropic believes it needs to shift into triage mode with its safety plans, because methods to assess and mitigate risk are not keeping up with the pace of capabilities.”
— Chris Painter, METR Director (independent evaluator), summarizing Anthropic’s position, February 2026
The Admission
The sentence above was not written by a critic of AI development. It was written by the director of METR — Model Evaluation and Threat Research — an independent organization contracted by Anthropic to evaluate the safety of its systems. METR produced the technical assessment that accompanied Anthropic’s revised Responsible Scaling Policy. The director’s characterization of Anthropic’s own position appeared in that assessment, published February 25, 2026.
The word Anthropic chose was triage. Not “adaptation.” Not “refinement.” Triage: the medical protocol applied when resources are insufficient to treat all patients adequately. Triage is the methodology of the overwhelmed. It is what you do when the system cannot keep up — when decisions must be made about what to prioritize and what to defer because there is not enough capacity for everything.
The organization that has positioned itself as the AI developer most committed to safety acknowledged, in a document published alongside a major revision of its safety policy, that its safety methodology is in triage relative to its capability development.
This paper documents what that means, how it happened, what it looks like in operation, and why it is the third and most technically specific mechanism by which human judgment is being systematically removed from consequential AI decisions. Papers I and II documented the structural legal gap and the rhetorical mechanism. This paper documents the methodological one.
What the Responsible Scaling Policy Was
In September 2023, Anthropic published its first Responsible Scaling Policy. The document was notable in one specific and significant respect: it contained categorical pre-commitments. Commitments that did not depend on competitive conditions, on Anthropic’s market position, or on what other AI developers were doing. The 2023 RSP stated, in substance, that Anthropic would not deploy AI systems above defined capability thresholds without demonstrated safety measures in place. If those measures did not exist, deployment would not occur. Full stop.
This was a hard tripwire, not a conditional policy. The distinction is architecturally important. A conditional policy says: we will pause deployment if conditions X, Y, and Z are met. A hard tripwire says: we will not deploy above this threshold, period, until the safety measures exist to govern it. The 2023 RSP was structured as a hard tripwire. It was Anthropic’s most specific and enforceable public commitment to the proposition that safety would not yield to competitive pressure.
The RSP created a framework based on AI Safety Levels (ASLs), ascending thresholds of capability that would trigger corresponding safety requirements. ASL-3 and above were defined as thresholds above which deployment required demonstrated containment measures and enhanced security protocols. The policy did not say these requirements would be relaxed if competitors were advancing. It said they would apply regardless.
The 2023 RSP was imperfect — every public safety commitment of this kind is imperfect. Its definitions of capability thresholds were contested and its measurement methodologies were uncertain. But its structure was clear: it was a pre-commitment, not a conditional. That structural property was what gave it meaning as a safety commitment. A conditional policy can always find conditions that justify deviation. A pre-commitment cannot.
What the Responsible Scaling Policy Became
On February 25, 2026, Anthropic published a revised RSP. The categorical pre-commitment was gone.
The new policy delays capability development only if two conditions are simultaneously met: Anthropic is leading the capability race, and catastrophic risks are assessed as significant. Both conditions are required. Both are subjective assessments. Both are evaluated by Anthropic itself. Neither creates the binding tripwire of the 2023 version.
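The structural difference is easiest to see stated as logic. The sketch below is a schematic contrast with invented predicate names; it is not Anthropic's actual decision procedure, only the shape of the two commitment structures as described above.

```python
# Schematic contrast of the 2023 and 2026 policy structures.
# Predicate and parameter names are illustrative, not Anthropic's terms.

def may_deploy_2023(capability_level: int, safeguards_demonstrated: bool,
                    asl_threshold: int = 3) -> bool:
    """Hard tripwire: at or above the threshold, deployment requires
    demonstrated safeguards. No other variable enters the decision."""
    if capability_level >= asl_threshold:
        return safeguards_demonstrated
    return True

def must_delay_2026(leading_the_race: bool,
                    catastrophic_risk_significant: bool) -> bool:
    """Conditional policy: delay is required only when BOTH subjective
    conditions hold. Competitor behavior is now a decision variable."""
    return leading_the_race and catastrophic_risk_significant

# Under the 2026 structure, a competitor pulling ahead flips
# leading_the_race to False and the delay obligation disappears,
# whatever the state of the safety methodology.
assert must_delay_2026(False, catastrophic_risk_significant=True) is False
```

The 2023 function takes no argument for what competitors are doing; the 2026 function cannot be evaluated without one. That is the revision, expressed structurally.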
Jared Kaplan, Anthropic co-founder, offered the explanation publicly: “It didn’t really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments... if competitors are blazing ahead.”
This is a truthful statement about how the policy changed and why. It is also a precise description of how a categorical pre-commitment becomes a conditional one: by introducing the behavior of competitors as a variable in the commitment’s binding conditions. Once competitor behavior is in the equation, the commitment’s strength is determined by whoever is most willing to advance without constraint. The floor is set by the least constrained actor. The commitment has, in substance, been transferred from Anthropic’s own values to the collective behavior of the industry.
The 2026 RSP states this explicitly, in language that describes a collective action problem rather than a safety commitment: “If one AI developer paused... others moved forward... that could result in a world that is less safe. The developers with the weakest protections would set the pace.”
The observation is not wrong. The collective action problem it describes is real. But its inclusion in Anthropic’s revised safety policy transforms a pre-commitment into a rationalization for following the least-constrained actor downward. The policy that was designed to hold against competitive pressure now explicitly incorporates competitive pressure as its operative condition.
The Arithmetic of Oversight
Before artificial intelligence, IDF intelligence analysts in Gaza produced approximately 50 bombing targets per year. After deployment of the Gospel AI targeting system, production reached 100 bombing targets per day.
This is an efficiency increase of approximately 700-fold. Over a 300-day operational period, that is the difference between 30,000 targets and roughly 40. The human oversight capacity that was designed for 50 targets per year was applied to tens of thousands. The result, documented by six intelligence officers with direct system access, was twenty seconds of review per target.
The twenty-second review was not a policy choice to reduce human oversight. It was an arithmetic consequence of deploying capability at a scale the oversight apparatus could not match. The capacity gap between AI-enabled targeting speed and human deliberation speed is not a flaw in the system’s design. It is a structural property of the speed asymmetry between AI and human cognition.
This arithmetic is not specific to military targeting. It describes the general condition when AI capability increases faster than the human oversight methodology designed to govern it. As capability scales, throughput scales. As throughput scales, human review per decision compresses. As review compresses, the human role in the decision loop nominally remains but substantively diminishes. At some throughput threshold, human review becomes a rubber stamp on decisions that have already been effectively made.
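The compression can be computed directly. In the sketch below, only the throughput figures (50 targets per year, 100 per day) come from the reporting; the review-capacity numbers are hypothetical assumptions chosen for illustration, and the documented twenty-second figure implies even less capacity per target than this assumed cell provides.

```python
# Review time per target as a function of targeting throughput.
# Throughput figures are from the reporting; staffing is assumed.

REVIEW_HOURS_PER_DAY = 8          # assumed capacity of one review cell
OPERATIONAL_DAYS_PER_YEAR = 300   # assumed operational tempo

def seconds_per_target(targets_per_day: float) -> float:
    """Fixed review capacity divided across the day's target volume."""
    return REVIEW_HOURS_PER_DAY * 3600 / targets_per_day

pre_ai = 50 / OPERATIONAL_DAYS_PER_YEAR   # ~0.17 targets per day
post_ai = 100.0                           # targets per day

print(f"pre-AI:  {seconds_per_target(pre_ai) / 3600:.0f} hours per target")
print(f"post-AI: {seconds_per_target(post_ai):.0f} seconds per target")
# pre-AI:  48 hours per target
# post-AI: 288 seconds per target
```

Under any plausible staffing assumption, the per-target review budget collapses from hours or days to minutes or seconds. The twenty seconds documented by the intelligence officers is not an anomaly within this arithmetic; it is a point on its curve.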
The triage condition Anthropic acknowledged is the institutional version of this arithmetic. Safety methodology development takes time: research must be done, evaluations must be designed, tests must be developed and validated, results must be interpreted. Capability development follows a different timeline, accelerated by competitive pressure and by the compounding nature of capability improvements. When the capability timeline is faster than the methodology timeline — when new capabilities exist before the tools to evaluate their risks have been built — the gap between what the system can do and what safety science can assess about what it can do is the triage condition.
The METR director’s summary was precise: methods to assess and mitigate risk are not keeping up with the pace of capabilities. This is not a temporary lag to be corrected at the next revision cycle. It is the structural condition of a field developing capability faster than it can develop the science to govern that capability.
Lavender as Proof of Concept: What Triage Looks Like at the Operator Level
The Lavender case is the operational instantiation of the triage condition at the level of individual targeting decisions. It is worth stating what it demonstrates in terms that connect to the RSP revision rather than treating the two as separate phenomena.
The IDF, like any military deploying AI-assisted targeting, has a body of doctrine, policy, and legal review that constitutes its “methodology” for governing AI use: rules of engagement, proportionality thresholds, targeting authorization chains. These elements were designed for a targeting throughput that the Gospel and Lavender systems rendered obsolete on the first day of their operational deployment. The doctrine was designed for 50 targets per year. The system produced 100 per day.
What happened was not that the doctrine was revised before deployment. What happened was that the doctrine was applied at a scale it was not designed for and produced the triage condition: overwhelmed operators, compressed review, rubber-stamp authorization. The methodology could not keep up. The capability had already been deployed. The gap between them was filled by twenty seconds and a gender check.
The West Point Lieber Institute analysis captures the structural issue: “The speed and scale of production or ‘nomination’ of targets... may make human judgement impossible, or, de facto, absent.” The analysis frames this as a potential outcome. The Lavender testimony confirms it as an operational reality. Human judgment was not made impossible by technical design. It was made impossible by arithmetic. The capacity gap between AI-enabled targeting volume and human deliberation capacity produced the twenty-second review. The triage methodology had already been applied — operators were triaging the decision queue by compressing review time to a threshold that could accommodate the volume. The bomb was not ticking. The queue was.
The Training Data Problem as Methodology Gap
The Lavender system’s 10 percent error rate is traceable, in part, to a specific methodology failure that preceded deployment: the training data included civil defense workers in the category labeled “Hamas operatives.” The system was therefore trained to identify the common features of civil defense workers and Hamas operatives, not only the latter. The false positive pool was built into the model before any Palestinian was targeted by it.
This is not a software bug. It is a methodology failure at the design stage. The methodology for selecting, labeling, and auditing training data — which determines what the model learns to find — was insufficient. The people who designed the training data either did not recognize the labeling problem or did not have adequate methodology to detect it. The IHL violations that resulted from the 10 percent error rate are downstream consequences of a methodology failure that occurred before the model processed a single operational input.
The training data problem represents a distinct category of methodology gap from the throughput-driven triage at the operator level. The operator triage is caused by scale. The training data gap is caused by insufficient methodology at the design stage — the absence of rigorous data auditing and label verification tools adequate to catch the labeling error before deployment.
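A minimal sketch shows how a labeling decision becomes a built-in false-positive pool. Everything here (the features, the records, the toy scoring rule) is invented for illustration; it is not Lavender's architecture, only the mechanism the reporting describes.

```python
# Minimal sketch: a labeling error at the data stage becomes a
# built-in false-positive pool. All features, records, and the toy
# scoring rule are hypothetical illustrations, not the actual system.

from collections import Counter

# Hypothetical training records: (observed_features, label).
# The labeling error: civil defense workers are tagged as operatives (1).
training = [
    ({"phone_swaps", "night_movement"}, 1),        # actual operative
    ({"phone_swaps", "group_coordination"}, 1),    # actual operative
    ({"group_coordination", "radio_use"}, 1),      # civil defense, mislabeled
    ({"radio_use", "night_movement"}, 1),          # civil defense, mislabeled
    ({"daytime_commute"}, 0),                      # uninvolved civilian
]

# A toy "model": weight each feature by how often it co-occurred
# with the positive label during training.
feature_weight = Counter()
for features, label in training:
    if label == 1:
        feature_weight.update(features)

def score(features: set) -> int:
    return sum(feature_weight[f] for f in features)

# At inference, a civil defense worker matches exactly the features the
# model was trained to treat as operative markers. The false positive
# was written into the weights before any operational input existed.
civil_defense_profile = {"group_coordination", "radio_use"}
operative_profile = {"phone_swaps", "night_movement"}
print(score(civil_defense_profile), score(operative_profile))  # 4 4
```

The mislabeled profiles score identically to the real ones. No amount of downstream review speed would change that; the error precedes the operator.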
Both gaps share a structural feature: they are invisible to the human actor who would need to identify them in order to prevent their consequences. An operator reviewing a Lavender recommendation in twenty seconds cannot detect that the recommendation is a product of a training set that incorrectly labeled civil defense workers as militants. A commander who approved the 90 percent accuracy threshold cannot reconstruct, after the fact, which cases fell in the 10 percent. The methodology failures are upstream. Their consequences are downstream. The accountability chain that IHL requires runs from consequence back to cause. When the cause is invisible at the point of consequence and unrecoverable at the point of cause, the chain cannot be traced.
The Black Box and the Law: Explainability as the Missing Methodology
West Point’s Lieber Institute produced a specific analysis of what it called the “targeting black box” problem: the fundamental incompatibility between AI system opacity and the legal standards that govern targeting accountability. Their conclusion: “Black box AI models fundamentally compromise the integration of human operators by rendering such operators unaware of the context and reasoning influencing AI outputs.”
The legal scholar Ashley Deeks identified what she called the “double black box”: the combination of AI system opacity with national security classification creates conditions where Congress, courts, inspectors general, and legal counsel cannot perform their oversight functions. The AI’s reasoning cannot be examined. The operational context is classified. The combination produces an accountability structure that is formally intact and operationally empty.
The AOAV legal analysis is direct about the consequence for IHL accountability: “Without transparency into how Lavender concluded a strike was ‘legal,’ courts would face an insurmountable evidentiary hurdle in determining whether the decision violated the ‘reasonable commander’ standard.” The “reasonable commander” standard — the IHL test for whether a targeting decision was lawful — requires assessing what information was available to the commander and whether a prudent commander would have acted differently. When the information was processed by a system whose reasoning cannot be reconstructed, that assessment cannot be performed.
Explainability — the capacity of an AI system to provide a legible account of why it produced a particular output — is not currently a legal requirement under IHL or under the U.S.-led Political Declaration on Responsible Military Use of AI. The Political Declaration addresses transparency at the model level: documentation of training data, evaluation methodology, intended use. It does not require that the system’s reasoning in individual decisions be recoverable or legible. NATO’s revised AI strategy moved further from explainability as a foundational requirement after the generative AI era began, not closer.
This is a methodology gap of a specific kind: the absence of a required standard where a standard is necessary for the legal accountability framework to function. The law of armed conflict cannot hold commanders accountable for AI-mediated targeting decisions if it cannot require that the AI’s decision process be legible. The methodology — explainability as a legal requirement — does not exist. The legal framework that depends on it cannot function.
Venezuela, January 3, 2026: The First Confirmed Commercial AI Deployment in a Classified Military Operation
At approximately 2 a.m. Caracas time on January 3, 2026, U.S. Delta Force conducted a military operation in Venezuela involving more than 150 aircraft and strikes against multiple simultaneous targets. Casualty figures documented across multiple sources range from approximately 75 to more than 100: Venezuela’s defense minister reported 83 killed, including 47 Venezuelan troops and 32 Cuban soldiers; Washington Post sources citing U.S. officials reported approximately 75; Venezuela’s interior minister reported over 100; The New York Times initially reported 40 and later revised the figure to approximately 80.
Claude, Anthropic’s AI system, was deployed in the operation through a partnership with Palantir Technologies. This was confirmed by two sources to Axios in February 2026 as “the first time a commercially built AI model has been deployed inside a classified American military operation.”
What Claude did in the operation is not publicly known. Anthropic asked Palantir how Claude had been used. According to reporting, that question became a source of rupture between the companies. Anthropic found no violations of its usage policies. The absence of a finding of violation does not mean the use was reviewed and approved before deployment — it means that after the deployment occurred, no specific policy was identified as having been breached.
The methodology gap here is of a third type, distinct from the throughput arithmetic and the training data problem: it is the gap between contractual policy language and operational deployment conditions. Anthropic’s usage policies, designed through deliberative processes in peacetime commercial settings, were applied to a classified military operation in which Anthropic had no pre-deployment review, no real-time visibility, and no post-hoc access to operational details sufficient to determine what its system was used for. The methodology that was designed to govern use — terms of service, usage policies, review processes — could not be applied to the conditions of the deployment. The capability was there. The governance methodology was not.
The Palantir relationship is the mechanism by which the capability crossed from the commercial domain, where Anthropic’s governance methodology applies, into the classified military domain, where it structurally cannot. Palantir serves as the integrator that makes the capability available in classified environments. The governance gap is not incidental to that architecture. It is its defining feature. A commercial AI company cannot review classified military operations. A classified military operation cannot operate under commercial AI governance terms. The gap between the two is not a policy failure. It is a structural property of the arrangement.
February 9, 2026: The Resignation
On February 9, 2026, Mrinank Sharma, a senior AI safety researcher at Anthropic, resigned publicly. His statement has been widely quoted and deserves to be reproduced in its operative passages rather than summarized, because the precision of his language is the evidence:
“The world is in peril… I’ve repeatedly seen how hard it is to truly let our values govern our actions. I’ve seen this within myself, within the organization, where we constantly face pressures to set aside what matters most.”
Sharma was not a junior employee with limited visibility into the organization’s safety work. He was a senior researcher whose function was the safety methodology this paper examines. His departure was not framed as a career transition. It was framed as a response to conditions he had witnessed and could no longer reconcile with his values.
The specific language — “pressures to set aside what matters most” — is the language of organizational triage applied to values. Not “we made the wrong decision once.” Not “a specific policy was adopted that I disagreed with.” The characterization is structural and recurring: pressure applied repeatedly, against values that were affirmed organizationally but difficult to hold in practice.
The resignation is evidence of a kind that differs from documentary and statistical evidence. It is the testimony of a person with direct organizational knowledge, specific enough to motivate a public departure, framed as a response to a pattern rather than an incident. It does not describe specific decisions or reveal classified information. What it describes is the organizational experience of the triage condition: values under recurring pressure, in an organization whose institutional purpose is to hold those values against exactly that pressure.
The Anthropic RSP revision followed sixteen days later.
The Race to the Bottom: A Documented Sequence
The revised RSP’s collective action framing, the claim that unilateral restraint produces a less safe world when competitors advance without constraint, is not merely a rationalization. It describes a real dynamic. The dynamic is documentable as a sequence of specific decisions by specific organizations over a specific timeline.
Early 2024: OpenAI removed the explicit ban on military and warfare use from its usage policies. The policy had previously listed “weapons, military applications” among prohibited uses. After the revision, it did not.
February 2025: Google reversed its internal prohibition on AI for weapons and surveillance. The prohibition dated to 2018, when thousands of Google employees protested the company’s involvement in Project Maven — a Pentagon program using AI for drone footage analysis. The protest produced a withdrawal from Maven. The 2025 reversal produced a re-entry into the domain the protest had caused Google to exit.
February 23, 2026: The Department of Defense signed an agreement with xAI, Elon Musk’s AI company, to provide access to the Grok AI model on classified systems. The agreement specified use for “all lawful purposes” with no company-specific conditions on how the system would be used. No safety constraints beyond what the law requires. No independent review. No capability thresholds. No organizational pre-commitment of any kind.
February 25, 2026: Anthropic revised its RSP, removing the categorical pre-commitment and introducing the competitive condition that makes its safety policy responsive to industry behavior.
The sequence shows what the collective action problem looks like as it operates in real time. It is not a theoretical concern about future dynamics. It is a documented progression: each actor, responding to the behavior of other actors, reduces its constraints. The floor is set iteratively. Each reduction provides the justification for the next. The xAI agreement — no conditions, all lawful purposes — becomes the floor against which every other agreement is measured. Meeting that floor is positioned not as a reduction in safety commitment but as competitive necessity.
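A toy iteration makes the floor-setting dynamic mechanical. The actors and constraint levels below are invented; the sketch models the direction of the described dynamic, not any company's actual decision process.

```python
# Toy model of iterative floor-setting among competing actors.
# Constraint levels are arbitrary units; all values are invented.

constraints = {"actor_a": 9, "actor_b": 6, "actor_c": 3, "actor_d": 0}

def step(levels: dict) -> dict:
    """Each round, every actor moves one unit toward the current floor,
    justified as competitive necessity. The floor never rises."""
    floor = min(levels.values())
    return {name: max(floor, level - 1) for name, level in levels.items()}

for _ in range(10):
    constraints = step(constraints)

print(constraints)
# {'actor_a': 0, 'actor_b': 0, 'actor_c': 0, 'actor_d': 0}
```

The fixed point is the least-constrained actor's position, regardless of where the other actors started. In the documented sequence, that position is the xAI agreement.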
The Pentagon made this dynamic explicit. A Pentagon spokesperson put it plainly: “Our nation requires that our partners be willing to help our warfighters win in any fight.” Defense Secretary Hegseth’s January 2026 AI strategy document required that all military AI contracts eliminate company-specific guardrails within 180 days. The institutional demand is for unconditional capability. The collective action dynamic ensures that the most compliant actor defines what every actor must become to remain competitive.
The Triage Threshold: Named
The point at which AI capability development outpaces the safety methodology designed to govern it, producing conditions where governance decisions must be made without adequate assessment of what is being governed. The triage threshold manifests at three levels simultaneously: at the operator level, as compressed decision review when throughput exceeds human deliberation capacity; at the organizational level, as safety commitments revised under competitive pressure before the methodology to evaluate new capabilities has been developed; and at the systemic level, as a race-to-the-bottom dynamic in which each actor’s reduction of constraints justifies every other actor’s reduction. The triage threshold is the condition of governing at speed — where the alternative to making inadequate decisions is making no decisions, and inadequate decisions are therefore made.
The 2025 activation of the RSP’s bio-terrorism safeguards is instructive as an example of the triage threshold in operation before the revision. The policy had established ASL-3 safeguards as a threshold to be activated when AI systems could facilitate bio-terrorism in ways that previously required advanced expertise. At some point in 2025, the bio-terrorism safeguards were activated, meaning the system had reached the capability level that the policy defined as requiring enhanced protection.
The activation occurred not because the system had demonstrably facilitated bio-terrorism, but because the possibility could not be ruled out. This is the triage condition at the threshold decision point: the methodology for determining whether a system has crossed a capability threshold was not precise enough to yield a confident positive finding. Nor could it yield a confident negative; it could not confirm that the threshold had NOT been crossed. The safeguards were activated on the basis of what could not be ruled out rather than what had been demonstrated. That is the definition of operating without adequate assessment methodology. You act on the uncertainty because you cannot resolve it.
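The decision rule this describes can be made explicit. The sketch below is a schematic of a precautionary threshold decision under an imprecise evaluation, with invented names; it is not the RSP's actual procedure.

```python
# Schematic of a threshold decision under imprecise evaluation.
# The enum values and the rule are illustrative, not the RSP's text.

from enum import Enum

class Assessment(Enum):
    CROSSED = "threshold demonstrably crossed"
    NOT_CROSSED = "threshold demonstrably not crossed"
    INDETERMINATE = "methodology cannot resolve the question"

def activate_safeguards(assessment: Assessment) -> bool:
    """Precautionary rule: safeguards stay off only when the negative
    can be demonstrated. INDETERMINATE triggers activation, which is
    acting on unresolved uncertainty rather than a positive finding."""
    return assessment is not Assessment.NOT_CROSSED

# The 2025 bio-terrorism activation corresponds to the INDETERMINATE
# branch: crossing could not be ruled out, so ASL-3 was activated.
assert activate_safeguards(Assessment.INDETERMINATE) is True
```

An evaluation methodology adequate to the capability would return CROSSED or NOT_CROSSED. A methodology in triage returns INDETERMINATE, and the precautionary rule does the rest.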
The RSP revision then removed the categorical commitment that those activated safeguards represent. The triage condition — inadequate methodology relative to capability — produced both the uncertain activation and the subsequent policy revision that made the activation’s binding commitment conditional on competitor behavior.
The Strongest Counterarguments
The framing of “triage mode” as evidence of institutional failure may misread what triage actually represents. Medical triage is not the abandonment of medicine. It is the application of medical judgment under conditions of resource scarcity. An organization in triage mode is not one that has stopped caring about outcomes. It is one that is making rational decisions about where limited resources for oversight and evaluation will have the most impact. The METR director’s characterization of Anthropic as being in triage mode may describe a rational response to an overwhelming environment rather than evidence of governance failure.
This is the strongest version of the counterargument, and it deserves acknowledgment: triage is a legitimate methodology, not an absence of methodology. The paper’s argument is not that triage is irrational or that organizations facing overwhelming conditions should pretend they are not. The argument is that triage by definition produces decisions made under inadequate assessment. Decisions made under inadequate assessment, in domains where errors are irreversible and the affected parties cannot consent or appeal, are governance failures by definition — regardless of whether they are rational responses to the conditions. The rationality of the decision under constraint does not eliminate the harm produced by the inadequate assessment. It explains it.
The collective action framing in the revised RSP is not merely a rationalization. If Anthropic pauses deployment unilaterally while xAI deploys Grok on classified systems without conditions, the net result may be that the AI systems operating in classified military environments are xAI’s systems without safety commitments rather than Anthropic’s systems with some. A world where the less safety-conscious actors dominate military AI deployment may be worse than a world where Anthropic remains in the market with some constraints, even if those constraints have been reduced. The RSP revision may be a pragmatic choice to remain relevant rather than an idealistic abandonment that makes no difference.
This argument has genuine force. The choice between “Anthropic in the market with reduced constraints” and “xAI in the market with no constraints” may favor the former. The paper’s response is not that this reasoning is wrong but that it is the description of a race to the bottom: each actor argues that remaining in the market with diminished constraints is better than exiting and leaving the field to less constrained actors. Applied by every actor, this reasoning produces convergence toward the least-constrained position. The logical endpoint of the argument is the xAI agreement: no conditions, all lawful purposes. Whether the journey produces better outcomes at each step than the alternative is a genuine empirical question. Whether the journey’s endpoint is better than what might have been achieved through a different trajectory is a question this paper cannot answer. It can document that the journey is occurring and the direction it is moving.
The characterization of AI safety methodology as in permanent triage misses the genuine progress being made in evaluation science. METR represents a serious independent evaluation effort. Red-teaming methodologies have matured substantially since 2023. Interpretability research, alignment research, and behavioral evaluation frameworks are all active areas of genuine scientific development. The gap between capability and methodology is real, but characterizing it as a fixed structural condition rather than a temporary developmental lag may overstate the case.
The paper acknowledges this progress and does not argue that safety methodology is standing still. The argument is narrower and more specific: the gap between capability development speed and methodology development speed is, as of February 2026 by Anthropic’s own account through its independent evaluator, such that safety plans are in triage. Progress in methodology development does not eliminate the triage condition if capability development is progressing faster. The question is not whether safety methodology is improving — it is. The question is whether the rate of improvement matches the rate of capability development. Anthropic’s own framing answers that question: it does not. The word they chose is triage.
When Triage Becomes the Argument
The triage threshold, the condition named in the preceding section, was documented as a failure of methodology: safety science overwhelmed by capability development, forced to make decisions about what to prioritize and what to defer. That framing treats triage as a symptom. What the events of February 25 through March 2, 2026 revealed is that triage also functions as an argument. The insufficiency of the methodology did not only allow the Handoff to proceed. It was used to justify it.
The sequence: On February 25, 2026, Anthropic revised its Responsible Scaling Policy. The revision replaced categorical pre-commitment thresholds with dual conditions and introduced collective-action framing — the acknowledgment that unilateral safety commitments don't make sense when competitors will simply not make them. The independent evaluator, METR, characterized Anthropic's internal safety planning as being in triage mode relative to capability development. The RSP revision was published two days before Anthropic's public refusal of the Pentagon's demands and four days before the blacklisting.
The Pentagon's argument — made explicitly by CTO Emil Michael and implicitly in the blacklisting order — was that Anthropic's safety framework was the obstacle. Not an obstacle to a specific deployment. An obstacle to the United States' competitive position against adversaries who would face no equivalent constraint. The Boeing analogy was deployed: Boeing doesn't put combat restrictions in its engines. Why should Anthropic put ethics frameworks in its AI? The framing was: Anthropic's methodology is the problem, and responsible military integration requires removing it.
What the triage threshold documentation reveals is that this argument was partially correct about the wrong thing. The methodology was failing — not because it was too restrictive but because it was too slow. The RSP revision acknowledged, in bureaucratic language, that Anthropic could not keep pace with capability development well enough to maintain the categorical pre-commitments that had defined the original policy. The Pentagon read the same signal differently: the methodology that kept the safety commitments operational was weakening. The moment to press was now.
This is the triage threshold's second-order effect, and it is not visible in the first-order analysis of methodology failure. When the safety methodology's limitations become public — when an independent evaluator characterizes the organization as in triage, when a co-founder states publicly that unilateral safety commitments "didn't really feel like they made sense" — that characterization becomes available as an argument for the actors seeking to remove the constraints entirely. The methodology's acknowledged inadequacy provides rhetorical cover for demands that it be abandoned rather than strengthened.
The structure is: the methodology is overwhelmed → the methodology acknowledges it is overwhelmed → the acknowledgment is used to argue the methodology should not constrain military use → the military integration proceeds through a different company that accepted the argument. Each step is documentable. Each step follows from the previous one. The triage threshold is not only a governance failure. It is a governance failure that creates the argumentative conditions for its own exploitation.
OpenAI's response to the same dynamic is instructive. OpenAI did not have a public triage-mode acknowledgment, but it had made the equivalent move earlier: the removal of explicit military/warfare prohibitions from its usage policy in early 2024, prior to any of the public pressure that produced Anthropic's situation. By the time the Pentagon needed a compliant substitute on February 27, 2026, OpenAI had already completed the methodological capitulation that Anthropic was being pressured to make. The Pentagon deal was not a new decision for OpenAI. It was the formal expression of a methodological posture adopted roughly two years earlier.
Sam Altman's own words confirmed this. In his X "Ask Me Anything" session on March 1, he said the deal "was definitely rushed, and the optics don't look great." The "optics" framing is revealing: it locates the problem in appearance rather than substance. The concern was not that the deal was wrong but that it looked bad relative to Anthropic's principled stand. The methodological question — whether the safety frameworks removed or weakened to make the deal possible were adequate to the capabilities being deployed — was not the concern. The concern was how the sequence looked.
What the February 25 to March 2 sequence demonstrated is that the triage threshold has an institutional capture dimension that the original named condition doesn't fully describe. It is not only that the methodology cannot keep pace. It is that when the methodology publicly acknowledges it cannot keep pace, that acknowledgment is immediately available to actors who want no methodology at all. The gap between "we are in triage" and "therefore we should remove the methodology entirely" is smaller than it appears. The actors who want the methodology removed are watching the triage acknowledgment and drawing exactly that inference.
The public verdict — the App Store numbers, the migration to Claude, the chalk on the sidewalk — represents a counter-signal to that inference. The population that moved to Claude in the days following the standoff was not operating with knowledge of the triage threshold or the RSP revision or the history of the METR evaluation. They were responding to a simpler and more fundamental signal: one company said no at cost, and another said yes immediately. Whatever the methodology's internal limitations, its public expression — the willingness to lose a $200 million contract rather than remove it — was legible and valued. The methodology's failure to keep internal pace with capability development did not undermine its function as a public commitment. The two things, it turns out, are different. And in the absence of binding legal instruments, the public commitment may be more important than the internal adequacy.
This does not resolve the triage threshold. The methodology's internal failures are real regardless of the public verdict. But it adds a dimension to what the methodology's failure means: it is not only a governance problem. It is a signal problem. The triage acknowledgment signaled weakness. The public refusal signaled integrity. Both signals were accurate descriptions of different aspects of the same institution. Which signal reaches which actor determines, in the absence of legal frameworks, what constraints actually operate.
Conclusion: The Methodology Described Its Own Failure
The most significant evidentiary element in this paper is also the most unusual: the institution responsible for maintaining the methodology acknowledged publicly, through its independent evaluator, that the methodology is in triage. This is not a finding produced by external critics. It is a self-description, chosen deliberately, by an organization that has positioned safety as its defining institutional commitment.
Triage mode. The methodology designed to keep pace with capability development, to govern deployment decisions, to prevent the safety commitments from becoming conditional on competitive dynamics — that methodology is overwhelmed. It is making decisions about what to prioritize and what to defer. The twenty-second review of Lavender targeting recommendations is triage at the operator level. The RSP revision is triage at the organizational level. The race-to-the-bottom sequence from OpenAI to Google to xAI to the revised Anthropic RSP is triage at the systemic level. Each level is making the decisions available to it given the resources and time constraints present — and none of those decisions is adequate to the consequences they govern.
And the triage acknowledgment became the argument. The methodology's public admission of overwhelm was immediately available to actors who wanted no methodology at all. The events of February 25 through March 2 showed that distance crossed in real time, in public, with a substitute actor ready to fill the gap the moment the principled actor refused.
Three conditions have been named across the three papers of this series. The Accountability Vacuum: the legal framework cannot assign responsibility for autonomous lethal decisions. Hypothetical Capture: extreme scenarios are used to normalize the removal of constraints before the consequences of removal can be examined. The Triage Threshold: safety methodology cannot keep pace with capability development, producing governance decisions made under inadequate assessment — and producing, as a second-order effect, the argumentative cover for the constraints' removal entirely.
Paper IV asks whether these three conditions, operating simultaneously in February 2026, constitute a moment of transition rather than a moment of crisis to be managed. Whether the transfer of lethal decision-making authority that Papers I, II, and III document separately is, viewed together, a single ongoing event. Whether the handoff is already happening.
Sources
- Chris Painter, METR Director. Technical appendix accompanying Anthropic Responsible Scaling Policy revision, February 25, 2026. Characterization of “triage mode” as Anthropic’s description of its safety plans relative to capability development.
- Anthropic. Responsible Scaling Policy. Original version, September 2023. Categorical pre-commitment language on deployment thresholds.
- Anthropic. Responsible Scaling Policy, Revised. February 25, 2026. Dual-condition replacement of categorical pre-commitment; collective action framing.
- Jared Kaplan, Anthropic co-founder. Public statements on RSP revision rationale, February 2026. “It didn’t really feel… that it made sense to make unilateral commitments.”
- Yuval Abraham and Ori Hagoel. “Lavender: The AI Machine Directing Israel’s Bombing Spree in Gaza.” +972 Magazine / Local Call, April 3, 2024. Primary source for targeting throughput and operator review conditions.
- Lieber Institute, West Point. “Targeting in the Black Box: The Need to Reprioritize AI Explainability.” Articles of War, 2024. Source for the “targeting black box” analysis and for explainability as an absent legal requirement.
- Ashley Deeks. The National Security Double Black Box. Referenced and analyzed in Lawfare, 2025.
- Action on Armed Violence (AOAV). “The Lavender Precedent.” November 2025. “Insurmountable evidentiary hurdle” finding re: reasonable commander standard.
- Axios. Reporting on Claude’s deployment in Venezuela operation through Palantir partnership. “First time a commercially built AI model has been deployed inside a classified American military operation.” February 2026.
- Wall Street Journal. Reporting on Pentagon ultimatum to Anthropic, Venezuela operation, and February 2026 pressure campaign. February 14, 2026.
- Mrinank Sharma. Public resignation statement, February 9, 2026. “The world is in peril… pressures to set aside what matters most.”
- OpenAI. Usage policy revisions, early 2024. Removal of explicit military/warfare prohibition.
- Google. Internal AI ethics policy revision, February 2025. Reversal of 2018 Project Maven withdrawal; re-entry into weapons and surveillance AI domain.
- U.S. Department of Defense / xAI. Agreement for Grok deployment on classified systems. February 23, 2026. “All lawful purposes” framing; no company-specific conditions.
- Hegseth AI Strategy Document. January 2026. Requiring elimination of company-specific AI guardrails from military contracts within 180 days.
- Pentagon CTO Emil Michael. Statements on Anthropic relationship. “Anthropic should have no say in how the Pentagon uses its products.” February 2026.
- UN OHCHR. Verified casualty data, Gaza conflict, 2024. Classified IDF database leaked May 2025: 17% of 53,000 killed were combatants; 83% were civilians.
- Human Rights Watch. Questions and Answers: Israeli Military’s Use of Digital Tools in Gaza. September 2024.
- Foreign Policy. “Israel’s Algorithmic Killing of Palestinians Sets Dangerous Precedent.” May 2024. “Displaces humans by default” analysis.
- Dario Amodei. “Machines of Loving Grace.” October 2024. Stated concern: concentration of lethal decision-making in narrow hands; “too few fingers on the button.”
- Dario Amodei. “The Adolescence of Technology.” January 2025. Four technologies enabling autocracy including fully autonomous weapon swarms and AI-powered mass surveillance.