The Accountability Gap · Paper I

The Gap Is Not New

Autonomous Weapons and the Accountability Problem That Predates AI

The Institute for Cognitive Sovereignty · 2026 · Research Paper

CSI-2026-AG-001 · Published February 28, 2026
2013 — Gap formally named by UN
20 sec — Human review per target
0 — Binding legal frameworks
Jan 3, 2026 — First confirmed commercial AI military deployment
“A human being somewhere has to take the decision to initiate lethal force and as a result internalize (or assume responsibility for) the cost of each life lost in hostilities. Delegating this process dehumanizes armed conflict even further and precludes a moment of deliberation in those cases where it may be feasible.”
β€” Christof Heyns, UN Special Rapporteur on Extrajudicial Executions, 2013
Section I

The Record

The accountability problem created by autonomous weapons systems was formally named and documented in 2013. It was named by the United Nations. It was named by international humanitarian law scholars. It was named by military ethicists. It was named with precision, with urgency, and with a clear understanding of what its resolution would require.

It has not been resolved.

This paper documents the gap as it was identified, the legal framework it ruptures, the operational cases that have since confirmed its severity, and the structural reasons it persists. The argument is not that autonomous weapons are new. The argument is that the accountability problem they create is documented, named, and unaddressed — and that the gap is no longer waiting to be filled. It has already been crossed.

A note on scope: this paper does not adjudicate whether any specific military action was lawful or justified. Those determinations require evidentiary processes that are outside this paper’s remit. This paper documents a structural condition — the absence of a legal framework capable of assigning accountability for autonomous lethal decisions — and the evidence confirming that condition is active and consequential.


Section II

The Taxonomy

The United States Department of Defense established a formal taxonomy of human involvement in autonomous weapons systems in November 2012. DoD Directive 3000.09, updated in 2023, defines three operational categories:

Human-in-the-loop: Semi-autonomous systems where a human operator makes the final decision to apply lethal force. The weapon may assist in target identification but cannot engage without explicit human authorization for each engagement.

Human-on-the-loop: Supervised autonomous systems where a human operator can monitor and intervene during operation but is not required to authorize each individual engagement. The system acts; the human can override.

Human-out-of-the-loop: Fully autonomous systems that select and engage targets based on programmed criteria without any human involvement at the point of engagement.
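To make the three categories concrete, the following minimal sketch models them as a control gate showing where each one places the human decision. It is illustrative only: the names, types, and logic are ours, not the directive's, and no fielded system is modeled.

```python
from enum import Enum, auto

class HumanRole(Enum):
    """The three DoD Directive 3000.09 categories (illustrative labels)."""
    IN_THE_LOOP = auto()      # human must authorize each engagement
    ON_THE_LOOP = auto()      # human supervises and may intervene; no per-engagement authorization
    OUT_OF_THE_LOOP = auto()  # no human involvement at the point of engagement

def engagement_proceeds(role: HumanRole, human_authorized: bool, human_intervened: bool) -> bool:
    """Toy gate: whether an engagement goes forward under each category."""
    if role is HumanRole.IN_THE_LOOP:
        return human_authorized        # the human must say yes
    if role is HumanRole.ON_THE_LOOP:
        return not human_intervened    # the system acts unless the human says no
    return True                        # the system acts on programmed criteria alone

# The human's role shifts from "must say yes" to "may say no" to "says nothing."
assert not engagement_proceeds(HumanRole.IN_THE_LOOP, human_authorized=False, human_intervened=False)
assert engagement_proceeds(HumanRole.ON_THE_LOOP, human_authorized=False, human_intervened=False)
assert engagement_proceeds(HumanRole.OUT_OF_THE_LOOP, human_authorized=False, human_intervened=False)
```

Note what the gate cannot express: whether the "yes" in the first branch was a judgment or a reflex. That is precisely the dimension the next paragraphs take up.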

The taxonomy is valuable because it reveals the assumption embedded in the legal and ethical frameworks that govern armed conflict: that meaningful human judgment exists somewhere in the decision chain. The directive is principally grounded in the role of the human operator, not in any technical specification of the weapon system itself.

What the directive does not address is what happens when the human’s role in the loop becomes nominal rather than substantive — when a human is technically present but operationally absent. This gap between formal taxonomy and operational reality is where the accountability vacuum resides. It exists not only in the future category of fully autonomous systems but is already present in systems that retain a human signature while operationally eliminating human judgment.

The taxonomy, in other words, is a legal framework. It is not a description of operational reality. Operational reality has outrun the taxonomy. That distance — between the legal category and the facts on the ground — is the subject of this paper.


Section III

The Legal Framework That Cannot Govern What Already Exists

International humanitarian law — the body of law governing armed conflict — rests on three operative principles. Each of the three requires something autonomous systems cannot, by design, provide: real-time contextual human judgment.

Distinction

Parties to a conflict must at all times distinguish between civilian populations and combatants, and between civilian objects and military objectives. Attacks may be directed only against military objectives. This is not a procedural requirement. It is a substantive obligation that requires an actor capable of assessing context, interpreting ambiguous situations, and making a genuine determination of combatant status at the moment of engagement.

A system trained on historical data, however sophisticated, classifies. It does not assess. Classification at scale, at the speed of autonomous engagement, and under the uncertainty of live conflict necessarily produces errors that distinction forbids. The distinction principle assumes a human who can be held responsible for the determination. An algorithm cannot be held responsible. Its designers, trainers, and commanders can — but only if the chain of causation is legible. When an opaque system makes a determination that a human then approves in twenty seconds without understanding the system’s reasoning, that legibility collapses.
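The scale of the problem can be made concrete with a base-rate calculation. The sketch below uses invented numbers throughout; it models no documented system and exists only to show why high headline accuracy and the distinction principle can still be far apart when the population screened is mostly civilian.

```python
# Illustrative base-rate arithmetic; every number here is an assumption.
population = 1_000_000        # assumed screened population
combatant_rate = 0.02         # assumed true share of combatants
sensitivity = 0.90            # assumed share of actual combatants the system flags
false_positive_rate = 0.05    # assumed share of civilians the system wrongly flags

combatants = population * combatant_rate
civilians = population - combatants
true_flags = combatants * sensitivity
false_flags = civilians * false_positive_rate
precision = true_flags / (true_flags + false_flags)

print(f"total flagged: {true_flags + false_flags:,.0f}")        # 67,000
print(f"civilians among them: {false_flags:,.0f}")              # 49,000
print(f"share of flags that are combatants: {precision:.0%}")   # 27%
```

Under these assumed numbers, a system that catches 90 percent of combatants and wrongly flags only 5 percent of civilians still produces a target list that is nearly three-quarters civilian, because civilians vastly outnumber combatants in the screened population.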

Proportionality

An attack must not cause incidental civilian casualties or damage to civilian objects that is excessive in relation to the concrete and direct military advantage anticipated. This principle requires genuine weighing — a consideration of who is present, what the military value of the objective is, and whether the foreseeable harm is proportionate to that value.

Pre-authorized proportionality thresholds — policies that fix acceptable civilian casualties per category of target in advance — are not proportionality assessments. They are proportionality abolitions. They replace per-case evaluation with statistical authorization. The principle exists precisely because each engagement situation is distinct. Statistical pre-authorization treats the principle as a bureaucratic checkbox.

Precaution

Parties must do everything feasible to verify that targets are legitimate military objectives, to minimize civilian harm, and to cancel or suspend attacks when it becomes apparent that the target is not a legitimate objective or that the attack would cause disproportionate civilian harm.

Precaution requires the capacity to cancel. An autonomous engagement at machine speed, once initiated, cannot be cancelled by a human who authorized it twenty seconds earlier without reviewing the targeting data. The cancellation window exists formally. It does not exist operationally.

These three principles share a common structural requirement: they presuppose a human actor making real-time, case-specific, contextually informed decisions who can then be held accountable for those decisions. Autonomous and semi-autonomous systems fragment that requirement at every point. They distribute judgment across programmers, trainers, commanders, and operators in ways that make criminal responsibility — which requires identifiable individual acts or omissions — structurally difficult to assign.


Section IV

The Martens Clause and the Principle of Humanity

Customary international humanitarian law contains a backstop that predates AI by over a century. The Martens Clause, originating in the 1899 Hague Convention and reaffirmed in the Geneva Conventions’ Additional Protocols, provides that in cases not covered by treaty law, civilians and combatants remain under the protection and authority of the principles of international law derived from established custom, principles of humanity, and the dictates of public conscience.

The clause was designed precisely for moments when technology outran treaty. Its invocation in the context of autonomous weapons is not rhetorical. It establishes that the absence of a specific legal instrument governing a category of weapon does not create a zone of permissibility. The principles of humanity apply regardless.

Christof Heyns, in his 2013 report to the UN Human Rights Council, invoked the Martens Clause directly in the context of lethal autonomous robots. His argument was precise: if a machine makes a lethal decision, the requirement that a human being internalize the cost of that decision — the requirement that is embedded in the principle of humanity — cannot be satisfied by nominal human presence in a process that has operationally eliminated human judgment. The Martens Clause may not be enforceable through a specific treaty mechanism. It is, however, the legal principle that establishes why the accountability vacuum is not merely an administrative gap. It is a violation of the foundational premise on which the law of armed conflict rests.

That premise is: someone is responsible. Armed conflict is not a natural event. It is a human event, conducted by human actors who bear human accountability for its conduct. The moment that premise becomes functionally false — when no human actor can be meaningfully held accountable for an autonomous lethal decision — the legal and moral architecture of international humanitarian law loses its operative foundation.


Section V

Libya, March 2020: Fire, Forget, and Find

The first documented case of a lethal autonomous weapon engaging human targets in combat occurred in March 2020, in the Libyan civil war between forces aligned with the UN-recognized Government of National Accord and those loyal to General Khalifa Haftar.

The UN Panel of Experts on Libya documented the engagement in their March 2021 report. The relevant passage: “Logistics convoys and retreating [Haftar-affiliated forces] were subsequently hunted down and remotely engaged by the unmanned combat aerial vehicles or the lethal autonomous weapons systems such as the STM Kargu-2 and other loitering munitions. The lethal autonomous weapons systems were programmed to attack targets without requiring data connectivity between the operator and the munition: in effect, a true ‘fire, forget and find’ capability.”

The Kargu-2 is a quadcopter loitering munition manufactured by Turkish defense company STM. It uses machine learning-based object classification to identify and engage targets, with the capacity to operate without any connection to a human operator after deployment. The manufacturer advertises it as an anti-personnel weapon with swarming capabilities in development.

The UN report did not confirm that autonomous kills occurred — only that the system was deployed in a configuration designed to engage without human oversight. Turkey disputes the characterization of autonomous operation. The manufacturer maintains that operators must verify targets before engagement. The system was, however, documented in a configuration where that verification was not required.

This ambiguity is itself the evidence. We cannot determine with certainty whether the Kargu-2 killed autonomously in Libya because the architecture of autonomous weapons deployment is, by design, not legible after the fact. A Kargu-2 used autonomously looks identical to one used manually. If we cannot determine whether autonomous engagement occurred, we cannot assign accountability for it. The inability to determine what happened is the accountability vacuum, operating in the field.

The Kargu-2 case was significant as a precedent. It was not the most consequential case of AI-assisted targeting that followed. What came next was an order of magnitude greater in scale, documentation, and legal implication.


Section VI

Gaza, 2023–2024: The Most Documented Case in the History of AI-Assisted Targeting

Between October 2023 and the publication of investigative reporting in April 2024, the Israel Defense Forces deployed three interconnected AI systems in Gaza that together constitute the most extensively documented case of AI-assisted targeting in the history of armed conflict. The documentation is remarkable not because outside investigators observed the systems but because six IDF intelligence officers with direct involvement gave detailed testimony, which was confirmed by multiple major news organizations.

The three systems, and their functions:

The Gospel identified buildings, infrastructure, and structural targets. Before AI-assisted targeting, human analysts produced approximately 50 bombing targets per year in Gaza. The Gospel enabled production of 100 bombing targets per day. That is, by a conservative estimate, a seven-hundredfold increase; the arithmetic is reproduced in the sketch after these three system descriptions. The human oversight methodology was not scaled to match.

Lavender was an AI-powered database that assigned every Palestinian man in Gaza a score from 1 to 100 indicating estimated likelihood of militant affiliation. At peak, the system flagged 37,000 people as suspected targets. The IDF assessed the system’s accuracy at 90 percent — a finding that, applied to 37,000 targets, yields approximately 3,700 people marked for lethal targeting who were not militants, by the IDF’s own assessment. That 10 percent error rate was not treated as a disqualifying deficiency. It was treated as an acceptable statistical margin.

Where’s Daddy? tracked Lavender-identified individuals and alerted military operators when those individuals were at home. The system was specifically designed to locate targets in residential settings rather than in military activity settings, because home location data was more reliable. The practical consequence: strikes were timed to occur when targets were most likely to be surrounded by family members. According to sources, the army knew this and authorized it.
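Before turning to the authorization policy, the arithmetic behind the two headline figures above is worth making explicit, since both follow directly from the reported numbers.

```python
# Arithmetic from the figures reported in the investigative record, as cited above.
targets_per_year_manual = 50     # reported pre-AI analyst output, per year
targets_per_day_ai = 100         # reported Gospel-assisted output, per day

factor = (targets_per_day_ai * 365) / targets_per_year_manual
print(f"throughput increase: x{factor:,.0f}")   # x730; "factor 700" is the conservative rounding

flagged = 37_000                 # reported Lavender peak
assessed_accuracy = 0.90         # the IDF's own reported assessment
false_positives = flagged * (1 - assessed_accuracy)
print(f"people flagged in error, on the IDF's own figure: {false_positives:,.0f}")   # 3,700
```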

The authorized collateral damage ratios: for a junior Hamas operative identified by Lavender, IDF policy during the early weeks of the war authorized up to 15 to 20 civilian deaths. For a senior commander, the authorization extended to over 100 civilian deaths. In previous operations, the IDF had not authorized collateral damage for assassinations of low-ranking militants. The combination of AI-enabled volume and pre-authorized thresholds transformed targeted killing into statistical industrial production.

The training data problem is foundational to understanding why the error rate exists and why it was predictable. Lavender was trained on data that used the term “Hamas operative” loosely — including civil defense workers in the training dataset alongside confirmed militants. The system was therefore trained to find the common features of both groups, not only the latter. The false positive pool was structurally embedded in the model before any Palestinian was targeted by it. The accountability for the 10 percent error rate does not begin with the human who approved a strike in twenty seconds. It begins with the choice of training data. It was distributed across designers, commanders, and policy-makers before a single target was flagged.
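This mechanism, in which loose labels teach a model the shared features of militants and non-militants, can be demonstrated on synthetic data. The sketch below is not a model of Lavender; the features, group structure, and numbers are invented solely to show how label contamination embeds a false positive pool before any individual is scored.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(n, shared, distinct):
    # feature 0: behavior shared by militants and civil defense workers (invented)
    # feature 1: behavior specific to actual militants (invented)
    return np.column_stack([rng.normal(shared, 1.0, n), rng.normal(distinct, 1.0, n)])

militants       = sample(1000, shared=2.0, distinct=2.0)
civil_defense   = sample(1000, shared=2.0, distinct=0.0)
other_civilians = sample(1000, shared=0.0, distinct=0.0)

# Contaminated labels: civil defense workers enter the training set as "operatives."
X = np.vstack([militants, civil_defense, other_civilians])
y_loose = np.concatenate([np.ones(2000), np.zeros(1000)])

model = LogisticRegression().fit(X, y_loose)

# The model has learned the shared behavior as the operative signature, so fresh
# civil defense workers are flagged at a high rate before any review occurs.
fresh_civil_defense = sample(1000, shared=2.0, distinct=0.0)
print(f"civil defense workers flagged: {model.predict(fresh_civil_defense).mean():.0%}")
```

The false positives in such a model are not random noise a reviewer might catch; they are the systematic product of what the model was taught to look for.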


Section VII

The Rubber Stamp Problem: Human Presence Without Human Judgment

The most significant finding in the Lavender documentation is not the error rate, the collateral damage authorization, or the targeting of people in their homes. It is the description of what human oversight looked like in practice.

“I would invest 20 seconds for each target at this stage, and do dozens of them every day. I had zero added value as a human, apart from being a stamp of approval.”

This was not one officer describing an aberration. It was corroborated across six intelligence officers with direct system access. The human review consisted, primarily, of confirming that the Lavender-identified target was male — on the assumption that women were not combatants. Twenty seconds. Gender verification. Authorization granted. Next target.

The army explicitly knew this oversight was insufficient to detect errors. According to testimony, a common error occurred when a Hamas operative gave his phone to a family member — a son, a brother, a random male — and the phone’s communication patterns then flagged the recipient. Lavender identified the phone, not the person. The 20-second review, checking gender, would not catch this error. The army knew. The protocol was, in the words of one source, that “even if you don’t know for sure that the machine is right, you know that statistically it’s fine.”
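The structural point is that the documented check is incapable of catching the documented error, and it can be stated as a two-line predicate. The sketch below is a toy restatement of the testimony above; all names and fields are illustrative.

```python
from dataclasses import dataclass

@dataclass
class FlaggedTarget:
    is_male: bool       # the only attribute the 20-second review checked, per testimony
    is_operative: bool  # ground truth, unavailable to the reviewer

def twenty_second_review(t: FlaggedTarget) -> bool:
    """The documented protocol, reduced to its operational content."""
    return t.is_male

# The known error mode: an operative hands his phone to a brother or son, and the
# system flags the male relative. The check passes the error by construction.
misidentified_relative = FlaggedTarget(is_male=True, is_operative=False)
assert twenty_second_review(misidentified_relative)   # approved despite being wrong
```

A review criterion that is invariant under the known failure mode does not degrade oversight; it removes it while leaving the paperwork intact.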

This is the rubber stamp problem reduced to its operational essence. The human is formally in the loop. The human cannot exercise the judgment the loop requires. The nominal presence of human authorization does not satisfy the legal requirements of distinction, proportionality, and precaution — it creates the appearance of satisfying them while eliminating their substance.

The legal consequence is precise: the appearance of human oversight, when it substitutes for the substance of human oversight, does not preserve accountability. It distributes it across so many actors — programmers, trainers, policy-makers, commanders, individual operators — that accountability becomes untraceable. When an ICC investigation attempts to determine whether a particular strike violated IHL, it must reconstruct what information was available, how the AI processed it, what recommendation the system generated, what the operator understood, and what the commander authorized. Without transparency into the AI’s reasoning — and the system is, by technical necessity, opaque — that reconstruction is not possible. Courts face, in the words of the Arms of Concern Law Association (AOAV), “an insurmountable evidentiary hurdle.”

The machine’s opacity is not an incidental technical limitation. It is a structural feature of the accountability vacuum. Opacity and autonomy together create conditions where each actor in the chain can truthfully point to someone else — and where the distributed pointing constitutes, in aggregate, the dissolution of accountability rather than its assignment.


Section VIII

The Accountability Vacuum: Named, Documented, Unresolved

Named Condition — Paper I
The Accountability Vacuum

The structural absence of a human actor who can be held legally responsible for an autonomous lethal decision. International humanitarian law assumes a human pulled the trigger. Autonomous and semi-autonomous systems break that assumption without replacing the legal framework built on it. The vacuum does not require the complete absence of human actors. It requires only the elimination of legible human causation — which can be achieved through opacity, distribution, speed, or the nominalization of a human role that has been operationally hollowed out.

The accountability vacuum was named in Christof Heyns’ 2013 report with specificity that has not been improved upon in the decade since. Heyns identified the core problem: IHL requires a human to bear the cost of each life taken. When a machine takes a life, the question of who bears that cost — the programmer, the commander, the manufacturer, the deploying state — has no settled legal answer.

His report identified the candidates for accountability. None is satisfactory:

The programmer cannot have anticipated every situation the system will encounter. Programming is not authorization of specific engagements. Criminal liability requires specific acts or omissions with identifiable consequences. A programmer who trained a system on imperfect data may be morally implicated in downstream casualties but is not, under current frameworks, criminally liable for them.

The commanding officer bears responsibility under command responsibility doctrine for acts of subordinates they knew or should have known about and failed to prevent. But command responsibility doctrine was developed for human subordinates whose acts are observable and whose errors are correctable. An autonomous system’s engagement decisions are not observable in advance, and the system’s reasoning process is not legible afterward. The “should have known” standard cannot be applied to an actor who could not have known what the system would encounter.

The manufacturer may bear civil product liability in some jurisdictions for defective systems. But civil liability is not criminal accountability, and it does not satisfy IHL’s requirement that specific violations be investigated and prosecuted.

The deploying state bears state responsibility under international law for IHL violations by its armed forces. But state responsibility does not resolve individual criminal accountability, and it functions as a geopolitical instrument rather than a mechanism that can investigate and adjudicate the specific decisions that produced specific casualties.

In practice, when an autonomous system makes an erroneous lethal targeting decision, the accountability chain produces the following outcome: every actor points to someone else. The programmer points to the commander who established the rules of engagement. The commander points to the training data the programmer selected. The operator points to the system’s recommendation and the policy authorization. The policy-maker points to the threat environment that required speed. The state denies wrongdoing pending investigation. The investigation cannot reconstruct the AI’s reasoning. The case closes without accountability assigned.

This is not speculation. The Lavender documentation shows it in operation. Classified IDF data leaked in May 2025 indicates that 83 percent of the 53,000 Palestinians killed in Gaza were civilians. No individual has been held criminally accountable for the targeting decisions that produced that ratio.


Section IX

152–4: Consensus Without Consequence

In December 2023, the United Nations General Assembly voted 152 to 4, with 11 abstentions, in support of advancing discussions toward international regulation of lethal autonomous weapons systems. The vote was among the most lopsided on a major security question in the Assembly’s history. It reflected genuine international consensus that autonomous weapons pose a distinctive accountability and humanitarian problem that existing frameworks do not resolve.

The United Nations Secretary-General called for a legally binding prohibition on systems that function without human control by 2026.

The UN Convention on Certain Conventional Weapons has hosted discussions on lethal autonomous weapons since 2014 — twelve years and counting — without producing a legal instrument. The United States, Russia, and Israel have consistently resisted binding commitments. No such instrument exists as of 2026.

The gap between the vote (152 nations) and the outcome (zero binding law) describes the accountability vacuum in its geopolitical dimension. There is no shortage of recognized consensus that the problem exists and requires resolution. There is a shortage of the institutional mechanisms and political will to impose costs on the states that resist. The states that resist are, not coincidentally, the states most heavily invested in autonomous weapons capability.

The accountability vacuum is self-reinforcing in this way: the states that have the capability to close the gap are the states with the greatest operational incentive to keep it open.


Section X

The Strongest Counterarguments

Any responsible analysis of the accountability gap must engage the strongest versions of the positions that dispute or qualify it. Three arguments have genuine intellectual weight and deserve direct engagement.

Counterargument I
Existing IHL is sufficient if properly applied

The traditionalist position, represented by some West Point Lieber Institute analyses, holds that commander responsibility doctrine, properly applied, extends to AI-mediated decisions. Commanders who set error rate thresholds, establish rules of engagement, and authorize systems with known limitations are accountable for the foreseeable consequences of those choices. The accountability vacuum is not a gap in the law; it is a gap in enforcement.

This is a serious argument. The response is not that it is wrong but that it is insufficient. Commander responsibility doctrine was designed for human subordinates whose conduct is observable and interpretable. The opacity of AI systems creates a specific evidentiary problem that the doctrine does not address: when an AI’s reasoning cannot be reconstructed, the “reasonably well-informed commander” standard cannot be applied. Courts cannot determine what information was available to the commander because the system that processed that information cannot explain its own conclusions. Formal accountability may exist. Operational accountability may not. The gap between them is exactly what requires a legal response the existing framework has not provided.

Counterargument II
Autonomous systems may produce fewer civilian casualties than human operators

The humanitarian benefits argument holds that human combatants under stress, fear, and cognitive load make targeting errors at rates that autonomous systems — operating without those impairments — could potentially reduce. A drone that does not panic, that does not fire in response to provocation, that executes programmed targeting criteria without emotional distortion may be, on the humanitarian dimension, superior to a human soldier in some contexts.

The argument has empirical grounding in some controlled environments. Its limitation is that it proves too much. The humanitarian benefits of autonomous systems are contingent on the quality of the system’s training data, the accuracy of its classification, and the conditions of the operating environment. The Lavender case demonstrates that a 10 percent error rate, pre-authorized collateral damage thresholds, and training data that included civilians in the “Hamas operative” category produced, not humanitarian improvement, but industrialized civilian targeting. The humanitarian benefits argument is not wrong as a theoretical possibility. It does not describe the documented operational reality.

Counterargument III
The Lavender case does not establish the accountability gap; it establishes accountability for specific policy choices

The IDF and some IHL scholars argue that Lavender was a decision-support system, not an autonomous weapon — that human operators made final decisions, and that accountability for the system’s outputs therefore rests with the humans who established the error rate threshold, authorized the collateral damage ratios, and approved individual strikes. The accountability gap does not arise because humans are accountable for all of these choices.

This argument correctly identifies where formal accountability is located. It does not address the gap between formal accountability and operational accountability. An operator who authorized a strike in twenty seconds, checking only that the target was male, on the recommendation of an opaque system, cannot be meaningfully said to have made the targeting decision. A commander who set a 90 percent accuracy threshold for a system whose error cases cannot be identified or recovered cannot be meaningfully said to have authorized the deaths of the 10 percent. Formal accountability distributed across enough actors, mediated through enough opacity, becomes operationally indistinguishable from no accountability. The legal framework requires the former. It does not guarantee the latter. That gap is real regardless of the label attached to the system.


Section XI

What Closing the Gap Would Require

The accountability gap cannot be closed by human presence in the targeting loop. Lavender has demonstrated that human presence can be reduced to twenty seconds of gender verification while formally satisfying the “human in the loop” requirement. Closing the gap requires something more specific: meaningful human judgment at the point of lethal decision, which requires, in turn, that the human have the time, the information, and the comprehension of the AI system’s reasoning to exercise genuine judgment rather than perform nominal authorization.

What that would require, operationally:

Explainability as a legal requirement. If a commander cannot understand how an AI system reached a targeting recommendation, they cannot exercise the judgment that IHL requires. Explainability is not currently a legal requirement under IHL or under the U.S.-led Political Declaration on Responsible Military Use of AI. Both frameworks address the constitutive parts of the model — the transparency of training data and documentation — without requiring that the model’s reasoning in individual cases be legible. NATO’s revised AI strategy, updated after the generative AI era began, moved further away from explainability as a foundational principle, not closer. Explainability would need to become a legal condition of lawful AI-assisted targeting.

Accountability frameworks that match the speed and scale of AI-enabled targeting. Human investigators and legal processes operate at human speed. AI-enabled targeting operates at machine speed. The asymmetry means that accountability, even when formally assigned, can only ever be retrospective — applied after the fact to decisions made at a speed that no retrospective review can meaningfully evaluate case by case. A framework designed for AI-enabled targeting would need to address the scale problem: accountability processes capable of examining thousands of individual targeting decisions, not just the policy choices that authorized them.
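The scale asymmetry can be quantified with the figures already on the record. In the sketch below, the 20-second review and the 100-targets-per-day rate are documented (Sections VI and VII); the duration of a genuinely meaningful review is an assumption chosen only for illustration.

```python
# Capacity arithmetic for retrospective review; the 30-minute figure is assumed.
targets_per_day = 100          # documented AI-assisted target production rate
documented_review_s = 20       # documented human review time per target
meaningful_review_min = 30     # assumed time for a genuine case-by-case assessment

actual_hours = targets_per_day * documented_review_s / 3600
required_hours = targets_per_day * meaningful_review_min / 60

print(f"review time actually spent per day:   {actual_hours:.1f} analyst-hours")    # 0.6
print(f"review time genuine assessment needs: {required_hours:.0f} analyst-hours")  # 50
print(f"shortfall factor: x{required_hours / actual_hours:.0f}")                    # x90
```

Whatever figure is assumed for a meaningful review, the structure of the result is the same: oversight capacity must scale with target production, and nothing in the documented record suggests it did.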

A binding international instrument. Thirteen years of UN discussions without a legal instrument is not a process on the verge of producing one. A binding instrument would require the states most invested in autonomous weapons capability — the United States, Russia, Israel, China — to accept constraints on the use of systems they consider central to military advantage. The geopolitical conditions for that agreement do not currently exist.

None of these requirements is being met. All of them are preconditions for genuine closure of the accountability gap.


Section XII

January 3, 2026: The Gap Crosses Into Commercial AI

The cases documented in Sections V through VII — the Kargu-2 in Libya, the Gospel and Lavender systems in Gaza — involved purpose-built military AI systems or systems developed through defense contracts. The accountability gap they demonstrate is serious. It is not, however, the accountability gap in its most structurally significant form. The most significant form is the one that emerged on January 3, 2026, when a commercial AI model — Claude, developed by Anthropic — was deployed through Palantir in what two sources confirmed to Axios as the first confirmed case of a commercial large language model used inside a classified American military operation.

The distinction matters for the accountability framework in specific ways.

Military AI systems — the Kargu-2, the Gospel, Lavender — are developed within defense procurement frameworks that include, however inadequately, some chain of authorization and some structure for assigning responsibility. The questions of who approved the system, what oversight applied, and what accountability exists for its outputs are difficult, but they are questions the institutional structure is designed to address, even when it fails to address them adequately.

Commercial AI models deployed into military contexts through third-party integrators exist outside that framework entirely. The commercial terms under which Claude was licensed to Palantir did not contemplate deployment in classified military operations. Anthropic's acceptable use policies, which included explicit prohibitions against use in autonomous weapons and mass surveillance, were the company's attempt to establish constraints on downstream use. Whether those policies were operative in the Palantir integration, whether they were reviewed prior to the operation, and whether anyone in the chain of authorization asked whether the terms of the commercial license permitted the operational use — none of these questions has a public answer. Most of them probably have no private answer either.

What is publicly documented: On January 3, 2026, Operation Absolute Resolve — a joint U.S. military and law enforcement raid on Caracas — captured Venezuelan President Nicolás Maduro. The operation involved 150 aircraft across 20 launch sites, cyber operations that disabled Venezuela's air defense systems, and Delta Force units. It resulted in the deaths of 83 people and the capture of a sitting head of state. Two sources confirmed to Axios that Claude was deployed as part of the operation's intelligence infrastructure through Palantir's classified integration.

What Claude did in that operation is unknown to Anthropic. This is not a formulation chosen for diplomatic purposes. Anthropic did not know the Venezuela operation was occurring. They did not know Claude was being used in it. They learned about it the same way the public did — through reporting, weeks after the fact. The commercial licensing structure under which Claude was made available to Palantir for classified government work did not include a mechanism for Anthropic to know what classified operations its model was supporting.

This is the accountability vacuum in its most structurally pure expression. The Lavender documentation shows what happens when human oversight is nominalized — reduced to a twenty-second gender check on an opaque AI recommendation. The Venezuela case shows what happens when the AI developer's oversight is not nominalized but eliminated. Anthropic had no oversight role in the Venezuela operation, nominal or otherwise. They were not informed it was happening. Their model was used in a lethal military operation that killed 83 people without their knowledge.

The accountability chain for Claude's role in the Venezuela operation is as follows: Anthropic developed Claude under commercial terms that permitted Palantir to integrate it into government classified systems. Palantir integrated Claude into the operational infrastructure used by the U.S. military. The U.S. military used that infrastructure in Operation Absolute Resolve. Eighty-three people died. Maduro was captured and transported to New York to face narcoterrorism charges.

Who is accountable for what Claude did in that chain? Anthropic can truthfully say they did not authorize the use, did not know about it, and — when they learned of it — publicly refused to authorize similar uses going forward, at the cost of their entire federal government business. The U.S. government can truthfully say the operation was a lawful law enforcement action with military support, authorized by the President. Palantir can truthfully say they provided an integration that was authorized for use in classified settings. No actor in the chain made a single decision that was, in isolation, illegal under any current legal framework.

And 83 people are dead, and a sitting head of state was removed from his country by military force, and an AI model's role in that operation is unknown to the people who built it, and no accountability mechanism exists that can investigate the specific contribution of that model to those outcomes.

That is the accountability vacuum — not in theory, not in the future, not in the margins of the law. In the first confirmed commercial AI military deployment in history. On January 3, 2026.

The significance of the February 27th sequence — Anthropic's refusal, the blacklisting, the OpenAI substitution — is not separable from January 3rd. Anthropic's refusal of the Pentagon's demand for unconstrained future authorization was the only available mechanism for establishing, on the public record, that January 3rd occurred without consent. There is no other legal mechanism available to a commercial AI company whose model has been used in a classified military operation without their knowledge. The refusal was retroactive non-consent as much as prospective limit-setting. It was the documentation of the accountability vacuum from inside it.

The gap was formally named in 2013. It has been crossed. The crossing occurred in the deployment of a commercial AI model built to help people think, write, and reason — not to support lethal military operations — in a classified raid on a foreign capital that killed 83 people and captured a sitting president. The model's developer did not know. The model's constraints were not consulted. The legal framework that would govern this does not exist.

The gap is not theoretical. The gap has bodies in it. And this time, for the first time in the accountability gap's thirteen-year documented history, the entity that built the tool is on the record saying so.


Section XIII

Conclusion: The Condition Is Named. The Gap Is No Longer Theoretical.

The accountability gap was named in 2013. It was named with precision, with legal grounding, and with a clear statement of what it would require to resolve. In the thirteen years since, the gap has not narrowed. It has widened — operationally, technologically, and politically.

The Kargu-2 deployment in Libya demonstrated that autonomous engagement was occurring in active combat before any legal framework had addressed it. The Lavender documentation in Gaza demonstrated that the accountability gap does not require full autonomy — it requires only the nominalization of human judgment, the reduction of the human role to a rubber stamp on an opaque AI recommendation. The Venezuela operation of January 3, 2026 demonstrated something that neither Libya nor Gaza demonstrated: the gap has crossed from purpose-built military AI into commercial AI deployed through third-party integrators, without the developer's knowledge, in a lethal military operation on another country's capital.

The 152–4 UN vote of December 2023 reflects genuine global consensus that this problem exists and requires resolution. Zero binding legal instruments have resulted from thirteen years of formal discussion. The states that control the capability have the most to lose from the law that would govern it.

This paper has documented a structural condition. It has not prescribed a resolution. The prescriptions that exist — explainability requirements, accountability at scale, binding instruments — are technically and politically achievable. They are not being achieved. The gap between what is technically and legally possible and what is politically happening is not this paper's subject. Papers II and III examine why.

The condition is named: the Accountability Vacuum. The gap is open. It now has commercial AI in it. The series continues.


Section XIV

Sources

  • Christof Heyns, UN Special Rapporteur on Extrajudicial Executions. Report to the UN Human Rights Council on Lethal Autonomous Robotics. A/HRC/23/47, 2013.
  • U.S. Department of Defense Directive 3000.09, Autonomy in Weapon Systems. November 2012; updated 2023.
  • UN Panel of Experts on Libya. Letter to the UN Security Council, March 8, 2021. S/2021/229. Documents Kargu-2 deployment.
  • Yuval Abraham and Ori Hagoel. “Lavender: The AI Machine Directing Israel’s Bombing Spree in Gaza.” +972 Magazine / Local Call, April 3, 2024.
  • Peter Beaumont. “The secretive Israeli AI program that can automatically generate a ‘kill list’ of up to 37,000 targets.” The Guardian, April 4, 2024.
  • Human Rights Watch. Questions and Answers: Israeli Military’s Use of Digital Tools in Gaza. September 2024.
  • Human Rights Watch. A Hazard to Human Rights: Autonomous Weapons Systems and Digital Decision-Making. April 2025.
  • Lisa Wiese and Charlotte Langer. “Gaza, Artificial Intelligence, and Kill Lists.” Verfassungsblog, May 2024.
  • Arms of Concern Law Association (AOAV). “The Lavender Precedent: Automated Kill Lists and the Limits of International Humanitarian Law.” November 2025.
  • Lieber Institute, West Point. “The Gospel, Lavender, and the Law of Armed Conflict.” Articles of War, September 2024.
  • Lieber Institute, West Point. “Targeting in the Black Box: The Need to Reprioritize AI Explainability.” Articles of War, September 2024.
  • Lieber Institute, West Point. “The Kargu-2 Autonomous Attack Drone: Legal and Ethical Dimensions.” Articles of War, 2021.
  • Harvard National Security Journal. “Countering the ‘Humans vs. AWS’ Narrative and the Inevitable Accountability Gaps for Mistakes in Targeting.” May 2025.
  • United Nations General Assembly Resolution 78/241, Lethal Autonomous Weapons Systems. December 2023. Vote: 152–4, 11 abstentions.
  • UN Secretary-General António Guterres. Statement on lethal autonomous weapons and the call for legally binding prohibition by 2026.
  • Brookings Institution. “Applying Arms-Control Frameworks to Autonomous Weapons.” 2021.
  • Foreign Policy. “Israel’s Algorithmic Killing of Palestinians Sets Dangerous Precedent.” May 2024.
  • Jonathan Kwik and Tom van Engers. “Algorithmic Fog of War: When Lack of Transparency Violates the Law of Armed Conflict.” Frontiers in Law, 2021.
  • Ashley Deeks. The National Security Double Black Box. Referenced in Lawfare, July 2025.
  • ICRC Casebook. “Libya, The Use of Lethal Autonomous Weapon Systems.”
  • The Conversation. “Israel Accused of Using AI to Target Thousands in Gaza, as Killer Algorithms Outpace International Law.” December 2025.
  • UN OHCHR. “Gaza: UN Experts Deplore Use of Purported AI to Commit ‘Domicide’ in Gaza.” April 2024.
  • Axios. Reporting confirming Claude deployment through Palantir in the January 3, 2026 Venezuela military operation. Two sources. February 2026.
  • Wikipedia. “2026 United States intervention in Venezuela.” Documents Operation Absolute Resolve, January 3, 2026: 150 aircraft, 20 launch sites, capture of Nicolás Maduro, 83 deaths.
  • NBC News. “Trump orders government agencies to stop using Anthropic's Claude.” February 27, 2026. Documents Anthropic blacklisting following refusal to authorize unconstrained military AI use.
  • Anthropic. Public statement on supply chain risk designation and Venezuela operation context. February 27, 2026. States the company was not informed of the January 3rd operational use of Claude.