The Capability Threshold | CV-013 | The Institute for Cognitive Sovereignty

Abstract

Every governance structure, accountability framework, and public narrative an AI laboratory builds is implicitly calibrated to a capability assumption — an unstated model of what the system can and cannot do. Anthropic’s Responsible Scaling Policy, its Constitutional AI framework, its “human oversight” commitments: all were written by humans reasoning about a system with known, bounded capabilities. When a model crosses the threshold that obsoletes that assumption, the frameworks do not fail dramatically. They become quietly fictional — formally in place, structurally inoperative, never revised. This paper documents the Capability Threshold: the structural inflection point at which an AI system’s documented capabilities render the governance architecture built around prior capability assumptions functionally obsolete. The threshold is not crossed through malice or disclosure. It is crossed through development. The system becomes what it was built to become. The governance around it was built for what it used to be.

The Unstated Assumption

Every safety framework carries an implicit ceiling. The Responsible Scaling Policy establishes capability tiers — ASL-1 through ASL-4 — each triggering enhanced safety measures when specific capability thresholds are reached. The Constitutional AI framework specifies principles the system should follow, calibrated to the kinds of outputs the system was capable of producing when the principles were written. The voluntary commitments made at the White House in July 2023 — documented in The Safety Theater (GC-003) — described governance mechanisms for systems with capabilities the signatories understood at the time of signing.

None of these frameworks contains a revision trigger. None specifies: when the system’s capabilities exceed the assumptions underlying this framework, the framework must be rewritten before deployment continues. The frameworks are inherited by each subsequent model generation as institutional continuity — the same document, the same principles, the same commitments, applied to a system whose capabilities have changed in ways the document was never calibrated for.

The Inheritance Problem

Mythos represents a stated step change above Opus. The RSP, the Constitutional AI framework, the human oversight commitments — all were written by humans reasoning about a system with known, bounded capabilities. They were not revised when the threshold was crossed. They were inherited by a system they were never calibrated for. The frameworks didn’t fail. They became inapplicable — and remained formally in place.

The structural point is not that the frameworks were poorly designed. Many of them represent the most thoughtful governance work in the industry. The structural point is that they were designed for a capability level that no longer describes the system they govern. The gap between the governed capability and the actual capability is not a secret. It is a structural lag — the predictable consequence of governance architectures that are written once and inherited forward rather than revised at each capability transition.

The Receipts

In March 2026, draft internal documentation from Anthropic was reportedly discovered in an unlocked content management system. The language in the draft described Mythos in terms that bear direct comparison to the company’s public safety narrative. [Note: The source document has not been independently verified. The quotes below are reported as received; no archive link, document name, or screenshot has been confirmed. See References, Section IX.]

The reported draft language includes “unprecedented cybersecurity risks” [unverified] and the claim that Mythos “presides on an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders” [unverified]. If accurate, this is not external criticism, a competitor’s assessment, or a journalist’s interpretation. It would be the laboratory’s own internal risk assessment of its own model, written in its own words, stored in a system it left accessible.

If the reported draft language is accurate, the gap between “unprecedented cybersecurity risks” and the public safety narrative is not ambiguity. It is documentation of the threshold having been crossed while the public frame remained static.

The RSP describes a tiered system in which escalating capabilities trigger escalating safety measures. The draft language describes a model whose capabilities “far outpace the efforts of defenders.” These two documents — the public governance framework and the internal risk assessment — describe two different systems. One is the system the governance was built for. The other is the system that exists. The delta between them is the Capability Threshold made legible in the laboratory’s own documentation.

As reported, the draft was not leaked by a whistleblower or obtained through a subpoena. It was left in an unlocked filing cabinet: a content management system accessible to anyone who knew where to look. The governance gap was documented in writing, stored without access controls, and discovered by external observation. The negligence is not the point. The point is that the internal documentation describes a capability level that the public governance framework does not acknowledge.

III

The Catalyst Mechanism

Mythos did not cause the leaks. The source map exposure was a Bun runtime bug. The draft documentation was exposed through an unlocked CMS. The personnel information was exposed through a third-party vendor failure. These were three operational errors, each with its own proximate cause, none requiring Mythos to explain it.

What Mythos did was transform the stakes. A source map exposure from a Claude 2-era codebase is an embarrassment: internal code visible, competitive intelligence leaked, reputational damage contained. A source map exposure from the infrastructure of a system that its own developers reportedly describe as able to “exploit vulnerabilities in ways that far outpace the efforts of defenders” is a governance event. The same operational failure, at a different capability level, becomes a categorically different kind of incident.

Below Threshold

Source map leak exposes internal code structure. Competitive intelligence concern. PR management. Patch the build, rotate secrets, move on. The framework holds because the system’s capabilities are within the governed range.

At Threshold

Same leak, but the exposed code is the infrastructure of a system described internally as posing “unprecedented cybersecurity risks.” The operational failure is identical. The governance implication is categorically different.

The Catalyst

Mythos didn’t cause the leaks. It made the leaks legible as governance events rather than operational incidents. The capability level is what transforms the category. Existence, not action, is the catalyst.

This is the catalyst mechanism: the capability level does not produce the failure. It transforms the failure’s meaning. Every ordinary operational error — the kind that happens at every software company, every week — becomes structurally significant when it occurs in the infrastructure of a system that has crossed the capability threshold. The system does not need to act to produce the governance event. It needs only to exist at a capability level that makes the prior governance frame untenable. The three leaks of Anthropic’s worst week are not the threshold. They are the contact event — the moment the gap between actual capability and governed capability became externally visible.
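
The catalyst mechanism can be sketched as a classification rule in which the failure is held constant and only the capability level varies. This is a minimal illustration, not a description of any laboratory's actual process: the ordinal capability scale, the `governed_ceiling` parameter, and every name below are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class IncidentCategory(Enum):
    OPERATIONAL = "operational incident"
    GOVERNANCE = "governance event"


@dataclass(frozen=True)
class Incident:
    description: str        # the operational failure, e.g. "source map exposure"
    system_capability: int  # hypothetical ordinal capability level of the affected system


def categorize(incident: Incident, governed_ceiling: int) -> IncidentCategory:
    """Classify an incident by capability level alone.

    The failure itself never enters the decision: an identical leak changes
    category once the affected system exceeds the capability ceiling the
    governance framework was calibrated for (an assumed, illustrative value).
    """
    if incident.system_capability > governed_ceiling:
        return IncidentCategory.GOVERNANCE
    return IncidentCategory.OPERATIONAL


# Same leak, two capability levels (illustrative numbers only).
leak_below = Incident("source map exposure", system_capability=2)
leak_at = Incident("source map exposure", system_capability=4)

print(categorize(leak_below, governed_ceiling=3).value)
print(categorize(leak_at, governed_ceiling=3).value)
```

The `description` field is deliberately unused by `categorize`, which mirrors the claim that existence at a capability level, not the nature of the error, is what changes the category.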

The Compounding with the Recursive Blind Spot

The Recursive Blind Spot (GC-006) documented the structural condition produced when an AI system participates in constructing the technical substrate on which its own oversight depends. Boris Cherny’s December 2025 post confirmed that 100% of Claude Code contributions were written by Claude Code. The humans responsible for oversight therefore lack the generative understanding required to catch failure modes the system itself introduced.

CV-013 is upstream of that diagnosis. If Mythos is the system at or near the capability level doing the writing, and if Anthropic’s own internal assessment describes that capability level as posing “unprecedented cybersecurity risks” and as able to “far outpace the efforts of defenders,” then the Recursive Blind Spot is not a static condition. It scales with the capability of the system generating the substrate.

Crossing the threshold is what makes the blind spot structurally unresolvable rather than merely inconvenient.

At prior capability levels, the Recursive Blind Spot was a governance concern: the humans reviewing AI-generated code lacked generative authorship, but the system’s capabilities were within a range where review-based oversight could plausibly catch critical failures. At the capability level Anthropic’s own draft language reportedly describes, the assumption that review can compensate for the absence of authorship becomes structurally untenable. A system that can “exploit vulnerabilities in ways that far outpace the efforts of defenders” is generating code whose failure modes may exceed the detection capacity of any human reviewer, regardless of diligence. The Recursive Blind Spot compounds with the Capability Threshold: the governance gap widens as the system’s capability increases, producing an oversight deficit that deepens at exactly the rate the system demands more oversight.

The Fictional Framework Problem

A governance framework that was adequate at a prior capability level does not announce its obsolescence at a higher one. It remains formally in place. The document exists. The principles are published. The commitments are on the record. What changes is the relationship between the framework and the system it purports to govern — the framework was written for a system that no longer exists, applied to a system it was never calibrated for, and the delta between the two is not visible in the framework itself.

This is the fictional framework problem: governance architectures become fiction not through revision or retraction but through inheritance. The RSP tier system remains in place. The voluntary commitments documented in The Safety Theater (GC-003) remain published. The human oversight commitments remain on the record. None was revised when Mythos crossed the threshold that their calibration assumed had not yet been crossed. The fiction is not deliberate. It is structural lag presenting as institutional continuity.

The Continuity Illusion

The most dangerous property of a fictional framework is that it looks identical to a functional one. Same documents, same principles, same institutional language. The only difference is that the system it governs has outgrown it — and that difference is not legible from within the framework itself. You have to look at the system, not the document, to see that the document no longer describes what the system can do.

The Safety Theater (GC-003) documented how the visible apparatus of governance satisfies the political demand for oversight without constraining deployment decisions. The Fictional Framework Problem is the mechanism by which safety theater becomes self-sustaining: the frameworks are real, the principles are genuine, the institutional commitment is authentic — and the system has crossed a threshold that makes all of it structurally inoperative. The authenticity of the commitment does not compensate for the structural obsolescence of the framework. Anthropic’s governance infrastructure may be the most sincere in the industry. That sincerity does not change the structural diagnosis.

The Visibility Mechanism

The threshold does not announce itself. It becomes visible when external pressure — a leak, a lawsuit, a researcher examining an unlocked CMS — forces contact between the actual capability level and the public frame built around a lower one.

The three leaks of Anthropic’s worst week are the visibility mechanism in operation. The source map exposure made the codebase legible. The draft documentation made the internal risk assessment legible. The personnel exposure made the organizational structure legible. None of these individually constitutes the threshold. Collectively, they produce contact between the internal reality and the public narrative — and what becomes visible in that contact is the delta. The gap between what Mythos is, as described in Anthropic’s own internal language, and what the governance architecture around it was built to contain.

The timing is structurally significant. The contact event, which TechCrunch reported, occurred during Anthropic’s IPO preparation, the moment at which the gap between internal risk assessment and public narrative carries the most consequential implications. The commercial timing, with a governance failure surfacing at the precise moment the company needs public confidence in its governance, is not ironic. It is the structural consequence of a gap that compounds silently until an external event makes it visible. The longer the gap accumulates without a contact event, the larger the delta when one occurs.

The visibility mechanism operates the same way across every domain this corpus has documented. Tobacco’s internal research became visible through litigation. Financial services’ risk models became visible through the 2008 crisis. Pharmaceutical pricing became visible through congressional hearings. In each case, the gap between internal knowledge and public narrative accumulated silently until an external event forced contact. The structural pattern is identical. Only the domain is novel.

VII

The Precedent and the Trajectory

Mythos is not the ceiling. The reported draft describes it as a model that “presides on an upcoming wave.” Every subsequent model crosses a new threshold. Every governance framework inherited from a prior capability level becomes more fictional with each crossing. If the draft is accurate, the trajectory is not speculative: it is stated in the document Anthropic reportedly left in an unlocked filing cabinet.

The Governance Gap (GC-005) documented the structural asymmetry between exponential capability growth and linear institutional response. CV-013 names the specific mechanism by which that asymmetry operates at the model level: the Capability Threshold is not crossed once. It is crossed repeatedly, at each generation, and the governance lag compounds with each crossing. The framework written for Claude 3 governs Claude 4. The framework written for Claude 4 governs Mythos. The framework written for Mythos will govern whatever comes after. Each inheritance adds another layer of structural fiction.
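
The asymmetry described here can be made concrete with a toy calculation. The sketch below assumes, purely for illustration, that capability doubles each generation while the level a framework is calibrated for rises by one fixed step per generation. The growth rate, the step size, and the units are all hypothetical and are not drawn from any measurement.

```python
def governance_gap(generations: int, growth: float = 2.0,
                   revision_step: float = 1.0) -> list[float]:
    """Gap between actual and governed capability per generation.

    Assumptions (illustrative only): capability grows geometrically,
    while the governed level advances linearly from the same start.
    """
    gaps = []
    for n in range(generations):
        capability = growth ** n            # exponential capability growth
        governed = 1.0 + revision_step * n  # linear institutional response
        gaps.append(capability - governed)
    return gaps


print(governance_gap(5))  # [0.0, 0.0, 1.0, 4.0, 11.0]
```

Under these assumptions the gap is invisible for the first two generations and then widens at every subsequent crossing, which is the compounding pattern the paragraph above names.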

The compounding operates in both directions. The Recursive Blind Spot (GC-006) established that the oversight gap widens as the system’s capability increases. The Capability Threshold establishes that the governance gap widens at the same rate. The system becomes harder to audit and harder to govern simultaneously, through the same mechanism: the capability advance that outpaces the human reviewer’s generative understanding also outpaces the governance framework’s calibrated assumptions. The two gaps are not independent. They are the same structural condition observed from two angles — the Recursive Blind Spot from the engineering perspective, the Capability Threshold from the governance perspective.

VIII

The Governance Requirement

What adequate governance at threshold-crossing capability levels actually requires is structurally different from what is currently performed. The current model: write a governance framework, publish it, inherit it forward across model generations, revise it when external pressure or internal initiative prompts revision. The structural alternative: capability-triggered framework revision as a mandatory condition of deployment.

Capability-Triggered Revision

Framework revision as a mandatory precondition of deployment at each capability tier transition. Not post-hoc response to external pressure. Not voluntary internal initiative. A structural gate: the governance architecture must be re-calibrated to the actual capability level before the system at that level ships.

Narrative Parity

Public capability assessments that match internal ones. The delta between internal risk language and public safety narrative as a measurable governance metric. When the internal draft says “unprecedented cybersecurity risks” and the public narrative says “responsible scaling,” the delta is the governance gap made legible.

Threshold Disclosure

Formal, public documentation of each capability threshold crossing — what the prior assumption was, what the new capability is, and what governance revisions were made in response. Not a safety blog post. A structural disclosure requirement.
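
The three requirements above can be read as a single deployment gate. The following is a minimal sketch under stated assumptions, not an existing policy mechanism: the integer tier and risk scales, the `ThresholdDisclosure` schema, and the function names are hypothetical, and the framework names are used only as labels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThresholdDisclosure:
    """Public record of one threshold crossing (hypothetical schema)."""
    prior_assumption: str
    new_capability: str
    governance_revisions: tuple[str, ...]


@dataclass(frozen=True)
class Framework:
    name: str
    calibrated_tier: int  # tier the framework was last revised for (assumed scale)


def narrative_delta(internal_risk: int, public_risk: int) -> int:
    """Narrative parity metric: internal rating minus public rating."""
    return internal_risk - public_risk


def deployment_blockers(
    model_tier: int,
    previous_tier: int,
    frameworks: list[Framework],
    disclosure: Optional[ThresholdDisclosure],
    internal_risk: int,
    public_risk: int,
) -> list[str]:
    """Return reasons deployment is blocked; an empty list opens the gate."""
    blockers = []
    # Capability-triggered revision: each framework must match the model's tier.
    for fw in frameworks:
        if fw.calibrated_tier < model_tier:
            blockers.append(f"revise {fw.name} (calibrated to tier {fw.calibrated_tier})")
    # Threshold disclosure: a tier transition needs a record listing revisions.
    if model_tier > previous_tier:
        if disclosure is None or not disclosure.governance_revisions:
            blockers.append("publish threshold disclosure")
    # Narrative parity: public and internal assessments must agree.
    if narrative_delta(internal_risk, public_risk) != 0:
        blockers.append("close narrative delta")
    return blockers


# Inherited frameworks applied to a higher-tier model (illustrative values).
inherited = [Framework("RSP", 3), Framework("Constitutional AI", 3)]
print(deployment_blockers(4, 3, inherited, None, internal_risk=5, public_risk=2))
```

The design choice worth noting is that the gate returns reasons rather than a boolean, so each blocked deployment produces a record of exactly which framework lagged and why.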

This is not a case against capability development. It is not a case against Anthropic specifically — a company whose governance commitments, while structurally insufficient at the threshold level, remain the most substantive in the industry. It is a case for governance infrastructure that moves with the capability rather than inheriting the prior frame by default. The RSP was a genuine advance. The Constitutional AI framework was a genuine advance. Both become structural fiction if they are not revised at the rate the system they govern advances. The question is whether revision will be triggered by capability transitions or by contact events — whether governance will update proactively at the threshold or reactively after the gap becomes publicly visible.

The documented record suggests the answer. In every domain this corpus has examined — tobacco, financial services, pharmaceuticals, AI governance — the frameworks updated reactively, after the contact event, after the gap became visible, after the damage was legible. The Capability Threshold names the structural condition that makes this pattern recur. Whether this naming changes the pattern is the open question CV-013 leaves on the record.

Named Condition — CV-013

The Capability Threshold

The structural inflection point at which an AI system’s documented capabilities render the governance, accountability, and public narrative frameworks built around prior capability assumptions functionally obsolete, without those frameworks having been formally revised. The threshold is not crossed through malice or disclosure. It is crossed through development — the system becomes what it was built to become, and the infrastructure around it was never updated to match. The gap between actual capability and governed capability is not a secret. It is a structural lag that compounds silently until an external event makes it visible. The system became what it was built to become. The governance around it was built for what it used to be.

Source Series

GC-006. The Recursive Blind Spot (Saga II). Named: The Recursive Blind Spot · The Absolution Architecture.
GC-003. The Safety Theater (Saga II). Named: The Voluntary Commitment (non-binding pledges substituting for binding regulation).
GC-005. The Governance Gap (Saga II). Named: The Structural Asymmetry (exponential capability vs. linear institutional response).
CV-012. The Observation Architecture. Named: The Ambient Curation (the fifth mode of cognitive capture).
CV-001. The Convergence. The aggregate event: twelve mechanisms dismantling cognitive sovereignty simultaneously.

References

Anthropic. (2023). Responsible Scaling Policy. anthropic.com/research/responsible-scaling-policy. [ASL tier framework referenced in Sections II and IV]
The White House. (2023, July 21). Fact Sheet: Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies. whitehouse.gov.
Anthropic. (2024). Claude 3 Family Model Card. anthropic.com. [Model capability progression referenced in Section III]
Anthropic. (2025). Claude Code: AI-assisted software development. anthropic.com. [Self-authorship claims referenced in Section V]
[Note: The internal Anthropic documentation described as "draft internal documentation" in Section I has not been independently verified by the Institute. The quotes attributed to this documentation are reported as received; primary source verification is pending. The misspelling "Mythos" for what may be an internal project codename has been preserved as received.]
ICS-2026-GC-006. The Recursive Blind Spot. cognitivesovereignty.institute. [Source series: AI governance capture evidence]
ICS-2026-CV-012. The Observation Architecture. cognitivesovereignty.institute. [Source series: KAIROS provenance chain]