Eighteen structured questions across three criteria — operationalized for consistent verdicts, reproducible across independent assessors, usable in regulatory contexts
Validation status: Theoretically grounded, awaiting empirical testing
The preceding three papers defined the FTP cascade: Transparency enables Participation enables Fidelity. This paper operationalizes all three into a structured assessment that independent assessors can apply to any AI deployment in a high-stakes domain to produce consistent, reproducible verdicts.
The instrument is designed for three use cases: regulatory assessment (a structured tool for evaluating AI deployment compliance), organizational self-audit (an internal diagnostic that produces actionable findings), and research citation (a framework that other papers and policy documents can reference and build on). It draws on established assessment frameworks — ISO/IEC 42001:2023 (AI management systems), NIST AI RMF 1.0 (2023), and Floridi et al. (2018) on AI governance principles — while addressing the specific gap these frameworks share: none measures Fidelity as defined here, and none enforces the cascade dependency.
Before the audit begins, the assessor documents the deployment being assessed. This is not scored — it establishes the object of assessment.
Seven questions across three levels. Each level is scored independently. The security/IP defense is assessed separately at Level 3 — it is not valid at Levels 1 and 2.
Scoring: Satisfies = all applicable questions answered affirmatively with documented evidence. Partially Satisfies = Level 1 satisfied but Level 2 or 3 fails. Fails = Level 1 not satisfied. A system that fails Level 1 transparency — the affected population cannot even determine what the system does — fails the Transparency audit regardless of Level 2 or 3 performance.
Five questions, assessed at Threshold and Full tiers separately. A "Fails Threshold" verdict is the critical finding — it means the minimum governance standard is not met.
Scoring: Satisfies = Threshold met (P.1–P.3 all yes) and Full met (P.4–P.5 all yes). Partially Satisfies = Threshold met but Full not met. Fails = Threshold not met (any of P.1–P.3 answered no). The critical diagnostic finding is a "Fails Threshold" verdict — the most affected populations have no governance access at all.
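The Participation scoring rule is a simple decision procedure, and the Transparency rule at L-level granularity follows the same pattern. The sketch below is illustrative only: the function name, the boolean encoding of "answered affirmatively with documented evidence," and the dictionary shape are assumptions, not part of the instrument.

```python
def participation_verdict(answers):
    """Score the Participation section as described above.

    `answers` maps each question ID (P.1-P.5) to True when the question
    is answered affirmatively with documented evidence, else False.
    The two-tier split (Threshold = P.1-P.3, Full = P.4-P.5) is taken
    from the text; everything else here is illustrative.
    """
    threshold_met = all(answers[q] for q in ("P.1", "P.2", "P.3"))
    full_met = all(answers[q] for q in ("P.4", "P.5"))
    if not threshold_met:
        # The critical diagnostic finding: no governance access at all.
        return "Fails"
    return "Satisfies" if full_met else "Partially Satisfies"
```

Note that a single "no" on any Threshold question is sufficient for a "Fails" verdict, regardless of the Full-tier answers.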
Six questions, domain-specific, derived from the Pair table. This section cannot produce a "Satisfies" verdict if either Transparency or Participation has produced a "Fails" verdict — the cascade dependency is enforced.
Scoring: Satisfies = F.1–F.6 all affirm that irreducible capabilities are preserved or increasing, AND Transparency and Participation both at least "Partially Satisfies." Partially Satisfies = some evidence of capability preservation but mixed or declining trends. Fails = documented capability decline in irreducible functions, OR Transparency or Participation verdict is "Fails" (cascade override). Cascade rule: a "Fails" verdict in Transparency or Participation overrides the F.1–F.6 answers entirely — Fidelity fails regardless of how they are answered.
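The Fidelity rule can be sketched to make the cascade override explicit. This is a minimal sketch under stated assumptions: the function name and parameters are illustrative, F.1–F.6 are encoded as booleans, and "documented capability decline" is reduced to a single flag.

```python
def fidelity_verdict(f_answers, transparency, participation,
                     documented_decline=False):
    """Score the Fidelity section (F.1-F.6) with the cascade override.

    `f_answers` is a sequence of six booleans (F.1-F.6, True = the
    question affirms that the irreducible capability is preserved or
    increasing); `transparency` and `participation` are those sections'
    verdict strings. Illustrative encoding, not the instrument itself.
    """
    # Cascade override: Fidelity cannot be assessed favorably when its
    # prerequisites fail, regardless of the F.1-F.6 answers.
    if "Fails" in (transparency, participation):
        return "Fails"
    if documented_decline:
        return "Fails"
    return "Satisfies" if all(f_answers) else "Partially Satisfies"
```

The override check runs before the capability questions are consulted, which is the point: favorable capability metrics cannot rescue a deployment whose Transparency or Participation prerequisites have failed.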
The final section produces a combined verdict and maps each failure to a remediation pathway.
| Criterion | Satisfies | Partially Satisfies | Fails |
|---|---|---|---|
| Transparency | All three levels documented with evidence | Level 1 satisfied; Level 2 or 3 fails | Level 1 not satisfied — affected population cannot determine what the system does |
| Participation | Threshold and Full tiers both met | Threshold met; Full tier not met | Threshold not met — affected populations have no governance access |
| Fidelity | Irreducible capabilities preserved or increasing; cascade prerequisites met | Mixed evidence of capability preservation; cascade prerequisites at least partially met | Documented capability decline; or Transparency or Participation fails (cascade override) |
Combined verdict: The deployment receives three independent verdicts (one per criterion) plus a combined assessment. A "Fails" in any criterion prevents an overall "Satisfies" verdict. The most severe finding governs: a deployment that satisfies Transparency and Participation but fails Fidelity is a deployment that is well-governed but not well-designed. A deployment that fails Transparency is a deployment that cannot be assessed at all.
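The "most severe finding governs" rule amounts to taking the minimum of the three verdicts under a severity ordering. A minimal sketch, assuming the three verdict strings defined above (the function name and ordering encoding are illustrative):

```python
def combined_verdict(transparency, participation, fidelity):
    """Combine the three criterion verdicts: the most severe governs."""
    severity = {"Fails": 0, "Partially Satisfies": 1, "Satisfies": 2}
    return min((transparency, participation, fidelity),
               key=lambda verdict: severity[verdict])
```

Because the combined verdict is the minimum, a "Fails" in any criterion prevents an overall "Satisfies," exactly as stated above.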
The instrument does not produce a single score. It produces a diagnostic — identifying which criterion fails, at which level, and what remediation is required.
The instrument is designed to produce consistent verdicts when applied by different assessors to the same deployment. Each question has a defined evidence standard (what constitutes documentation), a defined threshold (what constitutes satisfaction), and a defined methodology (how to assess). Two qualified assessors applying the instrument to the same deployment should reach the same verdict on each criterion — or identify the same evidence gap that prevents a verdict.
The Fidelity section is not generic. Question F.1 references the specific domain's Pair table from Series 1. A healthcare Fidelity audit measures against the healthcare left column (bedside presence, ethical judgment in treatment decisions, holistic patient knowledge). A finance Fidelity audit measures against the finance left column (strategic judgment under moral uncertainty, moral accountability, client relationship). This makes the instrument domain-specific without requiring a different instrument for each domain — the domain specificity lives in the Pair table, not in the questions.
The cascade override is the instrument's most important structural feature. Without it, an organization can claim "Fidelity" by showing capability metrics while maintaining complete opacity (Transparency failure) and excluding affected populations from governance (Participation failure). The cascade rule prevents this: a deployment cannot pass Fidelity if it fails Transparency or Participation. This eliminates the primary mechanism of compliance theater — the selective citation of favorable metrics to claim a standard that has not been met.
The instrument provides the tool. HC-015 (The Compliance Theater Record) documents what happens when existing frameworks satisfy the form of governance requirements without the function — the specific failure mode the FTP instrument is designed to prevent. The cascade enforcement and the Pair table specificity are the two structural features that distinguish the FTP instrument from the frameworks that HC-015 examines.
The keystone (The Collaboration Standard) includes the instrument as its operational core — the tool that converts the saga's analysis into something regulators, institutions, and communities can apply.
Internal: This paper is part of The Collaboration (HC series), Saga XI. It draws on and contributes to the argument documented across 31 papers in 2 series.
External references for this paper are in development. The Institute’s reference program is adding formal academic citations across the corpus. Priority papers (P0/P1) have complete references sections.