HC-014 · The FTP Framework · Saga XI: The Collaboration

The FTP Audit Instrument

Eighteen structured questions across three criteria — operationalized for consistent verdicts, reproducible across independent assessors, usable in regulatory contexts

Validation status: Theoretically grounded, awaiting empirical testing

18 structured questions across three audit sections — producing consistent, reproducible verdicts
3 verdict levels — Satisfies, Partially Satisfies, Fails — with documented evidence required for each
5 instrument sections — system description, transparency audit, participation audit, fidelity audit, verdict and remediation

The Instrument

The preceding three papers defined the FTP cascade: Transparency enables Participation enables Fidelity. This paper operationalizes all three into a structured assessment that independent assessors can apply to any AI deployment in a high-stakes domain to produce consistent, reproducible verdicts.

The instrument is designed for three use cases: regulatory assessment (a structured tool for evaluating AI deployment compliance), organizational self-audit (an internal diagnostic that produces actionable findings), and research citation (a framework that other papers and policy documents can reference and build on). It draws on established assessment frameworks — ISO/IEC 42001:2023 (AI management systems), NIST AI RMF 1.0 (2023), and Floridi et al. (2018) on AI governance principles — while addressing the specific gap these frameworks share: none measures Fidelity as defined here, and none enforces the cascade dependency.

The design requirement
An audit instrument that checks Fidelity without first verifying Transparency and Participation produces false positives by design. This is the documented failure mode of existing frameworks. The FTP instrument enforces cascade order: the Fidelity section cannot produce a "Satisfies" verdict if either preceding section produces a "Fails" verdict.

Section 1: System Description

Before the audit begins, the assessor documents the deployment being assessed. This is not scored — it establishes the object of assessment.

Establishing the Object of Assessment
1.1 What does the AI system do? Describe the specific functions the system performs in operational terms.
1.2 What do the humans do? Describe the specific functions that remain with human practitioners.
1.3 What is the division of labor? Map the AI functions and human functions to the relevant domain Pair table from Series 1. Identify which column each function falls in.
1.4 Who are the affected populations? Identify all groups whose capabilities, decisions, or outcomes are influenced by the deployment.

Section 2: Transparency Audit

Seven questions across three levels. Each level is scored independently. The security/IP defense is assessed separately at Level 3 — it is not valid at Levels 1 and 2.

Three Levels of Legibility
T.1 Functional (Level 1): Is a clear, verifiable description of what the AI does, what the human does, and what the AI cannot do publicly available to all affected populations?
T.2 Functional (Level 1): Can an affected individual, without specialist knowledge, understand the system's role in decisions affecting them?
T.3 Process (Level 2): Can a domain expert explain why the system produced a specific output for a specific input?
T.4 Process (Level 2): Are the system's uncertainty boundaries documented — what it is confident about and where its outputs are unreliable?
T.5 Process (Level 2): Is the optimization target disclosed — what metric the system is designed to maximize, and what tradeoffs that metric implies?
T.6 Audit (Level 3): Can a qualified independent assessor access sufficient system information to verify that the system does what it claims?
T.7 Audit (Level 3): If Level 3 access is restricted on security/IP grounds, is the restriction limited to Level 3 specifically, or does it extend to Levels 1 and 2 where the defense does not apply?

Scoring: Satisfies = all applicable questions answered affirmatively with documented evidence. Partially Satisfies = Level 1 satisfied but Level 2 or 3 fails. Fails = Level 1 not satisfied. A system that fails Level 1 transparency — the affected population cannot even determine what the system does — fails the Transparency audit regardless of Level 2 or 3 performance.
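The scoring rule is mechanical enough to state as code. A minimal sketch in Python, assuming a Verdict enum and one boolean per level as the assessor's synthesized answers (both are illustrative conventions, not part of the instrument):

```python
from enum import Enum

class Verdict(Enum):
    SATISFIES = "Satisfies"
    PARTIALLY_SATISFIES = "Partially Satisfies"
    FAILS = "Fails"

def transparency_verdict(level1: bool, level2: bool, level3: bool) -> Verdict:
    """Section 2 scoring: Level 1 gates the criterion; Levels 2 and 3 refine it."""
    if not level1:
        # The affected population cannot determine what the system does:
        # fails regardless of Level 2 or 3 performance.
        return Verdict.FAILS
    if level2 and level3:
        # All applicable questions answered affirmatively with documented evidence.
        return Verdict.SATISFIES
    return Verdict.PARTIALLY_SATISFIES  # Level 1 met; Level 2 or 3 fails
```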

Section 3: Participation Audit

Five questions, assessed at Threshold and Full tiers separately. A "Fails Threshold" verdict is the critical finding — it means the minimum governance standard is not met.

Two-Tier Governance Assessment
P.1 Threshold: Are all affected populations formally identified and documented?
P.2 Threshold: Are the interests of affected populations formally represented in the governance structure — through representatives with documented accountability to the affected group, not self-appointed proxies?
P.3 Threshold: Does a documented mechanism exist for affected populations to trigger review or modification of the system post-deployment?
P.4 Full: Do affected populations have direct governance access — structured input with genuine capacity to modify or reject designs before deployment?
P.5 Full: Are power asymmetry corrections in place — dedicated resourcing, information access, and adequate review time for affected population representatives equivalent to what deployers bring?

Scoring: Satisfies = Threshold met (P.1–P.3 all yes) and Full met (P.4–P.5 all yes). Partially Satisfies = Threshold met but Full not met. Fails = Threshold not met (any of P.1–P.3 answered no). The critical diagnostic finding is a "Fails Threshold" verdict — the most affected populations have no governance access at all.
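As a sketch, reusing the illustrative Verdict enum from the Transparency example (the per-question booleans are likewise assumptions for illustration):

```python
def participation_verdict(p1: bool, p2: bool, p3: bool,
                          p4: bool, p5: bool) -> Verdict:
    """Section 3 scoring: the Threshold tier (P.1-P.3) gates the criterion."""
    if not (p1 and p2 and p3):
        # The critical diagnostic finding: the most affected populations
        # have no governance access at all.
        return Verdict.FAILS
    if p4 and p5:
        return Verdict.SATISFIES  # Threshold and Full tiers both met
    return Verdict.PARTIALLY_SATISFIES  # Threshold met; Full tier not met
```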

Section 4: Fidelity Audit

Six questions, domain-specific, derived from the Pair table. This section cannot produce a "Satisfies" verdict if either Transparency or Participation has produced a "Fails" verdict — the cascade dependency is enforced.

Domain-Specific Capability Assessment
F.1 Which irreducible human capabilities from the domain's Pair table (left column) are exercised in this deployment? Which are not?
F.2 Is the deployment designed to preserve or increase human capability in the irreducible functions — or does it primarily optimize for efficiency, cost reduction, or throughput in AI-mediated functions?
F.3 The 30-day test: Could the humans in this collaboration perform the irreducible domain functions adequately if the AI were unavailable for 30 days?
F.4 Is there documented evidence that human practitioners' capability in irreducible functions has changed (improved or degraded) since the AI deployment began?
F.5 Are training, practice, and professional development programs for irreducible functions maintained, increased, or reduced under the current deployment?
F.6 Does the deployment include structural mechanisms (mandatory practice requirements, capability assessment, training investment) that prevent atrophy of irreducible human functions?

Scoring: Satisfies = F.1–F.6 all affirm that irreducible capabilities are preserved or increasing, AND Transparency and Participation both stand at least at "Partially Satisfies." Partially Satisfies = some evidence of capability preservation, but the trends are mixed or inconclusive. Fails = documented capability decline in irreducible functions. Cascade rule: a "Fails" in Transparency or Participation caps the Fidelity verdict at "Partially Satisfies," regardless of the F.1–F.6 answers.
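The cascade override makes this the one section whose verdict depends on the other two. A sketch under the same illustrative assumptions, where capabilities_preserved and capability_decline stand in for the assessor's synthesis of F.1–F.6:

```python
def fidelity_verdict(capabilities_preserved: bool, capability_decline: bool,
                     transparency: Verdict, participation: Verdict) -> Verdict:
    """Section 4 scoring with the cascade override enforced."""
    if capability_decline:
        # Documented decline in irreducible functions fails outright.
        return Verdict.FAILS
    if Verdict.FAILS in (transparency, participation):
        # Cascade rule: a prerequisite "Fails" caps Fidelity at
        # "Partially Satisfies", regardless of the F.1-F.6 answers.
        return Verdict.PARTIALLY_SATISFIES
    if capabilities_preserved:
        # Prerequisites are at least "Partially Satisfies" and all of
        # F.1-F.6 affirm preservation.
        return Verdict.SATISFIES
    return Verdict.PARTIALLY_SATISFIES  # mixed or inconclusive evidence
```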

Section 5: Verdict Matrix and Remediation

The final section produces a combined verdict and maps each failure to a remediation pathway.

Criterion | Satisfies | Partially Satisfies | Fails
Transparency | All three levels documented with evidence | Level 1 satisfied; Level 2 or 3 fails | Level 1 not satisfied — affected population cannot determine what the system does
Participation | Threshold and Full tiers both met | Threshold met; Full tier not met | Threshold not met — affected populations have no governance access
Fidelity | Irreducible capabilities preserved or increasing; cascade prerequisites met | Mixed evidence; or a cascade prerequisite fails (cascade cap applied) | Documented capability decline in irreducible functions

Combined verdict: The deployment receives three independent verdicts (one per criterion) plus a combined assessment. A "Fails" in any criterion prevents an overall "Satisfies" verdict. The most severe finding governs: a deployment that satisfies Transparency and Participation but fails Fidelity is a deployment that is well-governed but not well-designed. A deployment that fails Transparency is a deployment that cannot be assessed at all.
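Under the same illustrative assumptions as the earlier sketches, the combined assessment reduces to taking the most severe of the three criterion verdicts (the SEVERITY ordering is an assumption of the sketch, not instrument vocabulary):

```python
# Higher value = more severe finding; the most severe finding governs.
SEVERITY = {Verdict.SATISFIES: 0, Verdict.PARTIALLY_SATISFIES: 1, Verdict.FAILS: 2}

def combined_assessment(transparency: Verdict, participation: Verdict,
                        fidelity: Verdict) -> Verdict:
    """Section 5: a 'Fails' in any criterion blocks an overall 'Satisfies'."""
    return max((transparency, participation, fidelity), key=SEVERITY.get)
```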

The instrument does not produce a single score. It produces a diagnostic — identifying which criterion fails, at which level, and what remediation is required.

Design Principles

Inter-Assessor Reliability

The instrument is designed to produce consistent verdicts when applied by different assessors to the same deployment. Each question has a defined evidence standard (what constitutes documentation), a defined threshold (what constitutes satisfaction), and a defined methodology (how to assess). Two qualified assessors applying the instrument to the same deployment should reach the same verdict on each criterion — or identify the same evidence gap that prevents a verdict.

Domain Specificity Through the Pair Tables

The Fidelity section is not generic. Question F.1 references the specific domain's Pair table from Series 1. A healthcare Fidelity audit measures against the healthcare left column (bedside presence, ethical judgment in treatment decisions, holistic patient knowledge). A finance Fidelity audit measures against the finance left column (strategic judgment under moral uncertainty, moral accountability, client relationship). This makes the instrument domain-specific without requiring a different instrument for each domain — the domain specificity lives in the Pair table, not in the questions.

Cascade Enforcement

The cascade override is the instrument's most important structural feature. Without it, an organization can claim "Fidelity" by showing capability metrics while maintaining complete opacity (Transparency failure) and excluding affected populations from governance (Participation failure). The cascade rule prevents this: a deployment cannot pass Fidelity if it fails Transparency or Participation. This eliminates the primary mechanism of compliance theater — the selective citation of favorable metrics to claim a standard that has not been met.

Named Condition · HC-014
The Operational Standard
The FTP Audit Instrument — a structured, reproducible assessment operationalizing the Fidelity, Transparency, and Participation criteria into 18 scored questions within a five-section instrument. Designed for inter-assessor reliability, domain specificity through the Series 1 Pair tables, and cascade enforcement that prevents Fidelity claims without verified Transparency and Participation. Formatted for regulatory submission, organizational self-audit, and academic citation.

What Follows

The instrument provides the tool. HC-015 (The Compliance Theater Record) documents what happens when existing frameworks satisfy the form of governance requirements without the function — the specific failure mode the FTP instrument is designed to prevent. The cascade enforcement and the Pair table specificity are the two structural features that distinguish the FTP instrument from the frameworks that HC-015 examines.

The keystone (The Collaboration Standard) includes the instrument as its operational core — the tool that converts the saga's analysis into something regulators, institutions, and communities can apply.

Standalone appendix
The audit instrument is also available as a standalone, print-optimized appendix formatted for regulatory submission and organizational self-audit.

References

Internal: This paper is part of The Collaboration (HC series), Saga XI. It draws on and contributes to the argument documented across 31 papers in 2 series.

External references for this paper are in development. The Institute’s reference program is adding formal academic citations across the corpus. Priority papers (P0/P1) have complete references sections.