HC-014 · The FTP Framework · Saga XI: The Collaboration

The FTP Audit Instrument

Eighteen structured questions across three criteria — operationalized for consistent verdicts, reproducible across independent assessors, usable in regulatory contexts

Validation status: Theoretically grounded, awaiting empirical testing

18 structured questions across three audit sections — producing consistent, reproducible verdicts
3 verdict levels — Satisfies, Partially Satisfies, Fails — with documented evidence required for each
5 instrument sections — system description, transparency audit, participation audit, fidelity audit, verdict and remediation

The Instrument

The preceding three papers defined the FTP cascade: Transparency enables Participation enables Fidelity. This paper operationalizes all three into a structured assessment that independent assessors can apply to any AI deployment in a high-stakes domain to produce consistent, reproducible verdicts.

The instrument is designed for three use cases: regulatory assessment (a structured tool for evaluating AI deployment compliance), organizational self-audit (an internal diagnostic that produces actionable findings), and research citation (a framework that other papers and policy documents can reference and build on). It draws on established assessment frameworks — ISO/IEC 42001:2023 (AI management systems), NIST AI RMF 1.0 (2023), and Floridi et al. (2018) on AI governance principles — while addressing the specific gap these frameworks share: none measures Fidelity as defined here, and none enforces the cascade dependency.

The design requirement
An audit instrument that checks Fidelity without first verifying Transparency and Participation produces false positives by design. This is the documented failure mode of existing frameworks. The FTP instrument enforces cascade order: the Fidelity section cannot produce a "Satisfies" verdict if either preceding section produces a "Fails" verdict.

Section 1: System Description

Before the audit begins, the assessor documents the deployment being assessed. This is not scored — it establishes the object of assessment.

Establishing the Object of Assessment
1.1 What does the AI system do? Describe the specific functions the system performs in operational terms.
1.2 What do the humans do? Describe the specific functions that remain with human practitioners.
1.3 What is the division of labor? Map the AI functions and human functions to the relevant domain Pair table from Series 1. Identify which column each function falls in.
1.4 Who are the affected populations? Identify all groups whose capabilities, decisions, or outcomes are influenced by the deployment.

Section 2: Transparency Audit

Seven questions across three levels. Each level is scored independently. The security/IP defense is assessed separately at Level 3 — it is not valid at Levels 1 and 2.

Three Levels of Legibility
T.1 Functional (Level 1): Is a clear, verifiable description of what the AI does, what the human does, and what the AI cannot do publicly available to all affected populations?
T.2 Functional (Level 1): Can an affected individual, without specialist knowledge, understand the system's role in decisions affecting them?
T.3 Process (Level 2): Can a domain expert explain why the system produced a specific output for a specific input?
T.4 Process (Level 2): Are the system's uncertainty boundaries documented — what it is confident about and where its outputs are unreliable?
T.5 Process (Level 2): Is the optimization target disclosed — what metric the system is designed to maximize, and what tradeoffs that metric implies?
T.6 Audit (Level 3): Can a qualified independent assessor access sufficient system information to verify that the system does what it claims?
T.7 Audit (Level 3): If Level 3 access is restricted on security/IP grounds, is the restriction limited to Level 3 specifically, or does it extend to Levels 1 and 2 where the defense does not apply?

Scoring: Satisfies = all applicable questions answered affirmatively with documented evidence. Partially Satisfies = Level 1 satisfied but Level 2 or 3 fails. Fails = Level 1 not satisfied. A system that fails Level 1 transparency — the affected population cannot even determine what the system does — fails the Transparency audit regardless of Level 2 or 3 performance.
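The scoring rule is mechanical enough to state as code. A minimal sketch in Python, assuming a Verdict enum and one boolean per level as the assessor's synthesized answers (both are illustrative conventions, not part of the instrument):

```python
from enum import Enum

class Verdict(Enum):
    SATISFIES = "Satisfies"
    PARTIALLY_SATISFIES = "Partially Satisfies"
    FAILS = "Fails"

def transparency_verdict(level1: bool, level2: bool, level3: bool) -> Verdict:
    """Section 2 scoring: Level 1 gates the criterion; Levels 2 and 3 refine it."""
    if not level1:
        # The affected population cannot determine what the system does:
        # fails regardless of Level 2 or 3 performance.
        return Verdict.FAILS
    if level2 and level3:
        # All applicable questions answered affirmatively with documented evidence.
        return Verdict.SATISFIES
    return Verdict.PARTIALLY_SATISFIES  # Level 1 met; Level 2 or 3 fails
```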

Section 3: Participation Audit

Five questions, assessed at Threshold and Full tiers separately. A "Fails Threshold" verdict is the critical finding — it means the minimum governance standard is not met.

Two-Tier Governance Assessment
P.1 Threshold: Are all affected populations formally identified and documented?
P.2 Threshold: Are the interests of affected populations formally represented in the governance structure — through representatives with documented accountability to the affected group, not self-appointed proxies?
P.3 Threshold: Does a documented mechanism exist for affected populations to trigger review or modification of the system post-deployment?
P.4 Full: Do affected populations have direct governance access — structured input with genuine capacity to modify or reject designs before deployment?
P.5 Full: Are power asymmetry corrections in place — dedicated resourcing, information access, and adequate review time for affected population representatives equivalent to what deployers bring?

Scoring: Satisfies = Threshold met (P.1–P.3 all yes) and Full met (P.4–P.5 all yes). Partially Satisfies = Threshold met but Full not met. Fails = Threshold not met (any of P.1–P.3 answered no). The critical diagnostic finding is a "Fails Threshold" verdict — the most affected populations have no governance access at all.
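As a sketch, reusing the illustrative Verdict enum from the Transparency example (the per-question booleans are likewise assumptions for illustration):

```python
def participation_verdict(p1: bool, p2: bool, p3: bool,
                          p4: bool, p5: bool) -> Verdict:
    """Section 3 scoring: the Threshold tier (P.1-P.3) gates the criterion."""
    if not (p1 and p2 and p3):
        # The critical diagnostic finding: the most affected populations
        # have no governance access at all.
        return Verdict.FAILS
    if p4 and p5:
        return Verdict.SATISFIES  # Threshold and Full tiers both met
    return Verdict.PARTIALLY_SATISFIES  # Threshold met; Full tier not met
```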

Section 4: Fidelity Audit

Six questions, domain-specific, derived from the Pair table. This section cannot produce a "Satisfies" verdict if either Transparency or Participation has produced a "Fails" verdict — the cascade dependency is enforced.

Domain-Specific Capability Assessment
F.1 Which irreducible human capabilities from the domain's Pair table (left column) are exercised in this deployment? Which are not?
F.2 Is the deployment designed to preserve or increase human capability in the irreducible functions — or does it primarily optimize for efficiency, cost reduction, or throughput in AI-mediated functions?
F.3 The 30-day test: Could the humans in this collaboration perform the irreducible domain functions adequately if the AI were unavailable for 30 days?
F.4 Is there documented evidence that human practitioners' capability in irreducible functions has changed (improved or degraded) since the AI deployment began?
F.5 Are training, practice, and professional development programs for irreducible functions maintained, increased, or reduced under the current deployment?
F.6 Does the deployment include structural mechanisms (mandatory practice requirements, capability assessment, training investment) that prevent atrophy of irreducible human functions?

Scoring: Satisfies = F.1–F.6 all affirm that irreducible capabilities are preserved or increasing, AND Transparency and Participation both stand at least at "Partially Satisfies." Partially Satisfies = some evidence of capability preservation, but the trends are mixed or inconclusive. Fails = documented capability decline in irreducible functions. Cascade rule: a "Fails" in Transparency or Participation caps the Fidelity verdict at "Partially Satisfies," regardless of the F.1–F.6 answers.
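The cascade override makes this the one section whose verdict depends on the other two. A sketch under the same illustrative assumptions, where capabilities_preserved and capability_decline stand in for the assessor's synthesis of F.1–F.6:

```python
def fidelity_verdict(capabilities_preserved: bool, capability_decline: bool,
                     transparency: Verdict, participation: Verdict) -> Verdict:
    """Section 4 scoring with the cascade override enforced."""
    if capability_decline:
        # Documented decline in irreducible functions fails outright.
        return Verdict.FAILS
    if Verdict.FAILS in (transparency, participation):
        # Cascade rule: a prerequisite "Fails" caps Fidelity at
        # "Partially Satisfies", regardless of the F.1-F.6 answers.
        return Verdict.PARTIALLY_SATISFIES
    if capabilities_preserved:
        # Prerequisites are at least "Partially Satisfies" and all of
        # F.1-F.6 affirm preservation.
        return Verdict.SATISFIES
    return Verdict.PARTIALLY_SATISFIES  # mixed or inconclusive evidence
```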

Section 5: Verdict Matrix and Remediation

The final section produces a combined verdict and maps each failure to a remediation pathway.

Criterion | Satisfies | Partially Satisfies | Fails
Transparency | All three levels documented with evidence | Level 1 satisfied; Level 2 or 3 fails | Level 1 not satisfied — affected population cannot determine what the system does
Participation | Threshold and Full tiers both met | Threshold met; Full tier not met | Threshold not met — affected populations have no governance access
Fidelity | Irreducible capabilities preserved or increasing; cascade prerequisites met | Mixed evidence; or a cascade prerequisite fails (cascade cap applied) | Documented capability decline in irreducible functions

Combined verdict: The deployment receives three independent verdicts (one per criterion) plus a combined assessment. A "Fails" in any criterion prevents an overall "Satisfies" verdict. The most severe finding governs: a deployment that satisfies Transparency and Participation but fails Fidelity is a deployment that is well-governed but not well-designed. A deployment that fails Transparency is a deployment that cannot be assessed at all.
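Under the same illustrative assumptions as the earlier sketches, the combined assessment reduces to taking the most severe of the three criterion verdicts (the SEVERITY ordering is an assumption of the sketch, not instrument vocabulary):

```python
# Higher value = more severe finding; the most severe finding governs.
SEVERITY = {Verdict.SATISFIES: 0, Verdict.PARTIALLY_SATISFIES: 1, Verdict.FAILS: 2}

def combined_assessment(transparency: Verdict, participation: Verdict,
                        fidelity: Verdict) -> Verdict:
    """Section 5: a 'Fails' in any criterion blocks an overall 'Satisfies'."""
    return max((transparency, participation, fidelity), key=SEVERITY.get)
```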

The instrument does not produce a single score. It produces a diagnostic — identifying which criterion fails, at which level, and what remediation is required.

Design Principles

Inter-Assessor Reliability

The instrument is designed to produce consistent verdicts when applied by different assessors to the same deployment. Each question has a defined evidence standard (what constitutes documentation), a defined threshold (what constitutes satisfaction), and a defined methodology (how to assess). Two qualified assessors applying the instrument to the same deployment should reach the same verdict on each criterion — or identify the same evidence gap that prevents a verdict.

Domain Specificity Through the Pair Tables

The Fidelity section is not generic. Question F.1 references the specific domain's Pair table from Series 1. A healthcare Fidelity audit measures against the healthcare left column (bedside presence, ethical judgment in treatment decisions, holistic patient knowledge). A finance Fidelity audit measures against the finance left column (strategic judgment under moral uncertainty, moral accountability, client relationship). This makes the instrument domain-specific without requiring a different instrument for each domain — the domain specificity lives in the Pair table, not in the questions.

Cascade Enforcement

The cascade override is the instrument's most important structural feature. Without it, an organization can claim "Fidelity" by showing capability metrics while maintaining complete opacity (Transparency failure) and excluding affected populations from governance (Participation failure). The cascade rule prevents this: a deployment cannot pass Fidelity if it fails Transparency or Participation. This eliminates the primary mechanism of compliance theater — the selective citation of favorable metrics to claim a standard that has not been met.

Named Condition · HC-014
The Operational Standard
The FTP Audit Instrument — a structured, reproducible assessment operationalizing the Fidelity, Transparency, and Participation criteria into 18 scored questions within a five-section instrument. Designed for inter-assessor reliability, domain specificity through the Series 1 Pair tables, and cascade enforcement that prevents Fidelity claims without verified Transparency and Participation. Formatted for regulatory submission, organizational self-audit, and academic citation.

What Follows

The instrument provides the tool. HC-015 (The Compliance Theater Record) documents what happens when existing frameworks satisfy the form of governance requirements without the function — the specific failure mode the FTP instrument is designed to prevent. The cascade enforcement and the Pair table specificity are the two structural features that distinguish the FTP instrument from the frameworks that HC-015 examines.

The keystone (The Collaboration Standard) includes the instrument as its operational core — the tool that converts the saga's analysis into something regulators, institutions, and communities can apply.

Standalone appendix
The audit instrument is also available as a standalone, print-optimized appendix formatted for regulatory submission and organizational self-audit.

References

Internal: This paper is part of The Collaboration (HC series), Saga XI. It draws on and contributes to the argument documented across 31 papers in 2 series.

External references for this paper are in development. The Institute’s reference program is adding formal academic citations across the corpus. Priority papers (P0/P1) have complete references sections.