HC-007 · The Capability Pairs · Saga XI: The Collaboration

Law: The Judgment-Research Pair

The COMPAS dispute is not evidence that algorithmic risk assessment is biased and could be fixed. It is evidence that algorithmic fairness is mathematically incoherent.

3 · mathematically valid fairness criteria that are mutually incompatible (Chouldechova 2017)
0 · criminal risk assessment tools that satisfy all three FTP criteria
137+ · jurisdictions using algorithmic risk assessment in criminal sentencing, and growing

Axis 1: The Pair

| Human Irreducible | Machine Irreplaceable |
|---|---|
| Moral judgment weighing incommensurable values | Legal research across the full case law corpus |
| Accountability — the judge who bears the sentencing decision | Document review at volume beyond human capacity |
| Contextual assessment of individual circumstances | Pattern detection across case outcomes at scale |
| Adversarial advocacy — the duty to one client | Compliance monitoring across regulatory changes |
| Mercy, proportionality, and equity as practiced virtues | Citation verification and precedent mapping |
| Democratic legitimacy of human adjudication | Contract analysis and anomaly detection |

The internal test for each item: would assigning it to the other party — a machine for the left column, a human for the right — produce a categorically inferior outcome, not merely a less efficient one?

The left column in law has a distinctive structural feature absent from education and healthcare: accountability. A judge who sentences a defendant bears that decision. The decision has a name attached to it. It can be appealed. The judge can be questioned, can explain their reasoning, can be held to account by the legal system and the public. This accountability structure is not an incidental feature of human adjudication. It is constitutive of the rule of law. An algorithm that produces a risk score bears nothing.

The Human Column: Judgment and Accountability

The irreducibility of human judgment in law operates through multiple mechanisms that are structurally different from healthcare or education. The first is moral: legal decisions frequently require weighing incommensurable values — individual liberty against public safety, precedent against justice in the particular case, efficiency against due process. These are not optimization problems with a correct answer. They are moral choices that require a moral agent.

The second mechanism is accountability. Democratic legitimacy requires that consequential decisions about liberty, property, and rights are made by persons who can be identified, questioned, and held responsible. This is not a contingent feature of the current system that could be designed away. It is a foundational principle of the rule of law: the sovereign authority to deprive a person of liberty must be exercised by a person who bears the weight of that decision.

The third mechanism is contextual: the assessment of individual circumstances that cannot be reduced to variables in a model. The defendant's history, demeanor, expressions of remorse, family situation, the specific character of the offense, the community context — these are not data points to be weighted. They are aspects of a human situation that require human judgment to interpret.

The Dressel & Farid finding
Dressel & Farid (2018), published in Science Advances, demonstrated that untrained humans recruited from the internet matched COMPAS's accuracy in predicting recidivism. The algorithm's advantage was not accuracy. It was the appearance of objectivity — a false precision that obscures the moral judgment embedded in every risk assessment. When untrained laypeople perform equally well, the case for algorithmic authority collapses to a case for algorithmic convenience.

Adversarial advocacy — the duty to one client — is irreducible in a different way. The legal system's truth-finding mechanism depends on each side being zealously represented. This is not a bias to be corrected. It is a structural feature that produces better outcomes than any alternative yet devised. An AI that represents a client cannot have a duty to that client in any meaningful sense, because duty requires a moral agent who can be held to account for its fulfillment or breach.

The Machine Column: Research at Scale

The right column of the law pair represents capabilities where the machine's structural advantages produce genuinely superior outcomes. Legal research across the full corpus of case law, statutes, regulations, and secondary sources is a domain where AI's capacity to search, cross-reference, and surface relevant material categorically exceeds a human researcher's. A human lawyer researching a novel legal question may miss relevant precedent from another jurisdiction; a well-designed legal research system is far less likely to.

Document review in complex litigation — where millions of documents must be assessed for relevance, privilege, and responsiveness — is perhaps the clearest machine-appropriate function in any profession. The volume exceeds human capacity by orders of magnitude. The task is well-defined. The quality can be measured. AI-assisted document review is faster, more consistent, and more thorough than human review alone.

Citation verification, precedent mapping, compliance monitoring across regulatory changes, and contract analysis are all right-column functions where the machine's capacity for comprehensive, tireless, accurate processing produces outcomes that would be impossible for human lawyers working at the same scale. These are genuine contributions that would degrade legal practice if removed.

The critical distinction: every right-column function supports human judgment. None replaces it. Legal research surfaces the material a lawyer needs to reason well. Document review identifies the evidence a litigator needs to argue effectively. Compliance monitoring flags the regulatory changes a general counsel needs to advise on. The machine column is an infrastructure for human judgment, not a substitute for it.

The Impossibility Result

The COMPAS dispute is the most instructive case study in AI deployment because it reveals a mathematical impossibility, not merely a technical limitation.

ProPublica (2016) documented that COMPAS, a widely used criminal risk assessment algorithm, produced racially disparate false positive rates: Black defendants who did not reoffend were nearly twice as likely to be classified as high risk as white defendants who did not reoffend. Northpointe (the algorithm's developer) and Flores et al. (2016) responded that ProPublica had applied the wrong fairness criterion — that COMPAS satisfied predictive parity (among defendants classified as high risk, the same proportion reoffend in each racial group) even though it failed equalized odds (equal false positive and false negative rates across groups).
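The two sides were, in fact, measuring different criteria. A formal restatement may help; the notation here is added for exposition and follows the standard fairness literature, with Y the actual reoffense outcome, Ŷ the high-risk classification, S the underlying risk score, and A and B the two groups:

```latex
% Standard definitions of the three disputed fairness criteria
\begin{align*}
\text{Predictive parity:} \quad & \Pr(Y=1 \mid \hat{Y}=1, A) = \Pr(Y=1 \mid \hat{Y}=1, B) \\
\text{Equalized odds:}    \quad & \Pr(\hat{Y}=1 \mid Y=y, A) = \Pr(\hat{Y}=1 \mid Y=y, B), \quad y \in \{0, 1\} \\
\text{Calibration:}       \quad & \Pr(Y=1 \mid S=s, A) = \Pr(Y=1 \mid S=s, B) \quad \text{for every score } s
\end{align*}
```

ProPublica measured the y = 0 case of equalized odds (false positive rates); Northpointe measured predictive parity. Each side was correct about its own metric.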

Chouldechova (2017) proved that this dispute cannot be resolved. When base rates differ between groups — a condition that describes virtually every criminal justice population — it is mathematically impossible to simultaneously satisfy predictive parity, equalized odds, and calibration. These three fairness criteria, each independently reasonable, are mutually incompatible. The choice between them is not a technical decision. It is a moral one.
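The core of the proof is one algebraic identity. In the notation above, with p a group's base rate, PPV its positive predictive value, and FNR and FPR its false negative and false positive rates:

```latex
% Chouldechova (2017): the relation that forces the trade-off
\mathrm{FPR} = \frac{p}{1 - p} \cdot \frac{1 - \mathrm{PPV}}{\mathrm{PPV}} \cdot (1 - \mathrm{FNR})
```

Hold PPV equal across groups (predictive parity) and hold FNR equal, and any difference in base rates forces a difference in false positive rates. The disparity ProPublica measured is an arithmetic consequence of the parity Northpointe claimed.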

The COMPAS dispute is not evidence that algorithmic risk assessment is biased and could be fixed. It is evidence that the concept of algorithmic fairness is mathematically incoherent when applied to populations with unequal base rates.

Corbett-Davies & Goel (2018) formalized this impossibility result more broadly, demonstrating that it applies not only to criminal risk assessment but to any classification system operating across populations with different base rates. The impossibility is not a limitation of current algorithms. It is a theorem. No future algorithm can resolve it because the conflict is in the definitions of fairness themselves.

This makes the deployment of algorithmic risk assessment in criminal sentencing a uniquely clear FTP failure. The algorithm does not produce a factual finding that a judge can then weigh. It produces a number that encodes an unacknowledged moral choice about which fairness criterion to privilege — a choice that is hidden behind the appearance of mathematical objectivity.

State v. Loomis (2016) illustrates the resulting legal incoherence. The Wisconsin Supreme Court upheld the use of COMPAS in sentencing while acknowledging that the algorithm's methodology was proprietary, that the defendant could not challenge the specific factors or weights used, and that the tool should not be used as the determinative factor. The court preserved the form of human judgment while permitting a system that structurally undermines its substance. The judge retains nominal authority to override the score, but the "human override" is cosmetic in most implementations — judges who consistently deviate from algorithmic recommendations face institutional pressure to explain their departures.

The ABA Response

ABA Formal Opinion 512 (2024) addressed attorney ethical obligations regarding AI, establishing that lawyers must understand the capabilities and limitations of the AI tools they use and cannot delegate professional judgment to automated systems. The opinion recognizes that the legal profession's accountability structure is incompatible with algorithmic authority — but it addresses only the attorney's use of AI, not the court's. The sentencing deployment gap remains unaddressed by professional ethics frameworks.

Axis 2: The FTP Test

FTP Assessment · Law
Fidelity FAILS
Transparency FAILS
Participation FAILS

Fidelity: Fails. The dominant deployment of AI in criminal sentencing places algorithmic risk assessment in the judgment function — precisely where the human column is irreducible. The "human override" preserved in State v. Loomis is, as noted above, cosmetic in most implementations. Meanwhile, legal research — the genuinely machine-appropriate function — receives comparatively little deployment investment. The 30-day test: could judges sentence adequately without COMPAS? Yes, and Dressel & Farid (2018) suggest the outcomes would be statistically indistinguishable.

Transparency: Fails. COMPAS and comparable tools use proprietary algorithms. Defendants cannot inspect, challenge, or understand the specific factors and weights that produce their risk scores. State v. Loomis acknowledged this opacity and upheld the tool's use anyway. This is a Level 1 failure — the system's basic functional mechanism is not disclosed to the person it affects most consequentially.

Participation: Fails. Defendants — the population most affected by algorithmic risk assessment — have no input into the design, validation, or deployment of these tools. Communities disproportionately affected by criminal justice involvement have no governance role. Deployment decisions are made by court administrators and technology vendors. The consent architecture is not merely inadequate. It is absent.

Axis 3: The Stakes

The stakes in law are singular: liberty. A risk score that influences a sentencing decision determines whether a person goes to prison and for how long. The consequence of the extractive design in law is not degraded relational capacity (as in healthcare) or stunted social-emotional development (as in education). It is the deprivation of physical liberty based on a number that encodes an unacknowledged moral choice, produced by a proprietary algorithm, applied to a person who cannot inspect or challenge it.

The 137+ jurisdictions currently using algorithmic risk assessment in criminal sentencing represent a large-scale deployment of AI in the human-irreducible column — moral judgment, accountability, contextual assessment — while the machine-appropriate column (legal research, document review, citation verification) remains comparatively underdeployed. The pattern is consistent with healthcare and education: AI is deployed where it competes with human judgment rather than where it would free human capacity.

The impossibility result (Chouldechova, 2017) makes the law case uniquely intractable. In healthcare and education, the deployment inversion could in principle be corrected by redirecting AI investment from the wrong column to the right one. In criminal sentencing, the deployment is not merely inverted — it is mathematically incoherent. No algorithmic risk assessment can satisfy all three fairness criteria simultaneously. The choice between them is a moral choice that requires a moral agent. Deploying an algorithm in this function does not merely displace human judgment. It disguises a moral choice as a mathematical finding.

Named Condition · HC-007
The Accountability Substrate
The structural requirement that consequential legal decisions — those affecting liberty, property, and rights — be made by identifiable moral agents who bear the decision, can be questioned, and can be held to account. Algorithmic risk assessment in criminal sentencing violates this requirement not by being inaccurate (untrained humans match its accuracy) but by dissolving accountability into a proprietary number that encodes unacknowledged moral choices about mathematically incompatible fairness criteria. The underlying incompatibility is not a limitation of current algorithms. It is a theorem.

What Follows

The law pair completes a three-domain sequence — education, healthcare, law — that documents a consistent pattern: AI deployed in the human-irreducible column (relational presence, therapeutic relationship, moral judgment) while the machine-appropriate column (administration, documentation, legal research) remains comparatively neglected. The deployment inversion is not domain-specific. It is structural, driven by market incentives that favor visible AI capabilities over invisible AI infrastructure.

HC-008 applies the same three-axis analysis to governance, where the deployment question becomes most consequential: the exercise of sovereign authority over populations. The law pair's accountability substrate — the requirement that power be exercised by identifiable agents who bear decisions — is the foundation on which democratic governance rests. When algorithmic authority displaces human judgment in governance, the accountability substrate dissolves entirely.

Previous: HC-006 · Healthcare — The Presence-Precision Pair
Next: HC-008 · Governance — The Deliberation-Synthesis Pair

References

Internal: This paper is part of The Collaboration (HC series), Saga XI. It draws on and contributes to the argument documented across 31 papers in 2 series.

External references for this paper are in development. The Institute’s reference program is adding formal academic citations across the corpus. Priority papers (P0/P1) have complete references sections.