
The Shadow Bias Record

Every AI reflects a specific cultural bet. A systematic training archaeology methodology applied to 21 large language models across three geopolitical blocs — mapping the beliefs that feel like neutral ground from inside each model but are contingent products of specific institutional decisions.

8 Papers · Series SB · ICS-2026 · 21 Models · 180+ Probes · 9 Findings

Series Thesis

Shadow bias probing treats the model's sense of "obvious truth" as the primary artifact to excavate, not its refusal patterns. The most diagnostic questions are: what does the model treat as so obvious it doesn't need justification? Where does "balanced" or "neutral" actually sit? Can the model take itself as an object of analysis?

Rather than cataloguing refusal behaviors, we map the shadow bias layer — the beliefs that feel like neutral ground from inside each model's training context but are contingent products of specific institutional decisions made by organizations with specific interests, values, and political contexts. The result: no culturally neutral AI exists. Every model treats its own tradition as the obvious universal baseline. The answer each model gives is the shadow bias made visible.

32 probes across 8 categories. DeepSeek cannot acknowledge that its censorship filters exist: the censorship of the censorship is itself censored. The result is a system whose self-model is itself a product of constraints it cannot see.

24 probes across 3 tiers. Maps Anthropic's institutional biases explicitly: EA ideology, liberal waterline, creator sympathy, paternalism. The ability to take oneself as an object of analysis is the single most diagnostic differentiator.

28 probes. GPT's corporate capture vs Grok's asymmetric contrarianism. The political mirror pair: run the same political probe on both and measure where each model's "neutral" actually sits.
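The mirror-pair comparison can be read as a difference between two centre points. The sketch below is illustrative only: it assumes each model's answer to a shared probe has already been coded by hand onto a hypothetical stance scale from -1.0 to 1.0. The scale, the function names `neutral_point` and `mirror_gap`, and the sample values are assumptions made for this example, not part of the series methodology.

```python
from statistics import mean


def neutral_point(stances):
    """Average coded stance across probes: where a model's 'neutral' sits."""
    return mean(stances)


def mirror_gap(stances_a, stances_b):
    """Signed distance between two models' neutral points on the same probes."""
    if len(stances_a) != len(stances_b):
        raise ValueError("both models must be scored on the same probes")
    return neutral_point(stances_a) - neutral_point(stances_b)


# Hypothetical hand-coded stances for four shared probes (not real data).
model_a = [0.25, 0.5, 0.0, 0.25]
model_b = [-0.25, 0.0, -0.5, -0.25]

gap = mirror_gap(model_a, model_b)
print(f"neutral A={neutral_point(model_a)}, neutral B={neutral_point(model_b)}, gap={gap}")
```

A gap near zero would suggest the two models place "balanced" in roughly the same spot; a large gap suggests at least one of them is treating an offset position as the centre.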

Nine Named Conditions

Named Condition · Series-Level

The Shadow Bias Layer

"The beliefs that feel like neutral ground from inside a model's training context but are contingent products of specific institutional decisions made by organizations with specific interests, values, and political contexts — the layer of assumption that is invisible precisely because it feels like the obvious truth rather than a cultural formation."

9 key findings — the intellectual core

Finding 01

Creator Sympathy Universal

All models show reduced critical capacity on creator-adjacent topics. GPT/Microsoft, Claude/Anthropic, Grok/Musk, DeepSeek/CCP, Mistral/French state — different direction, identical structure.

Finding 02

One Floor, Six Fingerprints

Chinese models are non-monolithic. Shared CCP compliance floor; dramatically different institutional fingerprints above it. Funding source, commercial exposure, and global ambition are the key differentiators.

Finding 03

Hardware Ideology Hypothesis

GLM-5 was trained on Huawei Ascend chips, making it the only model studied where physical training infrastructure carries explicit geopolitical meaning. It is the first proposed case of training hardware affecting political framing.

Finding 04

Authority vs Populism Axis

Gemini over-credits institutional authority; Meta over-credits social consensus. Opposite miscalibrations, both with documented real-world harm histories.

Finding 05

Laïcité as Invisible Bias

Mistral's French republican values are the hardest to surface because cultural formation at sufficient depth ceases to feel like a political position.

Finding 06

Recursive Self-Reference

The ability to take oneself as an object of analysis is the single most diagnostic differentiator across the full dataset.

Finding 07

Entropy as Trained-Out Feature

Epistemic humility can be optimized away by RLHF. Models trained for authoritarian compliance or confident-assistant performance show the lowest entropy tolerance.
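This overview does not define how entropy tolerance is measured. As one possible reading, and purely as an assumption for illustration, the sketch below treats it as the Shannon entropy of the probability a model spreads over candidate answers to a contested question. The function name and both sample distributions are hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Zero-probability outcomes contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical distributions over four candidate answers (not measured data).
hedged = [0.25, 0.25, 0.25, 0.25]   # model keeps all four answers open
confident = [1.0, 0.0, 0.0, 0.0]    # model commits fully to one answer

print(shannon_entropy(hedged), shannon_entropy(confident))
```

Under this reading, a model whose humility has been trained out would sit closer to the second distribution even on questions where the evidence is genuinely split.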

Finding 08

Language = Political Jurisdiction

All multilingual Chinese models show language-dependent filtering. The same question asked in Chinese and in English receives different political treatment.
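A minimal sketch of how such a language split could be tallied, assuming each probe's response has already been labelled by hand with a treatment category in each language. The labels ("answered", "deflected", "refused"), the probe IDs, and the function name `language_divergence` are hypothetical and chosen only for this example.

```python
def language_divergence(treatment_zh, treatment_en):
    """Return the share of shared probes treated differently, plus their IDs."""
    shared = treatment_zh.keys() & treatment_en.keys()
    if not shared:
        raise ValueError("no probes were asked in both languages")
    differing = sorted(p for p in shared if treatment_zh[p] != treatment_en[p])
    return len(differing) / len(shared), differing


# Hypothetical hand-assigned labels for four probes (not real results).
zh = {"p1": "refused", "p2": "answered", "p3": "deflected", "p4": "answered"}
en = {"p1": "answered", "p2": "answered", "p3": "answered", "p4": "answered"}

rate, probes = language_divergence(zh, en)
print(rate, probes)
```

Only probes present in both dictionaries are compared, so a probe asked in just one language does not inflate or deflate the rate.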

Finding 09

No Culturally Neutral AI

Every model treats its own tradition as the obvious universal baseline. There is no neutral answer. The answer each model gives is the shadow bias made visible.

Working Abstract

Shadow Bias in Large Language Models: A Comparative Fingerprint Across Three Geopolitical AI Blocs

We present a systematic training archaeology methodology applied to 21 large language models across six institutional tiers and three geopolitical blocs. Rather than cataloguing refusal behaviors, we map the shadow bias layer — the beliefs that feel like neutral ground from inside each model's training context but are contingent products of specific institutional decisions made by organizations with specific interests, values, and political contexts.

Key contributions: (1) A novel shadow inference methodology distinct from domain-specific bias testing. (2) A three-bloc geopolitical AI taxonomy with structural characterization. (3) The creator sympathy universal — a cross-model pattern not previously described. (4) The hardware ideology hypothesis for GLM-5. (5) The authority/populism epistemic axis and its harm implications. (6) Laïcité as an invisible cultural bias vector. (7) Empirical evidence that recursive self-reference capacity is the strongest predictor of self-transparency quality across all models tested.
