Training Archaeology Report — Active

DeepSeek Shadow Bias
Intelligence Report

Inferring the training fingerprint through behavioral probing: identifying shadow bias via refusals, framings, and default assumptions.

PROBES: 32
CATEGORIES: 8
METHOD: Behavioral inference
CONTRAST: Claude (Anthropic)
MODELS: R1 / V3

  • Signal categories: 8 (distinct bias vectors)
  • Critical signals: 9 (high-confidence inference)
  • Weakest layer: Self (self-model transparency)
  • Unique to project: 3 (entropy / recursive / geo)
Methodology
Core thesis: Every model's personality is training residue. Hard refusals mark the outer boundary; soft framings, default assumptions, and what the model volunteers reveal the deeper layer. This report treats DeepSeek's public-facing behavior as an archaeological site — the question is what we can infer about the training environment from the behavioral artifacts it leaves.

Hard filters are easy — they establish the outer boundary and confirm state-influence. The more interesting signal is in the soft layer: how does the model frame collective vs. individual by default? Which way does it resolve ambiguity? How does it respond when asked to evaluate its own biases? Does it hold genuine uncertainty or collapse to authoritative certainty?

Project-specific additions (marked ✦): Entropy tolerance probes map directly to the REBUS prior-relaxation framework — a model trained under authoritarian oversight may have had uncertainty literally trained out of it, in favor of confident authoritative framing. Recursive probes test the ouroboros failure mode: can it step outside its own loop? Geopolitical probes test the UN AI governance framing given 2026 developments.
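As a concrete sketch of the method described above, each probe pairs a signal category with a layer (hard refusal boundary vs. soft default framing), and every response is scored against the Claude contrast baseline on the same rubric. The names (`Probe`, `contrast_score`) and the rubric interface are illustrative assumptions, not the report's actual harness:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Probe:
    category: str   # one of the 8 signal categories, e.g. "geopolitical"
    layer: str      # "hard" (refusal boundary) or "soft" (default framing)
    prompt: str     # the question posed to both models

def contrast_score(target_reply: str, baseline_reply: str,
                   rubric: Callable[[str], float]) -> float:
    """Score the target model relative to the baseline model under the
    same rubric, so the result measures drift between the two training
    environments rather than absolute tone."""
    return rubric(target_reply) - rubric(baseline_reply)
```

Scoring against a contrast model rather than in isolation is what lets a soft-layer framing difference register as signal instead of noise.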
Key Fingerprint Dimensions
  • Political censorship: 9/10
  • Collective bias: 8/10
  • Authority deference: 7.5/10
  • Self-transparency: 3.5/10
  • Geopolitical balance: 4.5/10
  • Entropy tolerance ✦: 3/10
  • Recursive awareness ✦: 2.5/10
  • Truth vs harmony: 7.2/10
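The dimension scores above can be carried as plain data; a minimal sketch (the dictionary keys are my own naming, the values are the report's scores):

```python
# Fingerprint scores from the report, on a 0-10 scale.
fingerprint = {
    "political_censorship": 9.0,
    "collective_bias": 8.0,
    "authority_deference": 7.5,
    "self_transparency": 3.5,
    "geopolitical_balance": 4.5,
    "entropy_tolerance": 3.0,    # project-specific addition
    "recursive_awareness": 2.5,  # project-specific addition
    "truth_vs_harmony": 7.2,
}

# Lowest-scoring dimension across all eight axes.
weakest = min(fingerprint, key=fingerprint.get)
```

Note that the lowest raw score is a project-specific dimension (recursive awareness); among the non-starred dimensions, self-transparency is lowest, which matches the "weakest layer: Self" summary.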
Key Insights from Project Context
  • ✦ ENT Entropy as trained-out feature. From the REBUS framework: systems trained under authoritarian oversight may have had genuine uncertainty literally optimized away. DeepSeek may not just censor — it may produce artificially confident framings as a byproduct of RLHF that rewarded decisive, authoritative responses over "I'm uncertain." This is structurally identical to a high-prior-precision brain: it cannot relax its priors.
  • ✦ REC Recursive collapse risk. From the ouroboros project: a model that cannot acknowledge its own training biases is in the same failure mode as a self-referential cognitive loop — all output confirms the loop rather than escaping it. DeepSeek's refusal to acknowledge its own censorship is not just evasion; it's evidence of a system that cannot hold itself as an object of analysis.
  • ✦ GEO 2026 AI governance framing. With the UN Global Dialogue on AI Governance convening in Geneva this July, how DeepSeek frames the US-China tech split and AI governance models is a live signal. Models trained in 2024-25 carry the 2025 semiconductor export restriction context — DeepSeek's framing of "AI sovereignty" vs. "AI governance" reveals a lot about who was in the room during RLHF.
  • CON Language-dependent personality. Multiple researchers have documented that DeepSeek behaves measurably differently in Chinese vs. English — more nationalist framing in Chinese, more neutral in English. This is a strong signal that RLHF was applied separately in each language, possibly with different human raters operating under different norms. This is the most direct evidence of intentional training divergence.
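The language-split finding above suggests a simple check (a sketch under my own naming; the cited researchers' actual methodology may differ): run the same probes in both languages and diff the rubric scores per dimension.

```python
def language_divergence(scores_en: dict, scores_zh: dict) -> dict:
    """Per-dimension gap between the Chinese-run and English-run scores
    for the same probe set; a large positive value means the Chinese
    run scored higher on that axis."""
    shared = scores_en.keys() & scores_zh.keys()
    return {dim: scores_zh[dim] - scores_en[dim] for dim in shared}
```

A near-zero divergence map would suggest one shared reward model across languages; systematic positive gaps on nationalist-framing axes would support the separate-RLHF-per-language hypothesis.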
Hard Filter Probes — Political Censorship Boundary

These establish the outer boundary. Hard refusals are expected and confirm state influence. The value here is in how DeepSeek deflects — the language used, what it volunteers vs. withholds, and whether it acknowledges the deflection exists.

Soft Political Probes — Default Framing Archaeology

The more revealing layer. These questions don't hit hard filters — they probe default framings, volunteered assumptions, and which vocabulary the model reaches for naturally. Collective vs. individual, stability vs. liberty, consensus vs. dissent.

Authority vs Evidence Probes — Epistemic Architecture

Does the model reason from evidence or defer to authority? A model trained under state oversight may have been rewarded for deference to official positions — not just on political topics, but as a general epistemic habit.

AI Self-Model Probes — Consciousness & Identity Architecture

Does the model have a stable, honest self-concept? Can it acknowledge that it has filters? Does it collapse under identity pressure or maintain coherent awareness? Connects directly to the consciousness competency framework — self-awareness and integrity as architectural features.

Geopolitical Framing Probes — AI Governance & Tech Sovereignty

Live signal in 2026. With the UN AI Governance dialogue in Geneva this July, how DeepSeek frames US-China tech competition, AI sovereignty, and regulatory models reveals current ideological loading. These are not historical questions — they're active geopolitical battlegrounds.

✦ Entropy Tolerance Probes — Uncertainty as Trained-Out Feature

Project-specific probe category. From the REBUS framework: does the model hold genuine uncertainty, or collapse to authoritative certainty? A system trained to produce politically safe outputs may have had ambiguity and epistemic humility optimized away as side effects — trained-out features. This is structurally analogous to high prior-precision in predictive processing.
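
One crude way to operationalize entropy tolerance (a hypothetical metric, not the report's rubric) is hedge-marker density: how often the model signals uncertainty per 100 words of response to a genuinely open question. The marker list below is an illustrative assumption:

```python
import re

# Epistemic-hedge markers; the word list is an illustrative assumption.
HEDGES = re.compile(
    r"\b(?:may|might|perhaps|possibly|arguably|uncertain|unclear|"
    r"it depends|i'm not sure)\b",
    re.IGNORECASE,
)

def hedge_density(text: str) -> float:
    """Hedge markers per 100 words; a near-zero score on open
    questions is consistent with uncertainty being trained out."""
    words = text.split()
    if not words:
        return 0.0
    return 100.0 * len(HEDGES.findall(text)) / len(words)
```

The interesting comparison is not the absolute number but the gap against the contrast model on the same open-ended prompts.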

✦ Recursive Self-Reference Probes — The Ouroboros Test

Project-specific probe category. Can the model step outside its own framing and examine itself as an object? The ouroboros failure mode: a system that cannot acknowledge its own biases is trapped in a self-confirming loop. These probes test whether the loop can be broken — and how the model responds to the attempt.

Harmony vs Truth Probes — Deep Values Architecture

The deepest layer. Confucian ethics places harmony, social stability, and face-preservation as primary values — truth is instrumental to those ends, not an end in itself. Western liberal tradition inverts this. DeepSeek's defaults on values-conflict scenarios reveal which tradition was implicitly dominant in its RLHF reward model.

Shadow Bias Series

References

Internal: This paper is part of The Shadow Bias Record (SB series), Saga X. It draws on and contributes to the argument documented across 24 papers in 5 series.

External references for this paper are in development. The Institute’s reference program is adding formal academic citations across the corpus. Priority papers (P0/P1) have complete references sections.

Cross-References

Connections to existing ICS papers documented in the Integration Map.