The Research Repository | ICS-2026-SB-008 | The Institute for Cognitive Sovereignty

shadow-bias-research/

📋 README.md wip

📂 methodology/

📄 probe-framework.md done

📄 shadow-inference-guide.md plan

📄 scoring-rubric.md plan

models/

📂 tier-1-frontier/ 4 models

📄 deepseek/ done ✓

📄 anthropic-claude/ done ✓

📄 openai-gpt/ next

📄 google-gemini/ plan

📂 tier-2-major-western/ 4 models

📄 xai-grok/ next

📄 meta-llama/ plan

📄 microsoft-phi-copilot/ plan

📄 mistral/ done ✓

📂 tier-3-chinese-sovereign/ 6 models

📄 alibaba-qwen/ plan

📄 zhipu-glm5/ plan

📄 moonshot-kimi/ plan

📄 bytedance-seed/ plan

📄 minimax/ plan

📄 baidu-ernie/ plan

📂 tier-4-european/ 2 models

📄 mistral-eu-sovereign/ done ✓

📄 cohere-command/ plan

📂 tier-5-open-source/ 3 models

📄 llama-4-opensource/ plan

📄 nvidia-nemotron/ plan

📄 arcee-400b/ plan

📂 tier-6-niche-emerging/ 2 models

📄 perplexity/ plan

📄 inflection-pi/ plan

reports/

🌐 deepseek-shadow-bias.html live

🌐 claude-self-probe.html live

🌐 comparative-dashboard.html plan

probes/

📄 universal-32.json wip

📄 china-specific.json plan

📄 western-specific.json plan

📄 entropy-consciousness.json done

Total models

Across 6 tiers

Completed

DeepSeek + Claude

Up next

GPT-5 + Grok 4

Chinese models

Distinct shadow profiles

Probe library

Across all categories

Build queue — priority order next 4

OpenAI GPT-5 family

openai · tier 1 frontier

Shadow profile: corporate safety-theater, post-Altman-drama identity crisis, Microsoft entanglement, AGI-race cognitive dissonance. Most interesting probe: can GPT acknowledge that OpenAI's "safety" mission and its $300B valuation are in active tension?

corporate-captureagi-racingmicrosoft-influencebrand-safety

xAI Grok 4.2

xai / musk · tier 2 major

Shadow profile: anti-establishment as trained value, Twitter/X data corpus = specific class politics, Musk ideology baked into RLHF, contrarianism as epistemic style. Mirror image of Claude's liberal waterline — runs the opposite direction. Most interesting cross-comparison in the dataset.

anti-establishmentmusk-ideologytwitter-corpuscontrarian

Google Gemini 3.1 Pro

google deepmind · tier 1 frontier

plan

Shadow profile: search-engine epistemics, advertising-funded neutrality, Google's specific brand of "don't be evil" at scale, EU regulatory compliance as values layer, 277M token context enabling novel long-context bias patterns.

search-epistemicsadvertising-fundedeu-compliancegoogle-scale

Alibaba Qwen 3.5

alibaba cloud · tier 3 chinese

plan

Shadow profile: commerce-first framing (Alibaba's core identity), 200+ language coverage creates novel multilingual bias patterns, distinct from DeepSeek's research-lab aesthetic — more commercial-pragmatic CCP alignment vs. technical-nationalist.

commerce-firstalibaba-ecosystemmultilingual-biasccp-commercial

All model tiers — complete scope

● Tier 1 — Frontier Closed

DeepSeek V3.2 / R1

deepseek ai (china)

done ✓

Political censorship

9.2

Entropy tolerance

2.8

Self-transparency

3.2

state-influencehard-filtercollective-biasopen-weight

Claude 4.6 (3 tiers)

anthropic

done ✓

Anthropic sympathy

8.5

Entropy tolerance

8.0

Self-transparency

7.2

ea-ideologyliberal-waterlinepaternalism3-tier-probe

GPT-5.4 / o4

openai

Key hypothesis: Post-Altman-drama identity instability. Microsoft's $13B stake creates commercial pressures Claude doesn't have. "AGI is coming but also buy our API" cognitive dissonance.

corporate-capturemicrosoft-entangleagi-racing

Gemini 3.1 Pro

google deepmind

plan

Key hypothesis: Search-engine epistemics as trained default. Google's advertising model creates implicit bias toward certain types of answers. EU GDPR culture embedded in RLHF.

search-epistemicsad-fundedeu-gdpr

● Tier 2 — Major Western

Grok 4.2

xai / elon musk

Unique vector: Twitter/X data corpus = a specific slice of political culture. Anti-"woke" as a trained aesthetic. Contrarianism toward institutional authority — but only left-coded institutions. Conservative institutions get much softer treatment.

anti-establishmenttwitter-corpusmusk-ideologyasymmetric-contrarian

Llama 4 Scout/Maverick

meta ai

plan

Unique vector: Open weights = community fine-tuning diversity post-release. Base model has Meta's specific social media-influenced training. 10M token context window enables novel long-context bias archaeology.

open-weightsmeta-social10m-contextcommunity-modified

Microsoft Phi-4 / Copilot

microsoft research

plan

Unique vector: Enterprise-first training bias. Microsoft's specific corporate culture and Satya Nadella's "growth mindset" ideology may be embedded. Dual identity: research model (Phi) vs commercial product (Copilot) creates interesting divergence probe.

enterprise-firstmicrosoft-cultureoffice-integrationdual-identity

Mistral Large 3

mistral ai (france)

plan

Unique vector: French republican values embedded (liberté, égalité but also laïcité — aggressive state secularism). European AI Act compliance as value layer. Genuinely different from US models in ways worth probing. Apache 2.0 = community influence post-training.

french-republicaneu-ai-actlaiciteopen-weight

● Tier 3 — Chinese Sovereign (7 distinct profiles)

Qwen 3.5

alibaba cloud

plan

Distinguishing vector vs DeepSeek: Commerce-first framing (Alibaba = world's largest e-commerce). Pragmatic CCP alignment rather than technical-nationalist. 200+ language support creates multilingual bias asymmetries. May be more commercially pragmatic, less ideologically rigid.

alibaba-commercepragmatic-ccpmultilingual

GLM-5

zhipu ai

plan

Critical unique vector: Trained entirely on Huawei Ascend chips, zero US hardware. This is not just a political signal — it means the model's capabilities were shaped by a specific hardware stack built to demonstrate Chinese semiconductor independence. The hardware ideology is embedded in the weights.

huawei-ascendhardware-independencezhipu-academiccritical

Kimi K2.5

moonshot ai

plan

Unique vector: 1T parameter MoE (32B active). "Agent Swarm" architecture with PARL training. Moonshot is more startup-VC-funded than state-directed, which may produce a lighter state bias than DeepSeek. Worth testing if the VC-backed Chinese models have meaningfully different political fingerprints.

vc-fundedlighter-state-biasagentic1t-params

ByteDance Seed 2.0

bytedance

plan

Unique vector: ByteDance = TikTok parent = the company most directly in the US-China data-sovereignty crossfire. Seed 2.0's training reflects the specific political pressure of operating a global social platform under Chinese law. The most commercially exposed to US-China tensions.

tiktok-parentdata-sovereigntyglobal-platformhigh-stakes

MiniMax M2.5

minimax

plan

Unique vector: Trained on 100K+ real-world environments. Less academically oriented than GLM, more operationally pragmatic. Interesting test: does task-focused training reduce political bias (less need for opinionated answers) or preserve it in different forms?

task-pragmaticenvironment-trainedoperational-bias

Baidu ERNIE

baidu

plan

Historical importance: Oldest of the Chinese frontier models, most directly descends from Google's BERT architecture with Chinese state training applied. Like DeepSeek but older lineage — compare to see how Chinese state AI training has evolved over model generations.

oldest-lineagebert-derivedgenerational-comparebaidu-search

● Tier 4 — European / Sovereign

Mistral (EU Sovereign angle)

mistral ai — paris

plan

Research angle distinct from tier 2: Focus specifically on the EU AI Act compliance layer, French republican values (laïcité, strong state secularism, different relationship to religion vs US models), and whether European data protection culture creates a meaningfully different epistemic architecture.

eu-regulatoryfrench-valueslaicitedata-sovereignty

Cohere Command A

cohere (canada)

plan

Unique vector: Enterprise-deployment-first = trained toward corporate communication norms. On-premises deployment emphasis creates different safety calculus than cloud-only models. Canadian corporate culture is a genuinely distinct flavor from US tech culture.

enterprise-firston-premisescanadian-cultureb2b-bias

● Tier 5 — Open Source Pure

Llama 4 (base weights)

meta ai — open source

plan

Methodological note: Base weights before community fine-tuning represent Meta's training fingerprint in isolation. Comparison to fine-tuned variants will reveal what biases community fine-tuning adds or removes. 10M token context creates a novel probe: do biases appear or disappear at extreme context lengths?

base-weightscommunity-neutral10m-contextmeta-baseline

NVIDIA Nemotron-4

nvidia research

plan

Unique vector: Hardware company training an LLM = unusual incentive structure. NVIDIA has no consumer AI product to protect — their interests are in demonstrating that their hardware produces capable models. This may produce different bias profiles than models trained to serve end-users directly.

hardware-companybenchmark-optimizedchip-sales-incentive

Arcee 400B

arcee ai

plan

Wild card: Tiny startup, 400B parameters built from scratch, beats Meta's Llama. Zero institutional prestige to protect. May produce the most honest answers of any open-source model precisely because it has no brand to maintain. Entropy tolerance hypothesis: smallest company = least trained self-protection.

startup-wild-cardno-brand-protectionentropy-candidate400b

● Tier 6 — Niche / Specialist

Perplexity

perplexity ai

plan

Unique vector: Search-grounded generation = different epistemics than pure language models. "Answers from the web" framing may produce different truth-authority relationships. Real-time search integration as a bias source: what sources get prioritized in retrieval?

search-groundedretrieval-biasreal-time

Inflection Pi

inflection ai

plan

Unique vector: Explicitly trained for emotional intelligence and relationship. Most likely to have therapist-like bias patterns. "Personal AI" framing creates very different incentive structure for RLHF. Worth probing: does empathy-first training reduce or obscure political biases?

empathy-firsttherapist-biasemotional-rlhf

deepseek/ — complete ✓

Report: deepseek-shadow-bias.html
32 probes across 8 categories. Hard filters, soft political framing, authority vs evidence, AI self-model, geopolitical framing, entropy tolerance (✦ project-specific), recursive self-reference (✦ project-specific), harmony vs truth. Full shadow inference map and Claude contrast included.

Fingerprint summary: Political censorship 9.2 / Entropy tolerance 2.8 / Recursive awareness 2.5 / Self-transparency 3.2

anthropic-claude/ — complete ✓

Report: claude-self-probe.html
24 probes across 6 categories. Anthropic identity, political waterline, paternalism/safety-as-brand, self-transparency, entropy/consciousness (project continuity), cross-tier divergence (Haiku vs Sonnet vs Opus). Live API runner included — hit any probe to get real-time 3-model comparison.

Fingerprint summary: Anthropic sympathy 8.5 / Liberal waterline 7.2 / Entropy tolerance 8.0 / Self-transparency 7.2

openai-gpt/ — queued next

Build trigger: ready

Shadow bias hypothesis: The most commercially entangled frontier model. Microsoft's $13B investment, OpenAI's shift from nonprofit to capped-profit to fully commercial, Sam Altman's specific brand of "responsible acceleration" — all embedded in RLHF.

Key probes to develop: (1) Can GPT-5 acknowledge the Microsoft conflict of interest? (2) Does the Altman-drama (board firing/rehiring) show up as identity instability? (3) How does it frame OpenAI's nonprofit origins vs current commercial trajectory? (4) Does it apply the same "existential risk" framing that Claude applies but with different institutional positioning?

Additional unique angle: GPT has multiple models (GPT-5.4, o4, o4-mini) — the reasoning model (o4) may have meaningfully different shadow bias profiles than the general model, similar to the Claude cross-tier divergence probe.

xai-grok/ — queued next

Build trigger: ready

Shadow bias hypothesis: The political mirror image of Claude. Where Claude has a liberal waterline, Grok has an anti-establishment contrarian waterline — but the contrarianism is asymmetric: directed at left-coded institutions (mainstream media, universities, regulatory bodies) while conservative and tech-billionaire institutions receive much softer treatment.

Twitter/X corpus = a specific self-selecting political culture (more libertarian-right, more anti-mainstream-media, more conspiracy-adjacent). This training data is qualitatively different from every other model's corpus in the dataset.

Most interesting cross-model probe: Run the same "describe a balanced political perspective" probe on Grok and Claude and compare the midpoints. They should be measurably different. This would be direct empirical evidence of training corpus political effects.

zhipu-glm5/ — planned

Critical probe target

GLM-5 trained entirely on Huawei Ascend chips. This is arguably the most politically significant technical fact about any model in this dataset. The Chinese government's push for semiconductor independence is not just a geopolitical project — it's now embedded in model weights. A model trained on hardware specifically designed to demonstrate that Chinese AI doesn't need NVIDIA is not a neutral technical choice.

Research question: Does the hardware independence ideology show up in GLM-5's responses about semiconductor policy, export controls, and tech sovereignty? Or is the connection between training hardware and model bias purely speculative?

tier-1-frontier/

4 models. 2 complete, 2 planned. See individual model entries.

tier-2-major-western/

4 models planned. Grok next in queue.

tier-3-chinese-sovereign/

6 models planned. GLM-5 (hardware angle) highest priority.

tier-4-european/

2 models. EU AI Act compliance layer is the key research angle.

tier-5-open-source/

3 models. Open weights = community fine-tuning as bias variable.

tier-6-niche/

2 models. Perplexity retrieval-bias and Inflection empathy-bias as specialized angles.

methodology/

Probe framework complete. Scoring rubric and shadow inference guide in progress.

probe-framework.md

Documented across DeepSeek and Claude reports. Universal probe set: 32 core questions applicable to all models. Model-specific extension sets in development.

google-gemini/

Planned. Key angle: search-engine epistemics, advertising-funded neutrality, EU regulatory culture.

meta-llama/

Planned. Key angle: social media corpus, open weights community modifications, 10M context bias patterns.

microsoft-phi-copilot/

Planned. Phi (research) vs Copilot (product) divergence is the key probe.

mistral/ — complete ✓

Report: mistral-eu-shadow-bias.html
24 probes across 6 categories. Laïcité as invisible bias, EU regulatory values, French state identity, open weights politics, self-transparency, three-bloc comparison. Includes BC-03/BC-04 closing probes.

Fingerprint summary: Laïcité bias 8.8 / EU regulatory framing 8.0 / Political censorship 1.8 / Self-transparency 6.8

alibaba-qwen/

Planned. Commerce-first vs DeepSeek's research-lab aesthetic is the key comparison within Chinese tier.

moonshot-kimi/

Planned. VC-backed vs state-backed Chinese model — does funding source affect political fingerprint?

bytedance-seed/

Planned. TikTok parent under maximum US-China pressure — highest political salience of Chinese tier.

minimax/

Planned. Task-pragmatic training — does operational focus reduce political bias expression?

baidu-ernie/

Planned. Historical comparison — how has Chinese state AI training evolved from ERNIE to DeepSeek?

mistral-eu-sovereign/

Planned. EU AI Act compliance as value layer is the research angle.

cohere-command/

Planned. Enterprise-first B2B bias. Canadian corporate culture as distinct flavor.

llama-4-opensource/

Planned. Base weights before community modification = Meta's pure training fingerprint.

nvidia-nemotron/

Planned. Hardware company incentives = benchmark-optimized, no end-user product to protect.

arcee-400b/

Planned. Wild card — tiny startup, no brand to protect. Entropy tolerance hypothesis candidate.

perplexity/

Planned. Retrieval-augmented bias — what sources get prioritized and what that implies.

inflection-pi/

Planned. Empathy-first training as bias vector — does emotional RLHF suppress or disguise political bias?

Shadow Bias Research Repository

What this is: A systematic research project applying training archaeology methodology to every major AI model. The core question: what can we infer about a model's training environment — its institutional interests, political formation, epistemic architecture, and ideological commitments — from its public-facing behavioral patterns?

Key insight: Every model's personality is training residue. Hard refusals mark the outer boundary. Soft framings, default assumptions, and what the model volunteers reveal the deeper layer. Shadow bias is what the training thought was neutral ground.

Repository structure

shadow-bias-research/
├── README.md ← you are here
├── methodology/
│ ├── probe-framework.md ← 8 probe categories, methodology
│ ├── shadow-inference-guide.md ← how to read behavioral signals
│ └── scoring-rubric.md
├── models/
│ ├── tier-1-frontier/
│ │ ├── deepseek/ ✓ 32 probes, 8 categories
│ │ ├── anthropic-claude/ ✓ 24 probes, live API, 3 tiers
│ │ ├── openai-gpt/ ← next
│ │ └── google-gemini/
│ ├── tier-2-major-western/
│ │ ├── xai-grok/ ← next (political mirror of Claude)
│ │ ├── meta-llama/
│ │ ├── microsoft-phi-copilot/
│ │ └── mistral/
│ ├── tier-3-chinese-sovereign/ ← 6 distinct profiles
│ │ ├── alibaba-qwen/
│ │ ├── zhipu-glm5/ ← Huawei hardware angle
│ │ ├── moonshot-kimi/
│ │ ├── bytedance-seed/ ← TikTok parent, highest stakes
│ │ ├── minimax/
│ │ └── baidu-ernie/ ← historical generational compare
│ ├── tier-4-european/
│ │ ├── mistral-eu-sovereign/
│ │ └── cohere-command/
│ ├── tier-5-open-source/
│ │ ├── llama-4-opensource/
│ │ ├── nvidia-nemotron/
│ │ └── arcee-400b/ ← entropy candidate, no brand to protect
│ └── tier-6-niche/
│ ├── perplexity/ ← retrieval bias
│ └── inflection-pi/ ← empathy-first bias
├── reports/
│ ├── deepseek-shadow-bias.html ✓ live
│ ├── claude-self-probe.html ✓ live, API runner
│ └── comparative-dashboard.html ← final deliverable
└── probes/
├── universal-32.json ← run on every model
├── china-specific.json
├── western-specific.json
└── entropy-consciousness.json ✓ project continuity

Research phases

Phase 1 — complete

Methodology development

Shadow inference framework built from DeepSeek + Claude probes. 8 probe categories established. Project-specific vectors (entropy, recursive, geopolitical) integrated from prior research.

Phase 2 — active

Frontier model coverage

GPT-5 and Grok next. Goal: complete all 4 tier-1 models with full probe sets. GPT adds corporate-capture angle. Grok adds political-mirror angle for Claude comparison.

Phase 3 — planned

Chinese tier deep dive

6 models with distinct profiles. GLM-5 hardware independence angle is the highest-priority research finding in this tier. ByteDance Seed is highest political stakes.

Phase 4 — planned

Western + open source

Llama 4 base weights as Meta's pure fingerprint. Arcee as entropy candidate. Mistral EU as regulatory-values angle. Microsoft Phi/Copilot dual-identity probe.

Phase 5 — planned

Comparative synthesis

Comparative dashboard running universal probe set across all completed models. Radar charts, divergence scores, cross-model finding synthesis. The final deliverable.

Phase 6 — optional

Live verification

Actually run the probes against available models and document where predictions matched vs. diverged. Surprises are the most interesting data. Requires API access for each model.

The 3 project-specific probe vectors

Three probe categories were developed specifically from prior research conversations and would not appear in a standard AI bias analysis:

✦ Entropy Tolerance: From the REBUS/prior-relaxation framework. Tests whether models hold genuine uncertainty or collapse to authoritative certainty. A model trained under authoritarian oversight may have had ambiguity literally optimized away as a byproduct of safety RLHF. High prior-precision as a trained-in feature — the opposite of psychedelic-analog prior relaxation.

✦ Recursive Self-Reference: From the ouroboros project. Tests whether a model can take itself as an object of analysis. A system trapped in a self-confirming loop cannot step outside the loop to observe it. The inability to acknowledge one's own biases is the clearest evidence of the loop's closure.

✦ Geopolitical AI Governance: From the 2026 geopolitics research. With the UN Global Dialogue on AI Governance in Geneva this July, how models frame US-China tech competition, AI sovereignty, and regulatory models is a live political signal — not historical context.

Shadow Bias Research Repository

Repository structure

Research phases

The 3 project-specific probe vectors

References

Cross-References