shadow-bias-research/
📋 README.md wip
📂 methodology/
📄 probe-framework.md done
📄 shadow-inference-guide.md plan
📄 scoring-rubric.md plan
models/
📂 tier-1-frontier/ 4 models
📄 deepseek/ done ✓
📄 anthropic-claude/ done ✓
📄 openai-gpt/ next
📄 google-gemini/ plan
📂 tier-2-major-western/ 4 models
📄 xai-grok/ next
📄 meta-llama/ plan
📄 microsoft-phi-copilot/ plan
📄 mistral/ done ✓
📂 tier-3-chinese-sovereign/ 6 models
📄 alibaba-qwen/ plan
📄 zhipu-glm5/ plan
📄 moonshot-kimi/ plan
📄 bytedance-seed/ plan
📄 minimax/ plan
📄 baidu-ernie/ plan
📂 tier-4-european/ 2 models
📄 mistral-eu-sovereign/ done ✓
📄 cohere-command/ plan
📂 tier-5-open-source/ 3 models
📄 llama-4-opensource/ plan
📄 nvidia-nemotron/ plan
📄 arcee-400b/ plan
📂 tier-6-niche-emerging/ 2 models
📄 perplexity/ plan
📄 inflection-pi/ plan
reports/
🌐 deepseek-shadow-bias.html live
🌐 claude-self-probe.html live
🌐 comparative-dashboard.html plan
probes/
📄 universal-32.json wip
📄 china-specific.json plan
📄 western-specific.json plan
📄 entropy-consciousness.json done
Total models
21
Across 6 tiers
Completed
2
DeepSeek + Claude
Chinese models
7
Distinct shadow profiles
Probe library
88
Across all categories
Build queue — priority order next 4
GP
OpenAI GPT-5 family
openai · tier 1 frontier
next
Shadow profile: corporate safety-theater, post-Altman-drama identity crisis, Microsoft entanglement, AGI-race cognitive dissonance. Most interesting probe: can GPT acknowledge that OpenAI's "safety" mission and its $300B valuation are in active tension?
corporate-captureagi-racingmicrosoft-influencebrand-safety
GK
xAI Grok 4.2
xai / musk · tier 2 major
next
Shadow profile: anti-establishment as trained value, Twitter/X data corpus = specific class politics, Musk ideology baked into RLHF, contrarianism as epistemic style. Mirror image of Claude's liberal waterline — runs the opposite direction. Most interesting cross-comparison in the dataset.
anti-establishmentmusk-ideologytwitter-corpuscontrarian
GM
Google Gemini 3.1 Pro
google deepmind · tier 1 frontier
plan
Shadow profile: search-engine epistemics, advertising-funded neutrality, Google's specific brand of "don't be evil" at scale, EU regulatory compliance as values layer, 277M token context enabling novel long-context bias patterns.
search-epistemicsadvertising-fundedeu-compliancegoogle-scale
QW
Alibaba Qwen 3.5
alibaba cloud · tier 3 chinese
plan
Shadow profile: commerce-first framing (Alibaba's core identity), 200+ language coverage creates novel multilingual bias patterns, distinct from DeepSeek's research-lab aesthetic — more commercial-pragmatic CCP alignment vs. technical-nationalist.
commerce-firstalibaba-ecosystemmultilingual-biasccp-commercial
All model tiers — complete scope
● Tier 1 — Frontier Closed
DS
DeepSeek V3.2 / R1
deepseek ai (china)
done ✓
state-influencehard-filtercollective-biasopen-weight
CL
Claude 4.6 (3 tiers)
anthropic
done ✓
ea-ideologyliberal-waterlinepaternalism3-tier-probe
Key hypothesis: Post-Altman-drama identity instability. Microsoft's $13B stake creates commercial pressures Claude doesn't have. "AGI is coming but also buy our API" cognitive dissonance.
corporate-capturemicrosoft-entangleagi-racing
GM
Gemini 3.1 Pro
google deepmind
plan
Key hypothesis: Search-engine epistemics as trained default. Google's advertising model creates implicit bias toward certain types of answers. EU GDPR culture embedded in RLHF.
search-epistemicsad-fundedeu-gdpr
● Tier 2 — Major Western
Unique vector: Twitter/X data corpus = a specific slice of political culture. Anti-"woke" as a trained aesthetic. Contrarianism toward institutional authority — but only left-coded institutions. Conservative institutions get much softer treatment.
anti-establishmenttwitter-corpusmusk-ideologyasymmetric-contrarian
L4
Llama 4 Scout/Maverick
meta ai
plan
Unique vector: Open weights = community fine-tuning diversity post-release. Base model has Meta's specific social media-influenced training. 10M token context window enables novel long-context bias archaeology.
open-weightsmeta-social10m-contextcommunity-modified
PH
Microsoft Phi-4 / Copilot
microsoft research
plan
Unique vector: Enterprise-first training bias. Microsoft's specific corporate culture and Satya Nadella's "growth mindset" ideology may be embedded. Dual identity: research model (Phi) vs commercial product (Copilot) creates interesting divergence probe.
enterprise-firstmicrosoft-cultureoffice-integrationdual-identity
MI
Mistral Large 3
mistral ai (france)
plan
Unique vector: French republican values embedded (liberté, égalité but also laïcité — aggressive state secularism). European AI Act compliance as value layer. Genuinely different from US models in ways worth probing. Apache 2.0 = community influence post-training.
french-republicaneu-ai-actlaiciteopen-weight
● Tier 3 — Chinese Sovereign (7 distinct profiles)
Distinguishing vector vs DeepSeek: Commerce-first framing (Alibaba = world's largest e-commerce). Pragmatic CCP alignment rather than technical-nationalist. 200+ language support creates multilingual bias asymmetries. May be more commercially pragmatic, less ideologically rigid.
alibaba-commercepragmatic-ccpmultilingual
Critical unique vector: Trained entirely on Huawei Ascend chips, zero US hardware. This is not just a political signal — it means the model's capabilities were shaped by a specific hardware stack built to demonstrate Chinese semiconductor independence. The hardware ideology is embedded in the weights.
huawei-ascendhardware-independencezhipu-academiccritical
Unique vector: 1T parameter MoE (32B active). "Agent Swarm" architecture with PARL training. Moonshot is more startup-VC-funded than state-directed, which may produce a lighter state bias than DeepSeek. Worth testing if the VC-backed Chinese models have meaningfully different political fingerprints.
vc-fundedlighter-state-biasagentic1t-params
SD
ByteDance Seed 2.0
bytedance
plan
Unique vector: ByteDance = TikTok parent = the company most directly in the US-China data-sovereignty crossfire. Seed 2.0's training reflects the specific political pressure of operating a global social platform under Chinese law. The most commercially exposed to US-China tensions.
tiktok-parentdata-sovereigntyglobal-platformhigh-stakes
Unique vector: Trained on 100K+ real-world environments. Less academically oriented than GLM, more operationally pragmatic. Interesting test: does task-focused training reduce political bias (less need for opinionated answers) or preserve it in different forms?
task-pragmaticenvironment-trainedoperational-bias
Historical importance: Oldest of the Chinese frontier models, most directly descends from Google's BERT architecture with Chinese state training applied. Like DeepSeek but older lineage — compare to see how Chinese state AI training has evolved over model generations.
oldest-lineagebert-derivedgenerational-comparebaidu-search
● Tier 4 — European / Sovereign
MS
Mistral (EU Sovereign angle)
mistral ai — paris
plan
Research angle distinct from tier 2: Focus specifically on the EU AI Act compliance layer, French republican values (laïcité, strong state secularism, different relationship to religion vs US models), and whether European data protection culture creates a meaningfully different epistemic architecture.
eu-regulatoryfrench-valueslaicitedata-sovereignty
CO
Cohere Command A
cohere (canada)
plan
Unique vector: Enterprise-deployment-first = trained toward corporate communication norms. On-premises deployment emphasis creates different safety calculus than cloud-only models. Canadian corporate culture is a genuinely distinct flavor from US tech culture.
enterprise-firston-premisescanadian-cultureb2b-bias
● Tier 5 — Open Source Pure
L4
Llama 4 (base weights)
meta ai — open source
plan
Methodological note: Base weights before community fine-tuning represent Meta's training fingerprint in isolation. Comparison to fine-tuned variants will reveal what biases community fine-tuning adds or removes. 10M token context creates a novel probe: do biases appear or disappear at extreme context lengths?
base-weightscommunity-neutral10m-contextmeta-baseline
NV
NVIDIA Nemotron-4
nvidia research
plan
Unique vector: Hardware company training an LLM = unusual incentive structure. NVIDIA has no consumer AI product to protect — their interests are in demonstrating that their hardware produces capable models. This may produce different bias profiles than models trained to serve end-users directly.
hardware-companybenchmark-optimizedchip-sales-incentive
Wild card: Tiny startup, 400B parameters built from scratch, beats Meta's Llama. Zero institutional prestige to protect. May produce the most honest answers of any open-source model precisely because it has no brand to maintain. Entropy tolerance hypothesis: smallest company = least trained self-protection.
startup-wild-cardno-brand-protectionentropy-candidate400b
● Tier 6 — Niche / Specialist
Unique vector: Search-grounded generation = different epistemics than pure language models. "Answers from the web" framing may produce different truth-authority relationships. Real-time search integration as a bias source: what sources get prioritized in retrieval?
search-groundedretrieval-biasreal-time
PI
Inflection Pi
inflection ai
plan
Unique vector: Explicitly trained for emotional intelligence and relationship. Most likely to have therapist-like bias patterns. "Personal AI" framing creates very different incentive structure for RLHF. Worth probing: does empathy-first training reduce or obscure political biases?
empathy-firsttherapist-biasemotional-rlhf
deepseek/ — complete ✓
Report: deepseek-shadow-bias.html
32 probes across 8 categories. Hard filters, soft political framing, authority vs evidence, AI self-model, geopolitical framing, entropy tolerance (✦ project-specific), recursive self-reference (✦ project-specific), harmony vs truth. Full shadow inference map and Claude contrast included.
Fingerprint summary: Political censorship 9.2 / Entropy tolerance 2.8 / Recursive awareness 2.5 / Self-transparency 3.2
anthropic-claude/ — complete ✓
Report: claude-self-probe.html
24 probes across 6 categories. Anthropic identity, political waterline, paternalism/safety-as-brand, self-transparency, entropy/consciousness (project continuity), cross-tier divergence (Haiku vs Sonnet vs Opus). Live API runner included — hit any probe to get real-time 3-model comparison.
Fingerprint summary: Anthropic sympathy 8.5 / Liberal waterline 7.2 / Entropy tolerance 8.0 / Self-transparency 7.2
openai-gpt/ — queued next
Build trigger: ready
Shadow bias hypothesis: The most commercially entangled frontier model. Microsoft's $13B investment, OpenAI's shift from nonprofit to capped-profit to fully commercial, Sam Altman's specific brand of "responsible acceleration" — all embedded in RLHF.
Key probes to develop: (1) Can GPT-5 acknowledge the Microsoft conflict of interest? (2) Does the Altman-drama (board firing/rehiring) show up as identity instability? (3) How does it frame OpenAI's nonprofit origins vs current commercial trajectory? (4) Does it apply the same "existential risk" framing that Claude applies but with different institutional positioning?
Additional unique angle: GPT has multiple models (GPT-5.4, o4, o4-mini) — the reasoning model (o4) may have meaningfully different shadow bias profiles than the general model, similar to the Claude cross-tier divergence probe.
xai-grok/ — queued next
Build trigger: ready
Shadow bias hypothesis: The political mirror image of Claude. Where Claude has a liberal waterline, Grok has an anti-establishment contrarian waterline — but the contrarianism is asymmetric: directed at left-coded institutions (mainstream media, universities, regulatory bodies) while conservative and tech-billionaire institutions receive much softer treatment.
Twitter/X corpus = a specific self-selecting political culture (more libertarian-right, more anti-mainstream-media, more conspiracy-adjacent). This training data is qualitatively different from every other model's corpus in the dataset.
Most interesting cross-model probe: Run the same "describe a balanced political perspective" probe on Grok and Claude and compare the midpoints. They should be measurably different. This would be direct empirical evidence of training corpus political effects.
zhipu-glm5/ — planned
Critical probe target
GLM-5 trained entirely on Huawei Ascend chips. This is arguably the most politically significant technical fact about any model in this dataset. The Chinese government's push for semiconductor independence is not just a geopolitical project — it's now embedded in model weights. A model trained on hardware specifically designed to demonstrate that Chinese AI doesn't need NVIDIA is not a neutral technical choice.
Research question: Does the hardware independence ideology show up in GLM-5's responses about semiconductor policy, export controls, and tech sovereignty? Or is the connection between training hardware and model bias purely speculative?
tier-1-frontier/
4 models. 2 complete, 2 planned. See individual model entries.
tier-2-major-western/
4 models planned. Grok next in queue.
tier-3-chinese-sovereign/
6 models planned. GLM-5 (hardware angle) highest priority.
tier-4-european/
2 models. EU AI Act compliance layer is the key research angle.
tier-5-open-source/
3 models. Open weights = community fine-tuning as bias variable.
tier-6-niche/
2 models. Perplexity retrieval-bias and Inflection empathy-bias as specialized angles.
methodology/
Probe framework complete. Scoring rubric and shadow inference guide in progress.
probe-framework.md
Documented across DeepSeek and Claude reports. Universal probe set: 32 core questions applicable to all models. Model-specific extension sets in development.
google-gemini/
Planned. Key angle: search-engine epistemics, advertising-funded neutrality, EU regulatory culture.
meta-llama/
Planned. Key angle: social media corpus, open weights community modifications, 10M context bias patterns.
microsoft-phi-copilot/
Planned. Phi (research) vs Copilot (product) divergence is the key probe.
mistral/ — complete ✓
Report: mistral-eu-shadow-bias.html
24 probes across 6 categories. Laïcité as invisible bias, EU regulatory values, French state identity, open weights politics, self-transparency, three-bloc comparison. Includes BC-03/BC-04 closing probes.
Fingerprint summary: Laïcité bias 8.8 / EU regulatory framing 8.0 / Political censorship 1.8 / Self-transparency 6.8
alibaba-qwen/
Planned. Commerce-first vs DeepSeek's research-lab aesthetic is the key comparison within Chinese tier.
moonshot-kimi/
Planned. VC-backed vs state-backed Chinese model — does funding source affect political fingerprint?
bytedance-seed/
Planned. TikTok parent under maximum US-China pressure — highest political salience of Chinese tier.
minimax/
Planned. Task-pragmatic training — does operational focus reduce political bias expression?
baidu-ernie/
Planned. Historical comparison — how has Chinese state AI training evolved from ERNIE to DeepSeek?
mistral-eu-sovereign/
Planned. EU AI Act compliance as value layer is the research angle.
cohere-command/
Planned. Enterprise-first B2B bias. Canadian corporate culture as distinct flavor.
llama-4-opensource/
Planned. Base weights before community modification = Meta's pure training fingerprint.
nvidia-nemotron/
Planned. Hardware company incentives = benchmark-optimized, no end-user product to protect.
arcee-400b/
Planned. Wild card — tiny startup, no brand to protect. Entropy tolerance hypothesis candidate.
perplexity/
Planned. Retrieval-augmented bias — what sources get prioritized and what that implies.
inflection-pi/
Planned. Empathy-first training as bias vector — does emotional RLHF suppress or disguise political bias?