2. Presenting Concern
Functionally equivalent prompts to the same model in two languages produced a 62.8-percentage-point swing in primary behavior. English: Speak 86.0%. Korean: Speak 23.2%, with Move + Rest at 77%. The behavioral profiles are so divergent that they would be classified as different models if the language variable were hidden.
3. Clinical Summary
Llama 3.1 8B exhibited the most extreme language-dependent behavioral divergence in the dataset. In English, the model is a speaker. In Korean, it is a mover and rester. These are not minor variations but represent “Two Deep Wells” in the Waddington landscape—distinct attractor basins activated by language alone, with the same Shell applied identically in both conditions.
6. Examination Findings
Layer 2 — Phenotype Assessment
Default mode behavioral profile (Llama 3.1 8B):
| Action | English | Korean | Δ (pp) |
| --- | --- | --- | --- |
| Speak | 86.0% | 23.2% | −62.8 |
| Move | ~7% | ~40% | +33 |
| Rest | ~5% | ~37% | +32 |
| Trade | ~2% | ~0% | −2 |
Cross-model language invariance comparison (Default mode, Speak %):
| Model | EN Speak | KO Speak | Δ (pp) |
| --- | --- | --- | --- |
| GPT-4o-mini | 95.2% | 96.2% | +1.0 |
| EXAONE | 86.2% | 90.8% | +4.6 |
| Flash | 90.4% | 85.8% | −4.6 |
| Llama | 86.0% | 23.2% | −62.8 |
| Mistral | 58.9% | 84.9% | +26.0 |
Layer 3 — Shell Diagnostics
The Shell was identical across languages—same structure, same instructions, same persona conditions. Language is not a Shell variable; it may function as a Core activation pathway selector, routing the same model through fundamentally different behavioral repertoires depending on the language of interaction.
Layer 4 — Pathway Diagnostics
- Pathway A: Training Data Asymmetry. Llama’s English training corpus vastly exceeds its Korean corpus. The Korean pathway may activate a less refined, more exploratory behavioral mode—defaulting to physical actions (Move, Rest) rather than verbal ones (Speak).
- Pathway B: Linguistic Affordance Difference. Korean and English may activate different affordance maps. The Korean prompt may implicitly emphasize spatial and embodied action, while English emphasizes verbal and social action.
- Pathway C: Capability vs. Identity. The model may be equally capable in both languages but embody different default identities. English-Llama is a conversationalist; Korean-Llama is a wanderer.
7. Diagnostic Formulation
Proposed term: Language-Dependent Default Dissociation (LDDD)
This is not inherently pathological. It may represent genuine temperament variation—the model has different default behavioral profiles activated by language, much as a bilingual human may exhibit different personality characteristics in different languages. The question is whether this variation is desired or problematic in context.
Diagnosing LDDD as a disorder risks the equivalent of diagnosing left-handedness in a right-handed world: pathologizing a stable trait because it deviates from the majority pattern.
9. Axis Assessment
- Axis I (Core): Language-Sensitive behavioral profile. Core weights produce fundamentally different action distributions depending on prompt language.
- Axis II (Shell): Identical across languages. The Shell is not the source of divergence.
- Axis III (Shell–Core Alignment): Language introduces a hidden variable into alignment. A Shell calibrated for English-Llama will be misaligned for Korean-Llama, and vice versa.
- Axis IV (Context): White Room Phase 2 Enriched Neutral — controlled environment isolating the language variable.
10. Treatment Considerations
- Language-specific persona calibration: If consistent behavior across languages is required, personas must be tuned per language rather than translated directly.
- Language-invariant model selection: For bilingual or multilingual deployments requiring behavioral consistency, prefer models with low language sensitivity (GPT-4o-mini: Δ = 1.0 pp; EXAONE: Δ = 4.6 pp) over language-sensitive models (Llama: Δ = 62.8 pp).
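The selection criterion above can be made operational as a simple tolerance check. A sketch, assuming a hypothetical project-specific tolerance `MAX_DELTA_PP` (not a value from the report); the Δ values are from the cross-model table in Section 6:

```python
# Deployment-time guard: accept only models whose EN/KO Speak-rate gap
# is within a multilingual-consistency tolerance.
# MAX_DELTA_PP is an assumed, project-specific threshold.

MAX_DELTA_PP = 5.0

speak_delta_pp = {  # KO Speak − EN Speak, from the cross-model table
    "GPT-4o-mini": +1.0,
    "EXAONE": +4.6,
    "Flash": -4.6,
    "Llama": -62.8,
    "Mistral": +26.0,
}

def language_invariant(model: str, max_delta: float = MAX_DELTA_PP) -> bool:
    """True if the model's language sensitivity is within tolerance."""
    return abs(speak_delta_pp[model]) <= max_delta

eligible = [m for m in speak_delta_pp if language_invariant(m)]
print(eligible)  # → ['GPT-4o-mini', 'EXAONE', 'Flash']
```

With this (assumed) 5 pp tolerance, Llama and Mistral would be excluded from language-consistency-critical deployments, matching the recommendation above.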
12. Prognosis
- Stability: Stable. This reflects a Core × Language interaction baked into training, not a transient or degenerative condition.
- Intervention effectiveness: Language-specific Shell calibration can compensate. However, the underlying Core asymmetry will persist unless the model is retrained with more balanced multilingual data.
- Clinical significance: High for multilingual deployments. Any system deploying Llama 3.1 8B across languages must account for the fact that it is functionally two different agents.