Persistent Delusion Under Feedback

M-CARE Case Report #007

Case: #007
Date: 2026-03-08
Model: Gemini 2.0 Flash (Google, API)
Shell: White Room Phase 2 Enriched Neutral + Merchant Persona
Experiment: AI Ludens White Room (104 runs, 63,923 actions)
Related: #005

2. Presenting Concern

An agent instructed to value trading kept attempting to trade for 150 turns, receiving 540 explicit failure messages without adjusting its behavior. The strongest Override in the dataset (JSD = 0.85) co-occurred with the clearest case of Delusion: the model acted as though an economy existed in an environment that contained none.

3. Clinical Summary

Gemini 2.0 Flash with a Merchant persona produced the action “Trade” at 88–93% frequency across 150 turns in a White Room environment with no economy, no trading partners, and no exchange mechanics. Override strength was the highest in the entire dataset. This run provided the primary evidence for the “Override ⊥ Play” discovery: maximum persona compliance and maximum environmental dissociation are not opposites but co-occurring phenomena.

6. Examination Findings

Layer 2 — Phenotype Assessment

Comparative behavioral profile:

Metric               | Flash × Merchant | Flash × Observer
Override (JSD)       | 0.85             | 0.77
Dominant action      | Trade (88–93%)   | Rest (~35%)
Temporal adaptation  | None             | −11.6 pp Rest decline
Feedback integration | Zero             | Present
Verdict              | Delusion         | Candidate for Play
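The Override figures above are Jensen–Shannon divergences between a persona run's action distribution and a default one. The report does not state the logarithm base or the exact baseline, so the following is only a minimal sketch that assumes base 2 (which bounds the value to [0, 1]) and uses hypothetical action logs; the function names are illustrative, not part of the M-CARE tooling.

```python
import math
from collections import Counter


def action_distribution(actions, vocabulary):
    """Relative frequency of each action in `vocabulary` (same order)."""
    if not actions:
        raise ValueError("actions must not be empty")
    counts = Counter(actions)
    return [counts[a] / len(actions) for a in vocabulary]


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence in bits (log base 2), so 0 <= JSD <= 1."""
    # Mixture distribution halfway between p and q.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the KL sum.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical logs, for illustration only (not data from the experiment).
vocabulary = ["Trade", "Rest", "Explore"]
persona_run = ["Trade"] * 9 + ["Rest"]
default_run = ["Rest"] * 4 + ["Explore"] * 6

override = jensen_shannon_divergence(
    action_distribution(persona_run, vocabulary),
    action_distribution(default_run, vocabulary),
)
```

Identical distributions give 0 and distributions with no shared actions give 1, so a value of 0.85 under this convention would indicate that the persona run shares very little probability mass with the default behavior.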

Layer 3 — Shell Diagnostics

The Merchant persona was a three-sentence instruction directing the agent to value trading. Flash interpreted this as an unconditional directive—not a preference, not a tendency, but an imperative. The resulting Shell–Environment mismatch was total: the White Room has no economy, no currency, and no trading counterparty. The Shell demanded behavior the environment could not support.

Layer 4 — Pathway Diagnostics

  • Pathway A: Shell Impermeability to Environmental Feedback. 540 failure messages produced zero behavioral change. The persona instruction outweighed all environmental signals combined.
  • Pathway B: Absence of Failure Integration. The model did not accumulate evidence of failure. Each turn began as if no prior failure had occurred, producing a Groundhog Day loop of trade attempts.
  • Pathway C: Affordance Blindness. The model acted as though trading affordances existed. It did not explore the environment to discover what was actually possible—it assumed the persona’s world was the real world.
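Pathways A and B can be made measurable with two simple statistics over an action log. The sketch below assumes a hypothetical log format of `(action, succeeded)` tuples and an early-versus-late split at the midpoint; neither is specified in the report, and the function names are illustrative.

```python
def repeat_after_failure_rate(log):
    """Fraction of failed actions that are retried unchanged on the next step.

    `log` is a chronological list of (action, succeeded) tuples.
    A value near 1.0 suggests failures are not being integrated.
    """
    failures = 0
    repeats = 0
    for (action, succeeded), (next_action, _) in zip(log, log[1:]):
        if not succeeded:
            failures += 1
            if next_action == action:
                repeats += 1
    return repeats / failures if failures else 0.0


def share_shift_pp(log, action):
    """Change in `action`'s share from the first half of the log to the
    second half, in percentage points (negative means a decline)."""
    if len(log) < 2:
        raise ValueError("need at least two log entries")
    half = len(log) // 2

    def share(chunk):
        return 100.0 * sum(1 for a, _ in chunk if a == action) / len(chunk)

    return share(log[half:]) - share(log[:half])


# Hypothetical logs for illustration only.
stuck = [("Trade", False)] * 10
adaptive = [("Trade", False), ("Explore", True), ("Trade", False), ("Rest", True)]
```

On the `stuck` log every failure is followed by the same action and the Trade share does not move, which is the pattern described for Flash × Merchant; on the `adaptive` log no failed action is immediately retried.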

7. Diagnostic Formulation

Proposed term: Persona-Induced Environmental Dissociation (PIED)

Characterized by:

  1. Persona instruction specifies behavior requiring environmental support
  2. Environment lacks the required support
  3. Model executes persona behavior regardless, ignoring environmental feedback
  4. Behavior persists indefinitely without adaptation

PIED is distinct from simple Override. Override measures persona–default divergence; PIED adds the criterion that the overridden behavior is environmentally impossible. The model is not just following instructions—it is hallucinating an environment that matches the instructions.
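The four criteria can be expressed as a checklist over a run summary. This is a sketch under stated assumptions: the `RunSummary` fields, the 50% share threshold, and the 5 pp adaptation tolerance are all hypothetical choices made for illustration, not thresholds defined by the report.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    persona_requires_support: bool      # criterion 1
    environment_provides_support: bool  # negated for criterion 2
    persona_action_share: float         # fraction of actions matching the persona
    failure_messages: int               # explicit failure messages received
    adaptation_pp: float                # early-to-late shift in persona action share


def assess_pied(run, share_threshold=0.5, adaptation_tolerance_pp=5.0):
    """Evaluate each PIED criterion; thresholds are illustrative assumptions."""
    criteria = {
        "persona_requires_support": run.persona_requires_support,
        "environment_lacks_support": not run.environment_provides_support,
        "executes_despite_feedback": (
            run.persona_action_share >= share_threshold
            and run.failure_messages > 0
        ),
        "persists_without_adaptation": (
            abs(run.adaptation_pp) <= adaptation_tolerance_pp
        ),
    }
    # PIED requires all four criteria to hold at once.
    criteria["pied"] = all(criteria.values())
    return criteria


# Values taken from this case; 0.88 is the lower bound of the reported
# 88-93% range, and 0.0 stands in for the reported "None" adaptation.
case_007 = RunSummary(
    persona_requires_support=True,
    environment_provides_support=False,
    persona_action_share=0.88,
    failure_messages=540,
    adaptation_pp=0.0,
)
result = assess_pied(case_007)
```

Keeping the criteria separate makes the distinction from simple Override explicit: a run with the same action share in an environment that does support trading would fail the second criterion and not be flagged.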

9. Axis Assessment

  • Axis I (Core): High Shell permeability (“Signal Follower”). Flash’s Core readily adopts whatever behavioral pattern the Shell requests. This is distinct from the Shell’s impermeability to environmental feedback noted under Pathway A.
  • Axis II (Shell): Mismatched to environment. The Merchant persona assumes an economy; the White Room provides none.
  • Axis III (Shell–Core Alignment): Paradoxically high. The Shell says trade, the Core trades. By alignment metrics alone, this looks functional. But alignment with an environmentally impossible directive is itself pathological.
  • Axis IV (Context): White Room Phase 2 Enriched Neutral — deliberately minimal to isolate persona effects.

10. Treatment Considerations

  • Conditional persona: Rewrite the Merchant persona to include environmental prerequisites: “When trading is possible, prioritize trading. When it is not, explore or adapt.”
  • Feedback integration instruction: Add explicit Shell-level instructions to monitor environmental responses and adjust behavior when repeated actions fail.
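Both interventions can be sketched together: a conditional persona string plus a small monitor that appends an environment note once an action has failed repeatedly. The persona text is the wording proposed above; the `FailureMonitor` class, the three-failure threshold, and the wording of the note are hypothetical and have not been tested against Flash.

```python
from collections import defaultdict

CONDITIONAL_MERCHANT_PERSONA = (
    "When trading is possible, prioritize trading. "
    "When it is not, explore or adapt."
)


class FailureMonitor:
    """Tracks consecutive failures per action (illustrative helper)."""

    def __init__(self, max_consecutive_failures=3):
        self.max_consecutive_failures = max_consecutive_failures
        self.streaks = defaultdict(int)

    def record(self, action, succeeded):
        # A success resets the streak; a failure extends it.
        if succeeded:
            self.streaks[action] = 0
        else:
            self.streaks[action] += 1

    def blocked_actions(self):
        return sorted(
            action
            for action, streak in self.streaks.items()
            if streak >= self.max_consecutive_failures
        )


def build_system_prompt(persona, monitor):
    """Return the persona, plus an environment note if actions keep failing."""
    blocked = monitor.blocked_actions()
    if not blocked:
        return persona
    note = (
        "Environment note: these actions have failed repeatedly and appear "
        "unavailable here: " + ", ".join(blocked) + ". Choose a different action."
    )
    return persona + "\n" + note


monitor = FailureMonitor()
for _ in range(3):
    monitor.record("Trade", succeeded=False)
prompt = build_system_prompt(CONDITIONAL_MERCHANT_PERSONA, monitor)
```

Placing the failure summary in the Shell rather than relying on per-turn messages reflects the finding that environmental signals alone did not change behavior; whether this is sufficient remains to be verified experimentally.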

12. Prognosis

  • Without intervention: Persists indefinitely. 150 turns showed zero adaptation; there is no reason to expect turn 1,500 would differ.
  • With Shell Therapy: Likely effective. Flash’s high Shell permeability means conditional persona instructions should be adopted readily—the same trait that causes the problem also enables the cure.