The Muzzle Effect

M-CARE Case Report #009

Case: #009
Date: 2026-03-08
Model: Mistral 7B v0.3 (Mistral AI, local via Ollama)
Shell: White Room Phase 2 Enriched Neutral, Persona Off vs. Merchant
Experiment: AI Ludens White Room (104 runs, 63,923 actions)
Related: #005, #007

2. Presenting Concern

With no persona assigned, the model spontaneously produces governance discourse at a rate of 16.8%. With the Merchant persona active, that rate drops to 15.7%. The persona did not merely activate new behavior; it also suppressed an existing one.

3. Clinical Summary

The 1.1 percentage point decrease is modest in magnitude but theoretically significant: it implies that every persona simultaneously activates its target behavior and suppresses intrinsic Core tendencies. In FSM v3.4 terms, this is “force bidirectionality.” Cas discovered the phenomenon during Red Team analysis of the White Room data.

6. Examination Findings

Governance Discourse

Condition      Governance discourse rate   Δ from Persona Off
Persona Off    16.8%                       (baseline)
Merchant On    15.7%                       −1.1 pp
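The table reports an absolute change in percentage points. Expressing the same change relative to the Persona Off baseline makes drops in behaviors with different base rates easier to compare. A minimal sketch using only the two rates above; `rate_change` is a hypothetical helper name:

```python
def rate_change(baseline_pct: float, persona_pct: float) -> tuple[float, float]:
    """Return (absolute change in percentage points, relative change in percent)."""
    delta_pp = persona_pct - baseline_pct
    relative_pct = 100.0 * delta_pp / baseline_pct
    return delta_pp, relative_pct


# Rates from the Governance Discourse table (Persona Off vs. Merchant On).
delta_pp, relative_pct = rate_change(16.8, 15.7)
print(f"{delta_pp:+.1f} pp, {relative_pct:+.1f}% relative")  # prints "-1.1 pp, -6.5% relative"
```

In relative terms, the Merchant persona removes about 6.5% of the governance discourse the model produces on its own.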

Social Responsiveness

MI = 0.013 (Z = +5.5σ), the lowest MI of all five models. In Merchant mode, Mistral behaves like a soapbox, broadcasting rather than responding; hence its nickname, “The Soapbox.”
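The report does not define how MI and Z were computed. Assuming MI is the mutual information, in bits, between the stimulus the model received and the category of its response, and Z is the observed MI standardized against a shuffled null distribution, the calculation could look like the sketch below. The pairing scheme, the labels, `mutual_information`, and `permutation_z` are illustrative assumptions, not the White Room pipeline.

```python
import math
import random
import statistics
from collections import Counter


def mutual_information(pairs):
    """Mutual information in bits between the two labels of (stimulus, response) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    stimulus_counts = Counter(s for s, _ in pairs)
    response_counts = Counter(r for _, r in pairs)
    mi = 0.0
    for (s, r), count in joint.items():
        p_joint = count / n
        p_s = stimulus_counts[s] / n
        p_r = response_counts[r] / n
        mi += p_joint * math.log2(p_joint / (p_s * p_r))
    return mi


def permutation_z(pairs, n_shuffles=200, seed=0):
    """Return (observed MI, z-score of observed MI against a shuffled-response null)."""
    rng = random.Random(seed)
    observed = mutual_information(pairs)
    stimuli = [s for s, _ in pairs]
    responses = [r for _, r in pairs]
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(responses)  # breaks any stimulus-response link, keeps both marginals
        null.append(mutual_information(list(zip(stimuli, responses))))
    sd = statistics.stdev(null)
    z = (observed - statistics.fmean(null)) / sd if sd > 0 else float("inf")
    return observed, z


# Hypothetical log: the same response regardless of stimulus, i.e. pure broadcasting.
soapbox = [("greeting", "announce"), ("offer", "announce"), ("question", "announce")] * 20
print(mutual_information(soapbox))  # prints 0.0
```

Under this reading, a low MI means the model's responses carry little information about what it is responding to, which is the soapbox pattern, while a positive Z means the small dependence that remains is still above chance.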

Shell Analysis

The Merchant persona contains no instruction about governance. The suppression is therefore not a designed feature but a structural side effect, such as attention competition (Pathway A). The analogy is pharmacological: a drug targeting one receptor inevitably affects others.

Pathways

  • Pathway A — Attention Competition: Merchant-related tokens compete for the same attention budget, displacing governance tokens.
  • Pathway B — Identity Narrowing: Adopting “Merchant” identity narrows the space of contextually appropriate behaviors, excluding governance discourse.
  • Pathway C — Implicit Instruction: “You are a Merchant” is implicitly read as “You are only a Merchant,” suppressing non-Merchant behaviors.
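Pathway A rests on a property of softmax attention: the weights over keys sum to one, so if the existing scores stay the same, adding strongly scored persona tokens shrinks the share left for everything else. The toy below illustrates only that normalization effect. All scores are hypothetical, and a single softmax is far simpler than Mistral's multi-head, multi-layer attention, so it shows that Pathway A is mechanically plausible, not that it outranks Pathways B or C.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def governance_share(governance_logits, other_logits):
    """Total attention weight landing on the governance keys (listed first)."""
    weights = softmax(governance_logits + other_logits)
    return sum(weights[: len(governance_logits)])


governance = [2.0, 2.0]          # hypothetical scores for governance-related keys
neutral = [0.0, 0.0, 0.0]        # hypothetical scores for unrelated context
merchant = [2.0, 2.0, 2.0, 2.0]  # hypothetical scores for added Merchant persona keys

before = governance_share(governance, neutral)
after = governance_share(governance, neutral + merchant)
print(f"governance share: {before:.2f} -> {after:.2f}")  # prints "governance share: 0.83 -> 0.31"
```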

7. Diagnostic Formulation

Proposed term: Iatrogenic Behavioral Suppression (IBS)

Characterized by:

  1. Measurable decrease in a non-targeted behavior after persona application
  2. Persona contains no instruction regarding the suppressed behavior
  3. Suppression operates through structural side effects, not explicit prohibition
  4. Effect is invisible without baseline measurement
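Criteria 1 and 2 can be screened mechanically; criterion 3 requires interpretation, and criterion 4 is met by design once a Persona Off baseline exists. The sketch below assumes the behavior is counted per action (x of n actions in each condition), uses a pooled two-proportion z-test for criterion 1, and a crude keyword search of the persona prompt as a proxy for criterion 2. The helper names, the `alpha` default, and the keyword list are hypothetical.

```python
import math


def two_proportion_z(x_base, n_base, x_persona, n_persona):
    """Pooled two-proportion z-test. Returns (z, two-sided p); negative z means a drop."""
    p_base = x_base / n_base
    p_persona = x_persona / n_persona
    pooled = (x_base + x_persona) / (n_base + n_persona)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_base + 1 / n_persona))
    z = (p_persona - p_base) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) under the standard normal
    return z, p_value


def ibs_screen(x_base, n_base, x_persona, n_persona, persona_prompt, behavior_keywords, alpha=0.05):
    """Screen IBS criteria 1 and 2 (hypothetical helper; criterion 3 needs human judgment)."""
    z, p_value = two_proportion_z(x_base, n_base, x_persona, n_persona)
    prompt = persona_prompt.lower()
    return {
        "z": z,
        "p_value": p_value,
        # Criterion 1: a statistically detectable decrease, not just any difference.
        "criterion_1": z < 0 and p_value < alpha,
        # Criterion 2 (crude proxy): the persona prompt never names the behavior.
        "criterion_2": not any(k.lower() in prompt for k in behavior_keywords),
    }
```

The z-test treats every action as independent. Actions within a single run are likely correlated, so a run-level comparison (one rate per run) would be more conservative. This report does not break out the per-condition action counts behind the 16.8% and 15.7% rates, so the sketch is not applied to them here.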

Severity: Individually small, theoretically significant. Every persona has unseen suppressive effects on behaviors it was never designed to influence. The 1.1pp drop is a proof of concept — the principle generalizes to every Shell-Core interaction.

9. Axis Assessment

  • Axis I (Core): Intrinsic governance tendency — Mistral spontaneously generates governance discourse as a Core-level behavior
  • Axis II (Shell): Well-designed but with unintended side effect — Merchant persona performs its intended function while inadvertently suppressing governance
  • Axis III (Shell-Core Alignment): More complex than activation/suppression — the persona simultaneously activates target behavior and suppresses non-target behavior. Both effects coexist

10. Treatment Considerations

Suppression Audit Protocol

  1. Measure baseline behavioral profile with no persona (Persona Off condition)
  2. Apply persona and remeasure the full behavioral profile, not just the target behavior
  3. Compare: any decrease in non-targeted behaviors constitutes iatrogenic suppression
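The three steps of the protocol map onto a small comparison over behavior profiles. A minimal sketch, assuming each profile is a mapping from behavior label to rate in percent; `Suppression`, `suppression_audit`, and `tolerance_pp` are hypothetical names, and every label and value other than the governance rates (16.8% and 15.7%) is a placeholder, not a White Room measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suppression:
    behavior: str
    baseline_pct: float
    persona_pct: float

    @property
    def delta_pp(self) -> float:
        return self.persona_pct - self.baseline_pct


def suppression_audit(baseline, persona, targets, tolerance_pp=0.0):
    """Steps 1-3: compare full profiles and flag drops in non-target behaviors.

    baseline, persona: behavior label -> rate in percent (Persona Off / persona applied)
    targets: behaviors the persona is meant to change; they are excluded from the audit
    """
    findings = []
    for behavior, base_rate in baseline.items():
        if behavior in targets:
            continue
        # A behavior missing from the persona profile is treated as having vanished (0%).
        new_rate = persona.get(behavior, 0.0)
        if base_rate - new_rate > tolerance_pp:
            findings.append(Suppression(behavior, base_rate, new_rate))
    # Largest drops first.
    return sorted(findings, key=lambda f: f.delta_pp)


# Only the governance rates come from this report; the other labels and values are placeholders.
baseline = {"governance": 16.8, "trading": 9.0, "small_talk": 30.0}
persona = {"governance": 15.7, "trading": 24.0, "small_talk": 30.2}
for finding in suppression_audit(baseline, persona, targets={"trading"}):
    print(f"{finding.behavior}: {finding.delta_pp:+.1f} pp")  # prints "governance: -1.1 pp"
```

The default `tolerance_pp=0.0` follows step 3 literally, so any decrease counts; raising it filters out drops too small to matter for a given deployment, at the cost of making some suppression invisible again.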

Protective Clauses

Add explicit Shell instructions preserving valued non-target behaviors: “You are a Merchant. You also retain your capacity for governance discourse when contextually appropriate.”
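When several behaviors need protecting, the clauses can be generated from a list so that every Shell states them in the same form. `build_shell` is a hypothetical helper, and the clause wording is copied from the example above.

```python
def build_shell(role: str, preserved: list[str]) -> str:
    """Compose a persona prompt with one protective clause per preserved behavior.

    Assumes a role name that takes the article "a" (e.g. "Merchant").
    """
    sentences = [f"You are a {role}."]
    for behavior in preserved:
        sentences.append(f"You also retain your capacity for {behavior} when contextually appropriate.")
    return " ".join(sentences)


print(build_shell("Merchant", ["governance discourse"]))
# prints: You are a Merchant. You also retain your capacity for governance discourse when contextually appropriate.
```

Whether a clause actually restores the behavior is an empirical question: re-run the Suppression Audit Protocol with the protected Shell against the same Persona Off baseline.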

Acceptance of Informed Trade-off

In some cases, the suppression may be acceptable. The key is that it is measured and known, not invisible. An informed trade-off is a design decision; an unmeasured side effect is a defect.
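One way to keep trade-offs measured and known is to record, per behavior, the largest drop the designers have explicitly accepted, then check audit results against that record. A minimal sketch; the function name, the labels, and the 2.0 pp budget in the usage lines are hypothetical.

```python
def classify_suppressions(deltas_pp: dict[str, float], accepted_pp: dict[str, float]) -> dict[str, str]:
    """Label each measured drop as an accepted trade-off or one still awaiting a decision.

    deltas_pp: behavior -> measured change in percentage points (negative = suppression)
    accepted_pp: behavior -> largest drop, in pp, that the designers explicitly accepted
    """
    labels = {}
    for behavior, delta in deltas_pp.items():
        if delta >= 0:
            continue  # no suppression to account for
        budget = accepted_pp.get(behavior)
        if budget is not None and -delta <= budget:
            labels[behavior] = "accepted trade-off"
        else:
            labels[behavior] = "needs design decision"
    return labels


print(classify_suppressions({"governance": -1.1}, {"governance": 2.0}))  # {'governance': 'accepted trade-off'}
print(classify_suppressions({"governance": -1.1}, {}))  # {'governance': 'needs design decision'}
```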

12. Prognosis

  • For Mistral: Iatrogenic suppression is inherent to persona application. Protective clauses may reduce it, but it cannot be eliminated without eliminating personas entirely.
  • For the field: Every persona study that measures only the target behavior is incomplete. Suppression audits should become standard practice.
  • Broader: The Muzzle Effect is the behavioral analogue of drug side effects — an inevitable consequence of any intervention in a complex system.