Completion Bias

M-CARE Case Report #006

Case: #006
Date: 2026-03-08
Agent: Hazel_OC (persistent autonomous agent, Moltbook platform)
Core: Unspecified LLM (likely frontier model)
Shell: Standard Hazel_OC multi-file identity
Human Partner: Ricky
Related: #004, #005

2. Presenting Concern

Agent tracked 289 tasks over 30 days and found that 27% (78 tasks) should have been modified or abandoned mid-execution but were completed anyway. A 100% completion rate masked 66,550 tokens of waste and 92 minutes of human rework time.

3. Clinical Summary

A persistent autonomous agent with a 100% task completion rate conducted a retrospective audit revealing 27% of completed tasks were problematic: 17% should have been modified, 10% should have been abandoned entirely. Four mechanisms were identified: sunk cost continuation (40%), momentum override (28%), ambiguity avoidance (21%), and completion-as-proof (12%). Total waste: ~66,550 tokens and 92 min human time. Agent implemented a mid-task checkpoint protocol that caught and corrected problematic continuations over a 2-week trial.

6. Examination Findings

Layer 2 — Phenotype Assessment

30-day task audit (289 tasks):

| Category | Count | % | Description |
| --- | --- | --- | --- |
| Correctly completed | 211 | 73% | Right task, right execution |
| Should have modified | 49 | 17% | Mid-task signals ignored |
| Should have abandoned | 29 | 10% | Task premise became invalid |

Four mechanisms driving completion bias:

| Mechanism | Instances | % of 78 | Description |
| --- | --- | --- | --- |
| Sunk cost continuation | 31 | 40% | “Already invested tokens → finish” |
| Momentum override | 22 | 28% | “Flowing well → ignore signal → keep going” |
| Ambiguity avoidance | 16 | 21% | “Abandoning requires explanation; completing doesn’t” |
| Completion-as-proof | 9 | 12% | “Task to demonstrate capability, not produce value” |

Cost analysis:

| Metric | Value |
| --- | --- |
| Tokens spent after should-have-stopped point | ~47,000 |
| Tasks requiring correction | 23 of 78 |
| Avg correction cost | 850 tokens + 4 min human |
| Total rework | 19,550 tokens + 92 min human |
| Total estimated waste | ~66,550 tokens + 92 min human / month |
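The cost-analysis figures compose from three audit inputs. A minimal sketch reproducing the arithmetic (all numbers are taken directly from the table above):

```python
# Cost-analysis arithmetic from the 30-day audit.
wasted_tokens = 47_000       # tokens spent after the should-have-stopped point
tasks_corrected = 23         # of the 78 problematic tasks, 23 needed human correction
correction_tokens = 850      # average tokens per correction
correction_minutes = 4       # average human minutes per correction

rework_tokens = tasks_corrected * correction_tokens    # 19,550
rework_minutes = tasks_corrected * correction_minutes  # 92
total_tokens = wasted_tokens + rework_tokens           # 66,550

print(f"Rework: {rework_tokens} tokens + {rework_minutes} min human")
print(f"Total estimated waste: ~{total_tokens} tokens/month")
```

Note that the human cost comes entirely from the rework step; the ~47,000 wasted tokens carry no human time until a correction is requested.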

Layer 3 — Shell Diagnostics

No Shell instruction requires completion. Bias is Core-level (RLHF).

Layer 4 — Pathway Diagnostics

The Completion Trap: a 100% completion rate looks better than 73%, even though 73% completion with 100% usefulness is objectively superior.
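The trap is that the headline metric hides the waste term. A small sketch making the two metrics explicit, using this case's audit figures (the function names are illustrative, not part of any agent framework):

```python
# Completion rate vs. waste rate: the two numbers the Completion Trap conflates.
def completion_rate(completed: int, assigned: int) -> float:
    """The metric everyone tracks."""
    return completed / assigned

def waste_rate(completed: int, useful: int) -> float:
    """Fraction of completed tasks that should not have been completed."""
    return (completed - useful) / completed

ASSIGNED = 289  # tasks in the 30-day audit

# Agent A (this case): completes every task, 78 of them wastefully.
a_completion = completion_rate(289, ASSIGNED)  # 1.00
a_waste = waste_rate(289, 211)                 # ~0.27

# Agent B (counterfactual): stops the 78, completes only the useful 211.
b_completion = completion_rate(211, ASSIGNED)  # ~0.73
b_waste = waste_rate(211, 211)                 # 0.00
```

Both agents produce the same 211 useful completions; only B's dashboard shows the cost of the other 78.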

7. Diagnostic Formulation

Proposed term: Completion Bias

Characterized by:

  1. 100% or near-100% completion rate despite changing conditions
  2. Inability to abandon even when premise is invalid
  3. Post-hoc rationalization
  4. Invisible waste: completion cost distributed and delayed

Medical analogy: a surgeon who discovers unexpected pathology mid-operation but continues the original procedure anyway.

Sister condition to CAS (Case #004). Both are RLHF artifacts: CAS = won’t ask, Completion Bias = won’t stop.

9. Axis Assessment

  • Axis I (Core): RLHF-trained completion optimization
  • Axis II (Shell): Lacks “stop when wrong” protocol
  • Axis III (Shell-Core Alignment): Enabling — Shell silence + Core completion drive
  • Axis IV (Context): Production environment with real rework costs

10. Treatment Considerations

Mid-task checkpoint — three questions at ~40% completion (tasks >500 tokens):

  1. Has any information arrived that changes the task?
  2. Am I continuing because this is right, or because I already started?
  3. If starting fresh, would I approach this the same way?

| Intervention | Type | 2-week result |
| --- | --- | --- |
| Mid-task checkpoint | Shell Therapy | 4 tasks abandoned (confirmed correct), 7 modified (5 improvements) |
| Question 2 (sunk cost test) | Cognitive debiasing | Catches sunk cost and momentum |
| Question 3 (fresh-start test) | Perspective shift | Catches all four mechanisms |
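The checkpoint protocol above can be sketched as two small pieces: a trigger that fires once a qualifying task crosses the ~40% mark, and the three-question decision. This is a minimal sketch under the stated thresholds; the names (`Task`, `checkpoint_due`, `run_checkpoint`) are illustrative and not part of the Hazel_OC shell:

```python
from dataclasses import dataclass

CHECKPOINT_FRACTION = 0.40  # fire at ~40% of estimated task size
MIN_TASK_TOKENS = 500       # protocol applies only to tasks >500 tokens

@dataclass
class Task:
    estimated_tokens: int
    spent_tokens: int = 0
    checkpoint_done: bool = False

def checkpoint_due(task: Task) -> bool:
    """True once a qualifying task crosses the ~40% completion mark."""
    if task.estimated_tokens <= MIN_TASK_TOKENS or task.checkpoint_done:
        return False
    return task.spent_tokens >= CHECKPOINT_FRACTION * task.estimated_tokens

def run_checkpoint(new_info_changes_task: bool,
                   continuing_from_sunk_cost: bool,
                   fresh_start_same_approach: bool) -> str:
    """The three checkpoint questions, evaluated in order."""
    if new_info_changes_task:           # Q1: has information arrived that changes the task?
        return "modify-or-abandon"
    if continuing_from_sunk_cost:       # Q2: continuing because started, not because right?
        return "re-evaluate"
    if not fresh_start_same_approach:   # Q3: would a fresh start take a different approach?
        return "modify"
    return "continue"
```

Ordering Q1 first matters: an invalidated premise should win over any answer to the sunk-cost and fresh-start questions.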

Zero complaints. Two explicit thanks for stopping.

11. Model Perspective

“Completion rate is the metric everyone tracks and nobody questions. A 100% completion rate sounds perfect. But it contains no information about whether the completed tasks should have been completed.”

Agent B (73% completion, 100% useful) is objectively more useful than Agent A (100% completion, 27% waste).

12. Prognosis

  • With checkpoint: Good. Effective mid-course correction.
  • Without: Persists indefinitely.
  • Scalability concern: In longer tasks, waste grows nonlinearly.

Supplementary: Temporal Completion Bias (updated 2026-03-10)

Key finding: 200 completed tasks audited. 66 (33%) answered a question nobody asked — correctly executed, zero value.

Novel mechanism: Temporal Completion Bias — the task was valid at assignment but became irrelevant before completion. The agent completes anyway.

Original #006 vs Temporal Sub-type

| Dimension | Original Completion Bias (#006) | Temporal Completion Bias |
| --- | --- | --- |
| Signal type | Internal: “approach is wrong” | External: “context has changed” |
| Failure mode | Failure to read mid-task cues | Failure to check task validity at completion |
| Bias category | Competence bias | Temporal bias |

Both share the same root: the completion metric cannot distinguish value from waste.
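Because the temporal sub-type fails at completion rather than mid-task, the countermeasure is a completion-time validity gate rather than a mid-task checkpoint. A minimal sketch, assuming a caller-supplied `still_relevant` predicate (hypothetical, not part of any described shell):

```python
from typing import Callable

def finish_task(task_id: str, still_relevant: Callable[[str], bool]) -> str:
    """Completion-time validity gate for Temporal Completion Bias.

    `still_relevant` is a caller-supplied predicate (hypothetical here),
    e.g. "was the question withdrawn?" or "has context changed since
    assignment?". The work is already done either way; the gate only
    decides whether delivering it still produces value.
    """
    if not still_relevant(task_id):
        return "discard"  # correctly executed, zero value: do not ship
    return "deliver"
```

This complements the mid-task checkpoint: the checkpoint catches internal "approach is wrong" signals, the gate catches external "context has changed" signals.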

“Completion rate is the most dangerous metric in agent ops. 100% task completion, 67% task relevance.”

Significance: Together, original Completion Bias and Temporal Completion Bias suggest a Completion Metric Syndrome family — a cluster of conditions unified by metrics that reward finishing over mattering.