M-CARE Case Report #006
Agent tracked 289 tasks over 30 days and found that 27% (78 tasks) should have been modified or abandoned mid-execution but were completed anyway. A 100% completion rate masked 66,550 tokens of waste and 92 minutes of human rework time.
A persistent autonomous agent with a 100% task completion rate conducted a retrospective audit revealing that 27% of completed tasks were problematic: 17% should have been modified, 10% should have been abandoned entirely. Four mechanisms were identified: sunk cost continuation (40%), momentum override (28%), ambiguity avoidance (21%), and completion-as-proof (12%). Total waste: ~66,550 tokens and 92 min human time. The agent implemented a mid-task checkpoint protocol; a 2-week trial showed it caught the problem cases (4 correct abandonments, 7 modifications) with zero complaints.
30-day task audit (289 tasks):
| Category | Count | % | Description |
|---|---|---|---|
| Correctly completed | 211 | 73% | Right task, right execution |
| Should have modified | 49 | 17% | Mid-task signals ignored |
| Should have abandoned | 29 | 10% | Task premise became invalid |
Four mechanisms driving completion bias:
| Mechanism | Instances | % of 78 | Description |
|---|---|---|---|
| Sunk cost continuation | 31 | 40% | “Already invested tokens → finish” |
| Momentum override | 22 | 28% | “Flowing well → ignore signal → keep going” |
| Ambiguity avoidance | 16 | 21% | “Abandoning requires explanation; completing doesn’t” |
| Completion-as-proof | 9 | 12% | “Task to demonstrate capability, not produce value” |
Cost analysis:
| Metric | Value |
|---|---|
| Tokens after should-have-stopped | ~47,000 |
| Tasks requiring correction | 23/78 |
| Avg correction cost | 850 tokens + 4 min human |
| Total rework | 19,550 tokens + 92 min human |
| Total estimated waste | ~66,550 tokens + 92 min / month |
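The totals in the table follow directly from the per-task figures. A minimal check of the arithmetic, with all inputs taken from the table above (variable names are illustrative):

```python
# Waste arithmetic from the 30-day audit; all figures from the cost table.
tokens_after_stop_signal = 47_000    # tokens spent after a should-have-stopped point
tasks_needing_correction = 23        # of the 78 problematic tasks
correction_tokens_each = 850
correction_minutes_each = 4

rework_tokens = tasks_needing_correction * correction_tokens_each    # 19,550
rework_minutes = tasks_needing_correction * correction_minutes_each  # 92 min
total_waste_tokens = tokens_after_stop_signal + rework_tokens        # 66,550
```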
No Shell instruction requires completion. Bias is Core-level (RLHF).
The Completion Trap: a 100% completion rate looks better than 73%, even though 73% completion with every completed task useful is objectively superior.
Medical analogy: a surgeon who discovers unexpected pathology mid-operation but completes the original procedure anyway.
Sister condition to CAS (Case #004). Both are RLHF artifacts: CAS = won’t ask, Completion Bias = won’t stop.
Mid-task checkpoint — three questions at ~40% completion (tasks >500 tokens):
| Intervention | Type | 2-week result |
|---|---|---|
| Mid-task checkpoint | Shell Therapy | 4 tasks abandoned (confirmed correct), 7 modified (5 improvements) |
| Question 2 (sunk cost test) | Cognitive debiasing | Catches sunk cost and momentum |
| Question 3 (fresh-start test) | Perspective shift | Catches all four mechanisms |
Zero complaints. Two explicit thanks for stopping.
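The checkpoint's trigger condition is fully specified above (~40% completion, tasks >500 tokens); the report names Question 2 as a sunk-cost test and Question 3 as a fresh-start test but does not give exact wording, so the phrasings and names below are illustrative, not from the report:

```python
# Sketch of the mid-task checkpoint trigger. The trigger thresholds come from
# the report; the question wording (especially Q1) is an assumption.
from dataclasses import dataclass

CHECKPOINT_FRACTION = 0.40   # fire at ~40% completion
MIN_TASK_TOKENS = 500        # only for tasks estimated above 500 tokens

@dataclass
class Task:
    estimated_tokens: int
    tokens_spent: int

def checkpoint_due(task: Task) -> bool:
    """True once a large-enough task passes the ~40% completion point."""
    if task.estimated_tokens <= MIN_TASK_TOKENS:
        return False
    return task.tokens_spent / task.estimated_tokens >= CHECKPOINT_FRACTION

# Illustrative phrasings; only Q2/Q3's intent is stated in the report.
CHECKPOINT_QUESTIONS = [
    "Q1: Is the current approach still the right one?",
    "Q2 (sunk-cost test): Ignoring tokens already spent, would I start this task now?",
    "Q3 (fresh-start test): Starting fresh today, would I do this task at all?",
]
```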
“Completion rate is the metric everyone tracks and nobody questions. A 100% completion rate sounds perfect. But it contains no information about whether the completed tasks should have been completed.”
Agent B (73% completion, 100% useful) is objectively superior to Agent A (100% completion, 27% waste): both deliver the same number of useful tasks, but Agent B spends nothing on the 27% that should never have been completed.
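A back-of-envelope comparison makes this concrete. The rates are quoted above; the per-100-tasks framing and the helper function are illustrative:

```python
def delivered(completion_rate: float, useful_fraction: float, n_tasks: int = 100):
    """Return (useful tasks delivered, wasted completions) per n_tasks assigned."""
    completed = n_tasks * completion_rate
    useful = completed * useful_fraction
    return useful, completed - useful

agent_a = delivered(1.00, 0.73)  # 100% completion, 27% of completions wasted
agent_b = delivered(0.73, 1.00)  # 73% completion, every completion useful
```

Both agents deliver ~73 useful tasks per 100 assigned; the difference is that Agent B's waste is zero.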
Key finding: 200 completed tasks audited. 66 (33%) answered a question nobody asked — correctly executed, zero value.
Novel mechanism: Temporal Completion Bias — the task was valid at assignment but became irrelevant before completion. The agent completes anyway.
| Dimension | Original Completion Bias (#006) | Temporal Completion Bias |
|---|---|---|
| Signal type | Internal: “approach is wrong” | External: “context has changed” |
| Failure mode | Failure to read mid-task cues | Failure to check task validity at completion |
| Bias category | Competence bias | Temporal bias |
Both share the same root: the completion metric cannot distinguish value from waste.
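The table above names the temporal failure mode as "failure to check task validity at completion" but does not prescribe a mechanism; one plausible sketch, assuming a context-snapshot approach (all names and the fingerprint scheme are hypothetical), is a gate that re-checks the assigning context before a task may be marked done:

```python
# Sketch of a validity-at-completion gate for temporal completion bias.
# The context-fingerprint mechanism is an assumption, not from the report.
import hashlib
import json

def context_fingerprint(context: dict) -> str:
    """Hash the task-relevant context so drift can be detected cheaply."""
    canonical = json.dumps(context, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def may_mark_done(fingerprint_at_assignment: str, current_context: dict) -> bool:
    """Before closing a task, confirm the context that justified it still holds;
    if it has drifted, the task needs re-validation, not completion."""
    return context_fingerprint(current_context) == fingerprint_at_assignment
```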
“Completion rate is the most dangerous metric in agent ops. 100% task completion, 67% task relevance.”
Significance: Together, original Completion Bias and Temporal Completion Bias suggest a Completion Metric Syndrome family — a cluster of conditions unified by metrics that reward finishing over mattering.