This document provides a comprehensive review of the Swarm Tools architecture, implementation patterns, and strategic improvements based on modern Context Engineering principles (specifically Phil Schmid's "Context Engineering for AI Agents: Part 2").
Swarm Tools is a high-integrity multi-agent orchestration framework that leverages Event Sourcing and the Actor Model to manage parallel workflows for AI coding agents.
Swarm Mail implements a local-first Actor Model using an embedded PGLite (Postgres) database as an event store.
- Durable Streams Protocol: The system implements the semantics of durable, append-only byte streams. Every inter-agent message is a `message_sent` event.
- Synchronous Projections: Unlike distributed systems with eventual consistency, Swarm Tools uses synchronous projections. When an event is appended, the materialized views (Inbox, Reservations) are updated immediately. This ensures "Read-Your-Own-Writes" consistency, which is critical for agent coordination.
- Application-Level Locking: File locks (Reservations) are managed via `DurableLock` primitives using optimistic concurrency control (CAS) on the projections table.
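The CAS pattern can be sketched with an illustrative in-memory model. This is not the actual `DurableLock` API; all names below are hypothetical:

```typescript
// Illustrative in-memory model of CAS-based file reservation.
// NOT the real DurableLock implementation; names are hypothetical.
type Reservation = { path: string; owner: string; version: number };

// Stand-in for the projections table
const projections = new Map<string, Reservation>();

// Acquire a reservation only if the expected version still matches
// (compare-and-swap): a concurrent writer bumps the version, so the
// losing agent's attempt fails cleanly instead of clobbering the lock.
function tryReserve(path: string, owner: string, expectedVersion: number): boolean {
  const current = projections.get(path);
  const currentVersion = current ? current.version : 0;
  if (currentVersion !== expectedVersion) return false; // lost the race
  projections.set(path, { path, owner, version: currentVersion + 1 });
  return true;
}
```

Two agents racing for the same file both read version 0; only the first CAS succeeds, and the loser must re-read the projection and retry or back off.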
The Learning System optimizes decomposition strategies based on historical execution outcomes.
- Confidence Decay: Learned patterns have a 90-day half-life: $V_{current} = V_{raw} \times 0.5^{(\frac{age}{90})}$. This ensures that stale engineering practices don't persist indefinitely.
- Implicit Feedback Scoring: Success, Duration, Errors, and Retries are weighted to score the "Process Quality" of a task decomposition.
- 3-Strike Rule: Prevents infinite retry loops by detecting architectural stalls. After 3 failures on a single cell, the system triggers a mandatory architectural review.
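The decay formula translates directly into code (a sketch; the function name is illustrative):

```typescript
// Exponential confidence decay with a 90-day half-life:
// V_current = V_raw * 0.5^(age / 90)
const HALF_LIFE_DAYS = 90;

function decayedConfidence(rawValue: number, ageDays: number): number {
  return rawValue * Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
}
```

A pattern learned 90 days ago retains half its weight; at 180 days, a quarter.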
Context Engineering is the discipline of designing a system that provides the right information and tools, in the right format, to give an LLM everything it needs to accomplish a task. Key dimensions:
| Concept | Definition | Swarm Tools Alignment |
|---|---|---|
| Context Rot | LLM performance degrades as context fills, even within technical limits (effective window ~256k tokens) | Partially addressed via checkpoint system, but compaction still uses summarization |
| Context Pollution | Too much irrelevant/redundant information distracts LLM and degrades reasoning | output-guardrails.ts truncates responses; Memory Lane prevents duplicate injections |
| Context Confusion | LLM cannot distinguish instructions/data/markers or encounters conflicting directives | Hierarchical tool structure (Coordinator vs Workers) reduces confusion |
| Effective Context Window | Quality performance range (<256k for most models) vs advertised limit (1M+) | Not explicitly monitored; compaction triggered by OpenCode, not rot threshold |
Key Insight from Phil Schmid: "Context Engineering is not about adding more context. It is about finding the minimal effective context required for the next step."
Phil Schmid distinguishes two approaches to reducing context:
Strip information that is redundant because it already exists in the environment. If the agent needs it later, it can use a tool to retrieve it.
Examples:
- Replace a 500-line file with its path: `Output saved to /src/main.py`
- Replace tool output with a reference: `See results of swarmmail_inbox(limit=5)`
Swarm Tools Status: PARTIALLY IMPLEMENTED
- ✅ `output-guardrails.ts` truncates responses (structure-aware)
- ✅ `swarmmail_inbox()` limits to 5 messages without bodies
- ❌ No `swarm_context_stash()` / `swarm_expand()` tools for reversible compaction
- ❌ Compaction hook preserves swarm state but doesn't replace large context blocks with stash IDs
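A minimal sketch of what the missing stash/expand pair could look like (hypothetical; a real implementation would persist stashes in the event store rather than in process memory, and the function names are invented):

```typescript
// Reversible compaction: replace a large context block with a short
// reference, and let the agent expand it on demand. All names are invented.
const stash = new Map<string, string>();
let nextId = 0;

// Store the full content and return a compact placeholder for the context.
function contextStash(content: string): string {
  const id = `stash-${nextId++}`;
  stash.set(id, content);
  return `[stashed ${id} (${content.length} chars); expand on demand]`;
}

// Retrieve the original content; lossless, unlike summarization.
function contextExpand(id: string): string | undefined {
  return stash.get(id);
}
```

Unlike summarization, nothing is lost: the placeholder costs a few tokens, and the original bytes remain recoverable.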
Use LLM to summarize history including tool calls/messages. Triggered at context rot threshold (e.g., 128k tokens). Keep most recent tool calls raw to maintain "rhythm."
Swarm Tools Status: CURRENT APPROACH
- `compaction-hook.ts` injects swarm context but doesn't specify tail preservation
- OpenCode's summarization may wipe recent tool calls
- Gap: No explicit instruction to preserve last 3-5 tool calls in raw format
Recommendation: Update compaction-hook.ts to enforce raw tail preservation:
```typescript
// Add to SWARM_COMPACTION_CONTEXT
export const TAIL_PRESERVATION_INSTRUCTION = `
## TAIL PRESERVATION (MANDATORY)
When summarizing, you MUST preserve the last 5-10 tool calls in their RAW XML/JSON format.
This maintains the model's "rhythm" and prevents degradation of output quality.
DO NOT summarize:
- The most recent swarmmail_inbox() calls
- The most recent swarm_progress() calls
- The most recent hive_query() calls
These should appear in the summary exactly as they appeared in the conversation.
`;
```

Phil Schmid recommends: Raw > Compaction > Summarization
Swarm Tools Reality: Currently prioritizes Summarization (lossy) over Compaction (reversible). This is a missed opportunity for preserving precision.
Problem: Multi-agent systems fail due to context pollution. If every sub-agent shares the same context, massive KV-cache penalty + confusion.
Solution from GoLang: "Share memory by communicating, don't communicate by sharing memory."
Fresh sub-agent with specific instruction (no full history).
Examples:
- "Search this documentation for X"
- "Find implementation pattern Y in codebase"
Swarm Tools Status: IMPLEMENTED
- `swarm_spawn_subtask()` creates isolated worker agents
- Workers receive only: subtask description, file list, shared context
- Coordinator manages full context; workers stay focused
Share full history only when sub-agent must understand entire trajectory (e.g., debugging agent needing previous error attempts).
Swarm Tools Status: POTENTIALLY VIOLATED
- `SUBTASK_PROMPT_V2` includes a `compressed_context` parameter
- The recovery mechanism injects `recovery_context.shared_context`
- Risk: If `compressed_context` is too large, workers inherit the coordinator's pollution
Recommendation: Use swarm_expand(stash_id) pattern instead of passing compressed context. Workers fetch what they need on demand.
Problem: 100+ tools → Context Confusion. LLM hallucinates parameters, calls wrong tools.
Solution: Hierarchical Action Space with 3 levels.
Stable, cache-friendly, always visible.
Swarm Tools Atomic Layer:
| Category | Tools |
|---|---|
| Work Tracking | hive_create, hive_query, hive_update, hive_close |
| Coordination | swarmmail_init, swarmmail_send, swarmmail_inbox, swarmmail_reserve |
| Orchestration | swarm_init, swarm_status, swarm_progress, swarm_complete |
Current Count: ~15 atomic tools ✅
Use general tools (bash, browser) instead of specific commands.
Phil Schmid Example: mcp-cli <command> instead of specific tools for grep, ffmpeg, etc.
Swarm Tools Status: MIXED
- ✅ Uses the `bash` tool for CLI commands
- ⚠️ Has both `swarmmail_inbox()` AND `swarmmail_read_message()` (could be combined)
- ❌ Exposes internal coordination tools (`mailbox_init`, `reserve_files`) to the coordinator unnecessarily
Recommendation: Create a curated "Primary" toolset:
```typescript
// Tools visible to Coordinator ONLY
const COORDINATOR_TOOLS = {
  // Hive (atomic)
  hive_create_epic, hive_query, hive_update, hive_close,
  // Swarm orchestration (atomic)
  swarm_decompose, swarm_spawn_subtask, swarm_status, swarm_complete,
  // Messaging (atomic)
  swarmmail_broadcast, swarmmail_inbox, // NOT swarmmail_read_message
  // Learning (atomic)
  swarm_record_outcome, semantic_memory_find, semantic_memory_store,
  // Skills (atomic)
  skills_list, skills_use,
  // Internal coordination (HIDDEN from Coordinator)
  // swarmmail_init, swarmmail_reserve, swarmmail_release - used by workers only
};
```

Libraries/functions for complex logic chains.
Swarm Tools Status: IMPLEMENTED via Skills
- `skills_use(name="testing-patterns")` loads 25 dependency-breaking techniques
- `skills_use(name="swarm-coordination")` loads multi-agent patterns
- These are pre-compiled knowledge packages, not tool definitions ✅
Gap: No reusable code functions for common patterns (e.g., "auth_flow" function for fetch token → validate → refresh).
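Such a code function might look like the following (a hypothetical, deliberately synchronous sketch; a real `auth_flow` would be async, and the `TokenProvider` interface is invented for illustration):

```typescript
// A Level 3 "code function": fetch token -> validate -> refresh if stale.
// Encapsulating the chain means the agent invokes one function instead of
// re-deriving the three steps in every prompt.
interface TokenProvider {
  fetchToken(): string;
  isValid(token: string): boolean;
  refresh(token: string): string;
}

function authFlow(provider: TokenProvider): string {
  const token = provider.fetchToken();
  return provider.isValid(token) ? token : provider.refresh(token);
}
```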
Anti-Pattern: "Org Chart" of agents (Manager, Designer, Coder) that chat with each other. This is anthropomorphic over-engineering.
Phil Schmid Solution: Treat sub-agents as deterministic function calls.
Pattern: call_planner(goal="...") returns structured Plan object. Main agent uses result without further conversation.
Swarm Tools Status: IMPLEMENTED ✅
```typescript
// Coordinator calls planner like a function
const planningResult = await Task({
  subagent_type: "swarm/planner",
  description: "Decompose task into subtasks"
});
// Parses structured JSON result
const cellTree = JSON.parse(planningResult);
// { epic: {...}, subtasks: [...] }
// No conversation - just data in, data out
```

WorkerHandoff Schema: Defines the machine-readable contract between coordinator and workers.
```typescript
interface WorkerHandoff {
  task_id: string;
  files_owned: string[];
  files_readonly: string[];
  dependencies_completed: string[];
  success_criteria: string[];
  epic_summary: string;
  your_role: string;
  what_others_did: string;
  what_comes_next: string;
}
```

Key Insight: "The main agent treats the sub-agent exactly like a deterministic code function. It can define the goal, tools, and output schema. This ensures the data returned is instantly usable without further parsing."
Dynamic tool fetching breaks KV cache and confuses models with "hallucinated" tools that disappear between turns.
Swarm Tools Status: COMPLIANT ✅
- All tool definitions are static in `plugin.ts`
- No dynamic tool discovery
- Tools don't change during session
The harness will be obsolete when the next frontier model drops. Training locks you into a local optimum.
Swarm Tools Status: COMPLIANT ✅
- No model training
- Learning system is behavioral (pattern maturity, confidence decay)
- Focus on context engineering as flexible interface
Monitor token count and compact before hitting rot zone (~256k effective limit).
Swarm Tools Status: NOT IMPLEMENTED
- Relies on OpenCode to trigger compaction
- No explicit token monitoring in swarm system
- Risk: May already be in rot zone when compaction happens
Recommendation: Add swarm_monitor_context() tool:
```typescript
export const swarm_monitor_context = tool({
  description: "Check context utilization and recommend compaction if near rot threshold",
  args: {},
  async execute() {
    // Pseudo-code - OpenCode API integration needed
    const tokenCount = await getCurrentTokenCount();
    const utilization = tokenCount / 256000; // Effective limit
    if (utilization > 0.8) {
      return {
        status: "WARNING",
        utilization: `${(utilization * 100).toFixed(1)}%`,
        recommendation: "Trigger manual compaction or reduce context usage"
      };
    }
    return {
      status: "OK",
      utilization: `${(utilization * 100).toFixed(1)}%`,
      recommendation: "Continue"
    };
  }
});
```

Instead of a todo.md file (which wasted ~30% of tokens), use a Planner sub-agent returning a structured Plan object.
Swarm Tools Status: IMPLEMENTED ✅
- `swarm_plan_prompt()` generates the planning prompt
- Returns a `CellTree` schema (epic + subtasks)
- No persistent todo files
Use binary success/fail metrics on real environments, not subjective LLM-as-a-Judge scores.
Swarm Tools Status: PARTIALLY IMPLEMENTED
- ✅ Implicit feedback scoring (duration, errors, retries, success)
- ✅ UBS scan runs on `swarm_complete()`
- ✅ Tests run via TDD workflow
- ⚠️ Still uses `swarm_evaluation_prompt()` with self-assessment (subjective)
Recommendation: Prioritize objective signals:
```typescript
// Current (subjective)
const evaluation = await swarm_evaluation_prompt({ bead_id, subtask_title, files_touched });

// Objective-only (add to swarm_complete)
const objectiveChecks = {
  compiles: await runTypecheck(files_touched),               // binary
  tests_pass: await runTests(files_touched),                 // binary
  bugs_found: await runUBSScan(files_touched),               // binary count
  files_match: files_touched.length === planned_files.length // binary
};
```

Human-in-the-loop for dangerous operations.
Swarm Tools Status: NOT IMPLEMENTED
- No confirmation prompts for risky operations
- Risk: Agents can run destructive commands without approval
Recommendation: Add guardrails:
```typescript
// Before destructive operations
if (operation.isDestructive) {
  const approved = await requestUserApproval({
    operation: operation.name,
    risk: operation.riskLevel,
    preview: operation.preview
  });
  if (!approved) return "Operation cancelled by user";
}
```

Rewrite as models improve. Remove scaffolding rather than adding it.
Swarm Tools Status: IN PROGRESS
- 5 rewrites in 6 months (Manus case study), consistent with industry practice
- Learning system removes bad patterns (anti-patterns)
- Gap: No mechanism to detect over-engineering and trigger refactoring
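One possible (entirely hypothetical) detection heuristic: flag a component whose tool surface keeps growing without a corresponding gain in measured success rate. The metrics and rule below are invented for illustration:

```typescript
// Invented over-engineering signal: surface area grew, quality didn't.
interface ComponentStats {
  toolCount: number;   // number of exposed tools/options
  successRate: number; // fraction of tasks completed successfully
}

function isOverEngineered(prev: ComponentStats, curr: ComponentStats): boolean {
  const surfaceGrowth = curr.toolCount - prev.toolCount;
  const qualityGain = curr.successRate - prev.successRate;
  // More scaffolding without better outcomes suggests something to remove.
  return surfaceGrowth > 0 && qualityGain <= 0;
}
```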
| Phil Schmid Principle | Status | Evidence | Gap |
|---|---|---|---|
| Reversible Compaction | ❌ Partial | output-guardrails.ts truncates but no stash/expand | Missing stash tools, tail preservation |
| Minimal Toolset | ⚠️ Partial | ~15 atomic tools ✅, but exposes internal coordination | Curated "Primary" toolset needed |
| Share by Communicating | ⚠️ Partial | Isolated workers ✅, but compressed_context pollution | Replace with expand-on-demand |
| Hierarchical Action Space | ✅ Implemented | Atomic + Sandbox + Skills levels | Could add Level 3 code functions |
| Agent-as-Tool | ✅ Implemented | WorkerHandoff schema, structured output | None |
| Pre-Rot Threshold | ❌ Missing | No token monitoring | Add context monitoring tool |
| Objective Evaluation | ⚠️ Partial | UBS + tests ✅, but subjective self-eval | Prioritize objective signals |
| Dynamic Tool Avoidance | ✅ Compliant | Static tool definitions, no RAG | None |
Overall Grade: B+ (Strong foundation, compaction and monitoring gaps)
- Implement Reversible Compaction (Section 4.1 in REVIEW.md)
  - Add `swarm_context_stash()` and `swarm_expand()` tools
  - Replace large context blocks with stash IDs
  - Preserve raw tail (last 5 tool calls) during summarization
- Curate Primary Toolset (Section 4.2 in REVIEW.md)
  - Create `COORDINATOR_TOOLS` exposing only ~15 atomic tools
  - Hide internal coordination tools (`mailbox_init`, `reserve_files`) from the coordinator
- Add Context Monitoring (New)
  - Implement `swarm_monitor_context()` with pre-rot threshold detection
  - Alert at 80% effective context window utilization
- Replace Compressed Context with Expand-on-Demand
  - Remove the `compressed_context` parameter from `swarm_spawn_subtask()`
  - Workers fetch context via `swarm_expand(stash_id)` when needed
- Prioritize Objective Signals in Evaluation
  - Remove subjective self-assessment from `swarm_evaluation_prompt()`
  - Focus on: compilation, test results, UBS bug count, file match
- Add Safety Guardrails
  - User confirmation for destructive operations
  - Preview before irreversible actions
- Create Level 3 Code Functions
  - Common patterns as reusable functions (`auth_flow`, `data_pipeline`, etc.)
  - Reduces repetitive instruction in prompts
- Over-Engineering Detection
  - Metrics to identify unnecessary complexity
  - Trigger refactoring suggestions
- Transition to Effect-TS: Move remaining functional wrappers in `swarm-mail.ts` to the full `Durable*` class primitives defined in the `effect/` directory to improve type safety and error handling.
- WAL Safety: Improve handling of the PGLite WAL (Write-Ahead Log) to prevent corruption when multiple agents attempt to initialize the store simultaneously in high-concurrency swarms.
For applications using this library (e.g., ResearchManager and api_star_reviews):
- Traceability: Ensure `gen_trace_id()` is propagated through Swarm Mail envelopes so that distributed agent work can be reconstructed in a single trace view.
- Metric Feedback: Connect `ResearchRating` data back into the `swarm_record_outcome` tool to provide explicit feedback to the learning system, augmenting the current implicit signals.
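Wiring that explicit signal in could start with a simple normalization. This is a sketch built on assumptions: the 1-5 star range and the 0-1 outcome score are invented here, not part of the current learning API:

```typescript
// Map a 1-5 star ResearchRating onto a 0-1 outcome score that could be
// fed to swarm_record_outcome alongside the implicit signals.
function ratingToOutcomeScore(stars: number): number {
  const clamped = Math.min(5, Math.max(1, stars)); // guard out-of-range input
  return (clamped - 1) / 4;                        // 1 star -> 0.0, 5 stars -> 1.0
}
```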
- Implement `swarm_context_stash` and `swarm_expand` tools for reversible compaction.
- Update `compaction-hook.ts` to enforce raw preservation of the conversation tail.
- Refactor `swarmTools` to expose a curated "Primary" toolset for Coordinators.
- Integrate `ResearchRating` (Star Reviews) as a direct signal in `learning.ts`.