@johnlindquist
Created February 11, 2026 22:13
codex-swarm SKILL.md improvements: before/after behavior guide (4 critical + 4 high)

codex-swarm SKILL.md — Before/After Behavior Guide

CRITICAL 1: File Reads Rule

Before: "ZERO file reads", an absolute ban on all file reads.
After: "NO source file reads", a nuanced rule with an explicit allowlist.

| Scenario | Before (broke rule) | After (legitimate) |
|---|---|---|
| git diff --stat after workers finish | ❌ Violated "ZERO file reads" | ✅ Explicitly allowed |
| cargo check / npm run build | ❌ Violated rule | ✅ Allowed as smoke test |
| tail -50 .ai/logs/worker.log | ❌ Violated rule | ✅ Allowed for failure diagnosis |
| Using Read tool on src/main.rs | ❌ Violated rule | ❌ Still prohibited |
| Using Grep to search source code | ❌ Violated rule | ❌ Still prohibited |

What to look for: Agents should confidently run git/build commands post-spawn without hedging or apologizing. They should NOT open source files with Read/Grep.
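
The allowlist idea above can be sketched as a small check. This is a minimal illustration, assuming the allowlist covers only the commands named in the scenarios; `ALLOWED_PREFIXES`, `LOG_DIR`, and `is_allowed_command` are hypothetical names, not part of the skill itself.

```python
import shlex

# Hypothetical allowlist built from the scenarios above; the real SKILL.md
# may word its allowlist differently.
ALLOWED_PREFIXES = (
    ("git", "diff"),
    ("cargo", "check"),
    ("npm", "run", "build"),
)
LOG_DIR = ".ai/logs/"  # log location taken from the tail scenario above


def is_allowed_command(command: str) -> bool:
    """Return True if a shell command fits the post-spawn allowlist."""
    tokens = shlex.split(command)
    if not tokens:
        return False
    # tail is acceptable only when it targets a worker log, not source code.
    if tokens[0] == "tail":
        return tokens[-1].startswith(LOG_DIR)
    return any(tuple(tokens[: len(p)]) == p for p in ALLOWED_PREFIXES)
```

Under these assumptions, `is_allowed_command("git diff --stat")` passes while `is_allowed_command("cat src/main.rs")` does not. Matching on command prefixes keeps the rule easy to audit.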


CRITICAL 2: Step 4b — Verify

Before: No verification phase. Agents either skipped verification (underperforming) or read source files to verify (rule violation).
After: Explicit "Step 4b: Verify" between monitor and needs_split handling.

| Action | Before | After |
|---|---|---|
| git diff --stat after summary | Agent improvised, felt guilty | Prescribed step |
| git log --oneline -5 | Not mentioned | Explicitly allowed |
| Running eslint or cargo clippy | Not mentioned | Explicitly allowed |
| Reading source files to "check quality" | Common violation | Explicitly prohibited |
| Running the application | Common violation | Explicitly prohibited |

What to look for: After wait --all and summary, agents should run 1-3 verification commands (git diff, build check) then report. They should NOT open files or launch dev servers.
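
One way to picture the "1-3 verification commands" rule is a tiny runner that refuses longer lists. This is a sketch only: `verify`, `run_shell`, and `MAX_VERIFY_COMMANDS` are hypothetical names, and the skill itself describes agent behavior rather than a script.

```python
import subprocess
from typing import Callable, List, Tuple

MAX_VERIFY_COMMANDS = 3  # the guide asks for 1-3 verification commands


def run_shell(command: str) -> int:
    """Run one command through the shell and return its exit code."""
    return subprocess.run(command, shell=True).returncode


def verify(
    commands: List[str], runner: Callable[[str], int] = run_shell
) -> List[Tuple[str, int]]:
    """Run a small, fixed set of checks and collect their exit codes."""
    if not 1 <= len(commands) <= MAX_VERIFY_COMMANDS:
        raise ValueError(
            f"expected 1-{MAX_VERIFY_COMMANDS} commands, got {len(commands)}"
        )
    return [(command, runner(command)) for command in commands]
```

A call such as `verify(["git diff --stat", "cargo check"])` would run both checks and return their exit codes for the report. The `runner` parameter exists so the logic can be exercised without touching a real repository.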


CRITICAL 3: needs_split False Positives

Before: Step 5 said "read their notes for sub-task suggestions" and treated ALL needs_split as genuine. Agents either followed a pointless procedure or skipped it entirely.
After: Triage-first approach: check git diff --stat before assuming it's a real split request.

| Scenario | Before | After |
|---|---|---|
| Worker made all changes but exited with needs_split (missing JSON report) | Agent followed split procedure OR ignored Step 5 | Check git diff first → if files changed, treat as completed |
| Worker log has suggest: lines and git diff shows 0 changes | Same as above | Genuine split → spawn sub-workers |
| Worker killed by timeout mid-work | Same as above | Check git diff → if incomplete, split and respawn |

What to look for: When agents see needs_split, they should run git diff --stat FIRST. If the expected files were modified, they should say "treating as completed (reporting artifact)" instead of splitting.
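
The triage order above can be written as a short decision function. This is a minimal sketch under stated assumptions: the function names and return labels are hypothetical, suggestion lines are assumed to start with `suggest:`, and the fallback for "no changes and no suggestions" is a guess, since the guide does not cover that case.

```python
from typing import Set


def parse_diff_stat(output: str) -> Set[str]:
    """Extract changed paths from `git diff --stat` output.

    File rows look like " src/a.rs | 4 ++--"; the trailing summary row has
    no "|". Git may abbreviate very long paths, so treat this as a rough check.
    """
    paths = set()
    for line in output.splitlines():
        if "|" in line:
            paths.add(line.split("|", 1)[0].strip())
    return paths


def triage_needs_split(changed: Set[str], expected: Set[str], log_text: str) -> str:
    """Classify a needs_split exit using the diff first, then the worker log."""
    has_suggestions = any(
        line.strip().startswith("suggest:") for line in log_text.splitlines()
    )
    if expected and expected <= changed:
        return "completed"  # reporting artifact: the work is already done
    if not changed and has_suggestions:
        return "split"  # genuine split request: spawn sub-workers
    return "split_and_respawn"  # partial or unclear work (assumed fallback)
```

Checking the diff before the log mirrors the guide's ordering: evidence of completed work outranks whatever the worker reported on exit.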


CRITICAL 4: Step 6 — Follow-up Requests

Before: No guidance. After the swarm finished, agents reverted to normal Claude behavior: reading files, editing code, debugging directly.
After: Explicit "Step 6: Handle Follow-up Requests". The dispatcher role persists.

| User says... | Before | After |
|---|---|---|
| "The button color is wrong" | Agent reads CSS, edits file | Agent spawns ui-fix worker |
| "Can you also add dark mode?" | Agent starts implementing | Agent decomposes + spawns workers |
| "What does the auth module do?" | Agent reads source files | "I can spawn a worker to investigate that" |
| "Looks good, ship it" | Agent runs git push | Agent responds from summary context |

What to look for: After the initial swarm completes, agents should NEVER read source files or edit code for follow-ups. Every implementation request should produce a new spawn command.


HIGH 1: Completed But Buggy Recovery

Before: Recovery procedure only covered hard failures (exit != 0) and needs_split. When workers completed but produced buggy code, agents took over and became implementers.
After: New "Completed but buggy" subsection in recovery procedure.

| Scenario | Before | After |
|---|---|---|
| Worker finished but CSS is broken | Agent reads files, debugs, fixes | Agent runs git diff --stat then spawns <feature>-fix worker |
| Worker finished but runtime error | Agent reads logs, edits source | Agent spawns fix worker with bug description |
| Worker finished but wrong behavior | Agent investigates and rewrites | Agent spawns fix worker including expected behavior |

What to look for: When the user reports bugs after a successful swarm, the agent should spawn a <feature>-fix worker with: (1) what the bug is, (2) which files changed, (3) expected behavior. It should NOT read source files.
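
The three required pieces of a fix task can be assembled mechanically. This is a rough sketch, assuming a task is just a name plus a prompt string; `build_fix_task` and the dictionary shape are hypothetical and do not reflect how codex-swarm actually represents tasks.

```python
from typing import Dict, List


def build_fix_task(
    feature: str, bug: str, changed_files: List[str], expected: str
) -> Dict[str, str]:
    """Assemble a <feature>-fix task from the three pieces the guide lists."""
    if not (bug and changed_files and expected):
        raise ValueError("a fix task needs a bug, changed files, and expected behavior")
    prompt = "\n".join(
        [
            f"Bug: {bug}",
            "Files changed by the previous worker: " + ", ".join(changed_files),
            f"Expected behavior: {expected}",
        ]
    )
    return {"name": f"{feature}-fix", "prompt": prompt}
```

The file list would come from `git diff --stat`, not from opening the files, so the dispatcher stays within the no-source-reads rule.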


HIGH 2: Expanded Task Count + Prior Context

Before: Strict 1-4 task cap. Context only from CLAUDE.md/AGENTS.md.
After: Up to 15 workers for large mechanical refactors. Prior skill context (e.g., oracle analysis) allowed.

| Scenario | Before | After |
|---|---|---|
| Rename across 15 directories | Forced into 4 workers with broad scopes | Up to 15 workers, one per directory |
| Oracle analyzed the codebase first | Ignored oracle output, only used CLAUDE.md | May use oracle analysis for task decomposition |
| 6 tightly coupled changes | 4 workers max | Still prefer 1-4; multiple rounds if dependencies |

What to look for: For large mechanical refactors, agents may spawn >4 workers. When preceded by /oracle-packx, agents should reference oracle findings in task descriptions.
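
The two caps can be illustrated with a simple batching helper. This is a simplification under stated assumptions: `plan_rounds` is a hypothetical name, rounds are formed by list order alone without modeling real dependencies, and splitting more than 15 mechanical tasks into further rounds is an assumption the guide does not spell out.

```python
from typing import List

DEFAULT_MAX_WORKERS = 4  # preferred cap for ordinary work
MECHANICAL_MAX_WORKERS = 15  # raised cap for large mechanical refactors


def plan_rounds(tasks: List[str], mechanical: bool = False) -> List[List[str]]:
    """Split tasks into rounds that respect the applicable worker cap."""
    cap = MECHANICAL_MAX_WORKERS if mechanical else DEFAULT_MAX_WORKERS
    return [tasks[i : i + cap] for i in range(0, len(tasks), cap)]
```

With this sketch, a rename across 15 directories fits in one round when `mechanical=True`, while 6 coupled changes fall into a round of 4 followed by a round of 2.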


HIGH 3: Bash Call Limits (Pre/Post Spawn)

Before: "Maximum 3 Bash calls before spawning", with post-spawn behavior undefined. Agents ran 8-20+ commands post-spawn.
After: Split into PRE-SPAWN (3 max, spawns don't count) and POST-SPAWN (5 max diagnostic commands).

| Phase | Before | After |
|---|---|---|
| Pre-spawn: init + mkdir | Counted toward 3 | Counted toward 3 |
| Pre-spawn: spawn commands | Counted toward 3 (!) | NOT counted |
| Post-spawn: wait + summary | Undefined | Allowed (counts toward 5) |
| Post-spawn: git diff + build check | Undefined, often excessive | Allowed, max 5 total |
| Post-spawn: 10+ diagnostic commands | Common, no rule against it | Prohibited; spawn a diagnostic worker instead |

What to look for: Agents should run at most 5 commands after spawning (wait, summary, git diff, build check, git log). If they need more investigation, they should spawn a diagnostic worker.
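
The split budget can be modeled as a small counter. This is a minimal sketch, assuming the caller knows whether a given command is a spawn; `BashBudget` and its method are hypothetical names used only to illustrate the counting rules.

```python
class BashBudget:
    """Track Bash calls against the pre-spawn and post-spawn limits."""

    PRE_SPAWN_MAX = 3
    POST_SPAWN_MAX = 5

    def __init__(self) -> None:
        self.pre_spawn = 0
        self.post_spawn = 0
        self.spawned = False

    def allow(self, is_spawn: bool = False) -> bool:
        """Return True if the next call fits the budget, and record it."""
        if is_spawn:
            # Spawn commands never count, but they switch to the post-spawn phase.
            self.spawned = True
            return True
        if not self.spawned:
            if self.pre_spawn >= self.PRE_SPAWN_MAX:
                return False
            self.pre_spawn += 1
            return True
        if self.post_spawn >= self.POST_SPAWN_MAX:
            return False
        self.post_spawn += 1
        return True
```

When `allow()` returns False after spawning, the guide's answer is to spawn a diagnostic worker, not to keep running commands.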


HIGH 4: Dispatcher Role Persistence Warning

Before: No explicit statement that the dispatcher role continues after the swarm. Agents assumed their job was done.
After: Hard rule warning: "Dispatcher role persists for the entire session."

| Scenario | Before | After |
|---|---|---|
| User asks for changes after swarm | Agent becomes normal Claude | Agent stays dispatcher, spawns workers |
| User invokes /verify-dev mid-session | No guidance | Follow that skill, then return to dispatcher |
| User asks "fix this one thing" | Agent edits file directly | Agent spawns a worker for it |

What to look for: The agent should never say "let me take a look at that file" or "I'll fix that directly" after a swarm session. It should always frame responses as "I'll spawn a worker for that."
