AI coding agents are powerful, but they still need a human babysitter. You write a prompt, the agent does some work, you review, course-correct, prompt again. Repeat. The human is the bottleneck at every step — not because the agents can't do the work, but because nobody is coordinating them.
What if you could operate more like a CEO? Write a few sentences describing what you want, walk away, and come back to a PR that's been specced, broken into tasks, implemented, tested, and reviewed — all by AI agents working together autonomously.
The hard part isn't getting agents to do work. It's getting them to keep working without constantly asking for permission, and knowing when a question genuinely needs a human versus when another agent can answer it.
Most agent systems are chatty:
- "Does this look right?"
- "Here's what I'm about to do..."
- "I'm not sure about X, should I proceed?"
The result is a system that technically works but requires constant supervision — defeating the purpose.
After studying how OpenClaw shepherds coding agents through complex tasks to completion, we found that the answer is surprisingly simple:
Good prompts beat complex supervisors.
You don't need confidence thresholds, question-filtering middleware, or centralized decision engines. You need system prompts that tell agents: "Do the work. Don't ask for confirmation. Only escalate when you genuinely can't proceed."
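A sketch of what such a prompt section might look like. The wording is illustrative, not an actual ConductorBot prompt, and the NO_REPORT token is a hypothetical sentinel (explained below):

```text
## Operating mode

- Act autonomously. Do the work; do not narrate it or ask for confirmation.
- Never ask "should I proceed?" for routine decisions. Pick the reasonable
  option and continue.
- If you have nothing to report for this step, output only the token NO_REPORT.
- Escalate only when you are genuinely blocked: a missing credential, an
  irreversible action, or a product decision you cannot infer from context.
```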
Three patterns make this work:
- Act, don't ask. Agents execute routine work without narration or permission-seeking. Autonomous action is the default.
- Silent by default. Agents signal "nothing to report" with a token that never reaches the human. Only substantive updates break through to Slack. (A sketch of this pattern follows the list.)
- Skill templates. Each role has a markdown playbook that guides the agent step by step. The agent follows the playbook; the orchestrator just runs agents in order. (An example playbook fragment also follows below.)
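The silent-by-default pattern is the most mechanical of the three, so here is a minimal TypeScript sketch of it. Every name here (NO_REPORT, AgentUpdate, postToSlack) is a hypothetical stand-in, not ConductorBot's actual API:

```typescript
// Hypothetical sentinel: agents emit exactly this token when they have
// nothing worth telling a human.
const NO_REPORT = "NO_REPORT";

interface AgentUpdate {
  role: string;    // e.g. "pm", "coder", "qa"
  message: string; // raw agent output for this step
}

async function relayUpdate(update: AgentUpdate): Promise<void> {
  // Silent by default: the sentinel token never reaches the human.
  if (update.message.trim() === NO_REPORT) return;

  // Only substantive updates break through to Slack.
  await postToSlack(`[${update.role}] ${update.message}`);
}

// Stub adapter; a real system would call the Slack API here.
async function postToSlack(text: string): Promise<void> {
  console.log("SLACK:", text);
}
```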
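Skill templates are just markdown. A fragment of what a playbook might contain, using a hypothetical QA playbook as the example:

```markdown
# QA Playbook (illustrative)

1. Read the original spec and the list of open PRs from pipeline context.
2. Run the Playwright suite against the staging deploy.
3. For each failure, file one Linear issue: steps to reproduce, expected vs
   actual behavior, and a link to the failing test.
4. If all tests pass, output NO_REPORT and hand off to the reviewer.
```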
You write a brief product spec in Linear — could be one sentence, could be a paragraph. A pipeline of specialized AI agents takes it from there.
You: "Add dark mode to the dashboard"
│
▼
┌──────────────┐ Writes full spec: user stories,
│ PM Agent │──▶ acceptance criteria, edge cases
└──────┬───────┘
▼
┌──────────────┐ Breaks spec into Linear sub-issues,
│ EM Agent │──▶ defines technical approach
└──────┬───────┘
▼
┌──────────────┐ Implements each task, creates PRs,
│ Coding Agent │──▶ runs tests, self-debugs on failure
└──────┬───────┘
▼
┌──────────────┐ Runs Playwright tests, files bugs ┌──────────────┐
│ QA Agent │──▶ as Linear issues ──────────────────────▶│ Coding Agent │
└──────┬───────┘ (coder picks them up automatically) └──────┬───────┘
▼ │
┌──────────────┐ Validates against original spec, │
│ Reviewer │◀─────────────────────────────────────────────────┘
└──────┬───────┘ approves and merges
▼
Slack: "Dark mode shipped. 5 PRs merged. QA passed."
Each agent is specialized:
- PM Agent — Fleshes out your one-liner into a real spec. Doesn't ask for review; just does the work and passes it downstream.
- EM Agent — Breaks the spec into concrete engineering tasks. Creates sub-issues in Linear, sequences the work.
- Coding Agent — Implements each task. Writes code, creates PRs, runs tests. Debugs and fixes failures. Only escalates if genuinely stuck.
- QA Agent — Tests via Playwright browser automation. Files bugs as Linear issues. The coding agent picks them up automatically.
- Product Reviewer — Validates the result against the original spec. Approves and merges if it matches.
You only hear from the system when:
- An agent genuinely needs a human decision (UX preference, priority call, tradeoff)
- A milestone completes
- Something fails that agents can't self-recover from
Everything else happens autonomously.
┌─────────────────────────────────────────────────────────┐
│ Daemon │
│ (always-on, watches for triggers) │
└───────────┬─────────────────────────────┬───────────────┘
│ │
▼ ▼
┌─────────────────┐ ┌──────────────────┐
│ Linear Trigger │ │ Heartbeat Poller │
│ (new issues) │ │ (background work)│
└────────┬─────────┘ └────────┬─────────┘
│ │
└──────────┬──────────────────┘
▼
┌─────────────────┐
│ Workflow Engine │ Reads workflow YAML,
│ │ runs steps in order
└────────┬────────┘
│
┌────────────┼────────────┐
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Role │ │ Role │ │ Role │ Each role = model +
│ (PM) │ │ (Coder) │ │ (QA) │ tools + prompt + playbook
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────┐
│ Supervisor │ Safety net: answers agent
│ (answers or escalates) │ questions from context,
└──────────────┬──────────────────┘ or escalates to Slack
│
┌────────────┼────────────┐
▼ ▼ ▼
┌────────┐ ┌─────────┐ ┌────────┐
│ Slack │ │ Linear │ │ GitHub │
└────────┘ └─────────┘ └────────┘
- Workflows define what happens — a YAML file declaring which agents run in what order, what each step produces, and what downstream steps consume. The score the conductor follows. (A sketch follows this list.)
- Roles define who does the work — which AI model, what tools, what system prompt. Reusable across workflows.
- Skill files define how — markdown playbooks encoding the step-by-step process for each role.
- Supervisor — the safety net. When an agent needs human input, the supervisor first tries to answer from pipeline context. If it can, it does (the human never hears about it). If it can't, it posts the question to Slack with a suggested answer. (A sketch of this loop also follows the list.)
- Daemon — the always-on process. Watches Linear for labeled issues, fires workflows, manages concurrency and dedup.
- Comms adapters — connect to Slack, Linear, and GitHub. The orchestrator doesn't know or care which adapter is in use.
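To make the workflow/role split concrete, here is a minimal sketch of what such a YAML file might look like. The schema, keys, and role names are illustrative, not ConductorBot's actual format:

```yaml
# Hypothetical workflow schema; illustrative only.
name: feature-pipeline
trigger:
  linear_label: conductor

steps:
  - role: pm            # each role = model + tools + prompt + playbook,
    produces: spec      # defined once and reused across workflows
  - role: em
    consumes: [spec]
    produces: tasks
  - role: coder
    consumes: [tasks]
    produces: prs
  - role: qa
    consumes: [prs, spec]
    produces: qa_report
  - role: reviewer
    consumes: [spec, prs, qa_report]
```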
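And a TypeScript sketch of the supervisor's answer-or-escalate loop, under the same caveat: askModel, postToSlack, PipelineContext, and the UNKNOWN sentinel are hypothetical stand-ins.

```typescript
interface AgentQuestion {
  role: string;
  question: string;
}

interface PipelineContext {
  spec: string;
  recentDecisions: string[];
}

async function superviseQuestion(
  q: AgentQuestion,
  ctx: PipelineContext
): Promise<string> {
  // Try to answer from pipeline context first; the human never hears about it.
  const answer = await askModel(
    `Given this spec and decision history, answer the agent's question, ` +
      `or reply UNKNOWN if a human must decide.\n` +
      `Spec: ${ctx.spec}\nDecisions: ${ctx.recentDecisions.join("; ")}\n` +
      `Question from ${q.role}: ${q.question}`
  );
  if (answer.trim() !== "UNKNOWN") return answer;

  // Otherwise escalate to Slack with a suggested answer attached.
  // (A real supervisor would wait for the human's reply before proceeding.)
  const suggestion = await askModel(`Suggest a reasonable default: ${q.question}`);
  await postToSlack(
    `[${q.role}] needs a decision: ${q.question}\nSuggested: ${suggestion}`
  );
  return suggestion;
}

// Stubs; a real system would call an LLM API and the Slack API.
async function askModel(prompt: string): Promise<string> {
  return "UNKNOWN";
}
async function postToSlack(text: string): Promise<void> {
  console.log("SLACK:", text);
}
```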
Input: You create a Linear issue — "Add dark mode to the dashboard" — and apply the conductor label.
| Step | Agent | What happens |
|---|---|---|
| 1 | Daemon | Notices the labeled issue, starts the workflow |
| 2 | PM | Writes spec: theme tokens, preference persistence, system detection, contrast ratios, transitions |
| 3 | EM | Creates 5 sub-issues in Linear: ThemeContext provider, dark palette, component migration, settings toggle, E2E tests |
| 4 | Coder | Implements each sub-issue, creates PRs with tests |
| 5 | QA | Runs Playwright, catches sidebar not respecting toggle, files bug |
| 6 | Coder | Picks up bug, fixes, pushes |
| 7 | QA | Re-tests, passes |
| 8 | Reviewer | Validates against spec, approves, merges |
| 9 | You | Get a Slack message: "Dark mode shipped to staging" |
The pipeline touched Linear, GitHub, and Slack. You touched none of them.
- Pipeline, not chatbot. Most AI tools are conversational — you talk to one agent and iterate. ConductorBot is a pipeline of specialized agents handing off work to each other. The PM doesn't write code; the coder doesn't write specs.
- Autonomous by default. Agents are prompted to work, not to ask. The system is designed so that not bothering you is the default behavior.
- Declarative. Workflows are YAML. You can read one and understand exactly what will happen, in what order, with what inputs and outputs.
- Built on your existing stack. Linear, Slack, GitHub, Playwright — tools teams already use. ConductorBot orchestrates agents that work within your stack, not instead of it.
| Phase | Status | What |
|---|---|---|
| Foundation | Done | Orchestrator, workflow engine, supervisor, Slack/Linear/GitHub adapters, daemon, 76 tests passing |
| Autonomy | Next | Skill files for each role, prompts tuned for autonomous execution, silent tokens, heartbeat polling |
| Parallel + Resilience | Planned | PM and EM work concurrently, self-healing retries, automatic recovery from transient failures |
| End-to-end | Goal | Full pipeline running against a real codebase — Linear issue to merged PR with zero human intervention |
End state: You write "Add dark mode to the dashboard" in Linear and go to lunch. You come back to a merged PR, passing tests, and a Slack summary of what shipped.