
Early Impressions of Claude Code Agent Teams

Note

Sharing this as a Gist until I get around to setting up that blog! I'm still going down the rabbit hole of picking which static site generator to use.

Spoiler: I'm writing one.

Date: 2026-02-10

It's 3:30 AM and I just shipped 50 commits and 5,000 lines of code in less than two hours. My mind is buzzing.


The Setup

I've been building Hive, a small agent orchestrator that manages coding agents through GitHub project boards. Tiny application, but I had a ridiculous backlog of tickets — like 50 — that I'd been adding to as things came to mind. I got lost. I needed major architecture changes: reworking how systemd units run, how jobs get spawned, how workspaces are managed.
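
To make that concrete, the core loop of an orchestrator like this is roughly "watch the board, spawn an agent per ready ticket in its own workspace." Here's a heavily simplified Python sketch; the stubbed board query, the `claude -p` invocation, and the transient-unit approach are illustrative guesses at the shape, not Hive's actual code:

```python
import subprocess
import tempfile
import time

# Illustrative shape of the orchestrator, NOT Hive's actual implementation.
# The board query is stubbed out; ticket fields and commands are assumptions.

def ready_tickets() -> list[dict]:
    """Stand-in for querying the project board's Ready column
    (e.g. via the GitHub API or `gh project item-list`)."""
    return []

def spawn_job(ticket: dict) -> None:
    # One throwaway workspace per job.
    workspace = tempfile.mkdtemp(prefix=f"job-{ticket['number']}-")
    # Run the agent as a transient user-level systemd unit.
    subprocess.run([
        "systemd-run", "--user",
        f"--working-directory={workspace}",
        "claude", "-p", f"Work on: {ticket['title']}",
    ], check=True)

while True:
    for ticket in ready_tickets():
        spawn_job(ticket)
    time.sleep(60)  # poll the board once a minute
```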

So I sat down to do some high-level planning. And then I totally deviated from the plan and said, screw it. Let's just fix the core issues with the architecture real quick. Famous last words… in the old world! Still, I was low on morale and wanted to see how far Claude Code would get. I was genuinely surprised by the results!


Step 1: Plan the What, Not the How

I didn't plan in the traditional sense (i.e., focusing on the how). Instead, I told it not to focus on code changes at all, but on the functionality we wanted to achieve — the end state.

I went back and forth with it a bunch until we had a refined document: a clear specification of what we wanted, not how to get there.

Close to the end, I asked it to go through the entire backlog and annotate everything in the write-up with citations to the issues that already existed.

Then, in a new conversation, I asked it to implement our end state.


Step 2: It Reached for Agent Teams on Its Own

And it immediately reached for the new agent teams feature!

Up to this point, I'd been telling it to use agent teams and finding that it — rightly — recommended against it. The tasks weren't big enough or complex enough to warrant the coordination overhead.

This time it just did it on its own. It identified 8 phases and proceeded to basically refactor the entire application to my specification.


Step 3: Delegate Mode and the Coordination Loop

I immediately switched the coordinator agent into delegate mode, where it can't do any work itself. I talked to that one for the most part, intervening only when needed. I kept reminding it to keep moving forward, not to ask me questions, and to validate all of the changes.

As it went, it would build a phase and then spawn an agent to validate. The validator would double-check that the work matched the spec and report its findings, and the coordinator would spawn another agent to implement the fixes. It just kept iterating like that, one phase at a time.

Then I started trying to break it. I told it I wanted to add more features and some bug fixes and quality-of-life things, and it kept on moving.

The pattern: Build → Validate against spec → Fix → Repeat. One phase at a time. The coordinator doesn't touch code — it only delegates.
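
If you squint, the coordinator was effectively running this loop. A minimal Python sketch of the pattern; `spawn_agent` and the report shape are hypothetical stand-ins, not Claude Code's actual agent-teams API:

```python
from dataclasses import dataclass

# Sketch of the coordination pattern, not a real Claude Code API.

@dataclass
class Report:
    matches_spec: bool
    findings: str

def spawn_agent(role: str, task: str) -> Report:
    """Stand-in for 'delegate this task to a fresh subagent'."""
    print(f"[{role}] {task}")
    return Report(matches_spec=True, findings="")

def run_phase(phase: str, spec: str) -> None:
    spawn_agent("builder", f"Implement {phase} per {spec}")
    while True:
        report = spawn_agent("validator", f"Check {phase} against {spec}")
        if report.matches_spec:
            return  # phase done, move on
        spawn_agent("fixer", f"Address: {report.findings}")

# The coordinator never touches code itself; it only runs this loop.
for phase in ["phase 1", "phase 2"]:  # it identified 8 in my case
    run_phase(phase, spec="the end-state document")
```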


Step 4: Documentation via Diátaxis

To really push it, I used gitingest to grab a reference doc of the entire Diátaxis documentation framework. And I said, don't read the document yourself. Give it to an agent to come up with a plan for the documentation this project needs. Then spawn agents for each of the four quadrants of the framework — tutorials, how-to guides, reference, explanation.
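
For the curious: gitingest also has a Python API, so that step is scriptable. A sketch, assuming its `ingest()` entry point; the URL is a placeholder rather than the framework's actual repo:

```python
from gitingest import ingest  # pip install gitingest

# Flatten a docs repo into a single reference file an agent can read.
# The URL is a placeholder; point it at the Diátaxis source you want.
summary, tree, content = ingest("https://github.com/example/diataxis-docs")

with open("diataxis-reference.md", "w") as f:
    f.write(content)
```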

So it wrote all the docs, and then I asked it to spawn another agent to review the documents for factuality and adherence to the framework, specifically for each quadrant. Without me specifying, each agent gave a letter grade for the correctness of the documentation and its adherence to the framework. Then it iterated on them some more.


Step 5: Live Deployment and End-to-End Testing

This is where my mind was kind of blown.

I wanted to run a live test. I asked it to deploy — told it where the Ansible playbooks were and told it to remove them, switch to a Makefile, and start managing deployment itself. It stopped and removed the old process and the old units from systemd, deleted the old role from the playbook, then deployed the new service itself, documented it, and added new systemd units.

Did I mention I did all of this in YOLO mode?

Aside: I'm running Claude on bare metal with significant firewalling, SELinux (enforcing), bare minimum credentials, everything containerized in podman, and no commingling of personal data. Don't try this at work (but maybe at home with some caution)!

I gave it my project board and an issue I wanted to test with. I said, "Go do a live test end to end. Don't ask me any more f$%#ing questions. Spawn an agent to monitor while another does the test." (It kept pausing to ask for approval at various points.)

One agent checked the logs, checked the status, and assisted with troubleshooting. Another agent used the GitHub CLI (I lied about bare minimum credentials — do as I say…) to move the ticket into the ready column and activate the pipeline. I wrote a couple of comments about things I was seeing, and it spawned an agent to make fixes.
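
The monitoring half is less magic than it sounds; it mostly boils down to tailing the service logs while the test runs. Roughly, in Python (the unit name is a placeholder for whatever the deploy created):

```python
import subprocess

# Follow the user-level service logs during the end-to-end test.
# "hive.service" is a placeholder unit name, not necessarily the real one.
proc = subprocess.Popen(
    ["journalctl", "--user", "-u", "hive.service", "-f", "-o", "cat"],
    stdout=subprocess.PIPE,
    text=True,
)
for line in proc.stdout:
    if "ERROR" in line or "Traceback" in line:
        print(f"needs attention: {line.strip()}")
```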

And before I knew it, I had the end-to-end pipeline working.


The Numbers

50 commits. 5,000 lines of code. Less than 2 hours. 3:30 AM.


The Real Insight: Sandboxing Is the Game Changer

All that's to say — it's amazing to see what happens when you have appropriate sandboxing in place to run in YOLO mode. I think that's the real hero of this story.

This was a personal project on a remote server with user-level systemd — no root access, nothing scary. But the principle holds: when the agent can execute freely without asking permission for every shell command, the coordination loop actually works. The overhead of approve/deny on every action would have killed this entire workflow.

Now, how do we make this safer?


What's Still Missing

I had to babysit the coordinator. I kept telling it to follow good practices, to check its work, to ask for reviews, to validate. What I really need is a system that enforces that on its own. Sandboxing means an agent can run safely in YOLO mode, but safe isn't the same as good: someone still has to enforce the norms that get agents shipping quality, production-ready code.

That's what I'm working on next.


Should You Try Agent Teams?

F@#%ing yes!

Here's the workflow that worked for me:

  1. Plan the end state, not the implementation. Tell the agent what you want, not how to build it. Be specific about functionality, not code.
  2. Let the agent decide when to use agent teams. Don't force it. When the task is complex enough, it'll reach for it on its own.
  3. Use delegate mode on the coordinator. Don't let it do work itself. Its job is to orchestrate, validate, and iterate.
  4. Validate every phase. Spawn validator agents that check work against the spec. Then spawn fixers. Repeat.
  5. Tell it not to ask you questions. Seriously. Let it run. Intervene when you see something, not when it's unsure.
  6. Get your sandboxing right. This is the unlock. YOLO mode in a safe environment changes everything.