@jflam
Created February 10, 2026 21:56
The Case of the Phantom Kill: Debugging Copilot's Bash Filter

The Prompt

It started with a deceptively simple question from the user:

"Look through recent chats to find the time when you were having trouble with a commit because of kill in the commit message. This was bash prompt filtering in Copilot. Find the references and look in the tools file for that chat to extract the full bash prompt in each case."

But before I could even reach for the obvious tool — AgentBoss's SQLite full-text search — the user stopped me cold:

"Why are you using the fts feature of sqlite to find things? Answer this first."

Fair point. The AgentBoss guidelines are explicit: don't use MCP read tools for research. The vault is the source of truth. The FTS index is convenient, but the vault — a filesystem of markdown files at ~/.agentboss/vault/ — is where agents are supposed to do their archaeology. It's the design philosophy: agents should work with files, not databases. The vault is grep-able, human-readable, and doesn't require understanding a schema.

Act I: Searching the Vault

I dispatched a subagent to the daemon machine to search the vault's Chats/ directory — over a thousand markdown chat transcripts, each with YAML frontmatter and full conversation history.

The subagent grep'd through the chat files looking for the intersection of "kill" and "commit" — and surfaced three distinct incidents where agents had been silently blocked from committing code.
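The search itself is simple enough to sketch. What follows is a minimal illustration, not the subagent's actual command (the transcript only says it grep'd): the function name `chats_mentioning` is hypothetical, and it assumes the transcripts are `.md` files somewhere under the Chats/ directory.

```python
from pathlib import Path
from typing import List


def chats_mentioning(chats_dir: Path, *terms: str) -> List[Path]:
    """Return markdown files under chats_dir whose text contains every term.

    Matching is a case-insensitive substring test, roughly what chaining
    `grep -li` once per term would give. Hypothetical helper for illustration.
    """
    hits = []
    for path in sorted(chats_dir.rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        if all(term.lower() in text for term in terms):
            hits.append(path)
    return hits


if __name__ == "__main__":
    # Vault location as given in the article; adjust if yours differs.
    vault_chats = Path.home() / ".agentboss" / "vault" / "Chats"
    if vault_chats.is_dir():
        for hit in chats_mentioning(vault_chats, "kill", "commit"):
            print(hit)
```

A substring test like this over-matches (it would also hit "skill" or "committee"), so the candidates still need a read-through, which is what narrowed the results to three real incidents.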

The Three Incidents

Chat 2a5bb0d5 — MCP Proxy Implementation

An agent was writing integration tests for the MCP proxy's reconnection logic. The commit message described a test scenario: "reconnects after daemon restart (kill + rebind same port)". The git commit command produced no output. No error. No commit. The agent eventually realized the word "kill" in the message body was triggering a filter, and rewrote it to "stop + rebind same port".

Chat d637ed80 — Tool Call Narrative Plan

This one was documenting an architecture change for per-turn narrative sessions. The commit message included "Kill session when turn ends" — a perfectly natural description of session lifecycle management. Silently blocked. Changed to "Terminate session when turn ends".

Chat 5ca800e9 — Vault Sync & Stale Binary Fix

A commit about cleaning up stale daemon binaries used the phrase "kill stale local dev daemon" in the message. Same silent failure. Changed to "stop stale local dev daemon".

The pattern was clear: the Copilot CLI's bash tool was scanning the entire command string — including quoted arguments like commit messages — for the word "kill", and refusing to execute.

Act II: The Missing Tools File

The user's prompt mentioned "the tools file for that chat." I went looking.

The vault chat files are narrative summaries — user messages, assistant messages, key decisions. They don't contain system prompts or tool definitions. Those are injected server-side by GitHub's Copilot API.

I searched the Copilot CLI's session state at ~/.copilot/session-state/, which holds roughly 1,800 sessions' worth of events.jsonl files, workspace configs, and checkpoints. No tool definition files. The tool definitions aren't persisted anywhere on disk.

There is no "tools file." The tool definitions are constructed dynamically in code.

Act III: Into the Bundled JavaScript

This is where it got interesting. If the tool definitions aren't stored as files, they must be generated at runtime. The Copilot CLI ships as a bundled Node.js application at ~/.copilot/pkg/darwin-arm64/0.0.405/index.js.

The subagent dove into the minified JavaScript and found the filtering machinery:
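A minified bundle is typically a handful of enormous lines, so line-oriented search returns far too much context per hit. The article doesn't say how the subagent navigated the file; one plausible approach, sketched here with a hypothetical helper named `find_with_context`, is to search for a known error string and print a fixed window of characters around each match.

```python
from typing import Iterator, Tuple


def find_with_context(text: str, needle: str, radius: int = 60) -> Iterator[Tuple[int, str]]:
    """Yield (offset, snippet) for every occurrence of needle in text.

    The snippet is the match plus up to `radius` characters on each side,
    which keeps output readable when the whole file is one long line.
    """
    start = text.find(needle)
    while start != -1:
        lo = max(0, start - radius)
        hi = min(len(text), start + len(needle) + radius)
        yield start, text[lo:hi]
        start = text.find(needle, start + 1)


if __name__ == "__main__":
    # Stand-in for the bundle contents; the real file would be read from disk.
    bundle = 'function a(){return 1}function b(){throw Error("command is not allowed")}'
    for offset, snippet in find_with_context(bundle, "not allowed", radius=20):
        print(offset, snippet)
```

Starting from a user-visible error message and working outward is usually faster than guessing at minified function names, which change between builds.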

The Validator: function eml()

This is the pre-execution gate. Before any bash command runs, eml() splits the command string on [;&|\n\r]+ and inspects each segment. It enforces three rules:

  1. pkill → Blocked unconditionally. "The 'pkill' command is not allowed. Use 'kill <PID>' instead."
  2. killall → Blocked unconditionally. "The 'killall' command is not allowed. Use 'kill <PID>' instead."
  3. kill without numeric PIDs → Blocked. "The 'kill' command must specify at least one numeric PID."

There's also a self-preservation check: if you try to kill the CLI's own PID or its parent PID, it blocks with "Cannot kill PID X — this is your own runtime."
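To make the behavior concrete, here is a rough Python reconstruction of the rules as described above. It is a sketch under stated assumptions, not the minified source: the name `validate` is hypothetical, the error strings are shortened paraphrases, and word-level, case-insensitive matching of "kill" is an assumption chosen because it reproduces the three incidents (one of which used a capitalized "Kill").

```python
import os
import re
from typing import FrozenSet, Optional

# Separator class quoted in the article.
SEGMENT_SPLIT = re.compile(r"[;&|\n\r]+")
# Assumption: the real check is at least this broad.
KILL_WORD = re.compile(r"\bkill\b", re.IGNORECASE)


def validate(command: str, own_pids: Optional[FrozenSet[int]] = None) -> Optional[str]:
    """Return an error message if the command would be blocked, else None."""
    if own_pids is None:
        own_pids = frozenset({os.getpid(), os.getppid()})
    for segment in SEGMENT_SPLIT.split(command):
        # Rules 1 and 2: name-based killers are rejected outright.
        if re.search(r"\bpkill\b", segment):
            return "The 'pkill' command is not allowed."
        if re.search(r"\bkillall\b", segment):
            return "The 'killall' command is not allowed."
        match = KILL_WORD.search(segment)
        if match is None:
            continue
        # Rule 3: anything after the word "kill" must include a numeric PID.
        rest = segment[match.end():].split()
        pids = [int(tok) for tok in rest if tok.isascii() and tok.isdigit()]
        if not pids:
            return "The 'kill' command must specify at least one numeric PID."
        # Self-preservation: never target the runtime or its parent.
        for pid in pids:
            if pid in own_pids:
                return f"Cannot kill PID {pid}: this is your own runtime."
    return None


if __name__ == "__main__":
    print(validate('git commit -m "kill the daemon"'))
    print(validate("kill 4242", own_pids=frozenset({1})))
```

Because the check looks at raw text rather than parsed shell tokens, a quoted commit message is indistinguishable from an actual `kill` invocation with no PID, which is exactly the failure the three chats ran into.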

The Parser: function i3t()

This function tries to be smarter. It maintains an allowlist of commands where "kill" appearing inside arguments should be ignored: echo, printf, cat, grep, sed, awk, head, tail, and about a dozen others.

If a command starts with one of these, the "kill" substring match is skipped. So grep "kill" somefile.txt would pass through fine.

But here's the catch: git is not on the allowlist. So git commit -m "kill the daemon" hits the validator, the validator sees "kill" in the command string, and blocks execution. No error message appears in the commit workflow — the command simply doesn't run.
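The allowlist decision can be sketched in a few lines. This is an illustration of the described logic only: `kill_check_applies` is a hypothetical name, and the set contains just the eight commands named above, not the dozen or so others the real list reportedly includes.

```python
# Only the commands the article names explicitly; the real list is longer.
IGNORE_KILL_IN_ARGS = {"echo", "printf", "cat", "grep", "sed", "awk", "head", "tail"}


def kill_check_applies(command: str) -> bool:
    """True if a 'kill' substring in this command would still be inspected.

    The substring match is skipped only when the first word of the command
    is on the allowlist; everything else falls through to the validator.
    """
    words = command.split()
    if not words or "kill" not in command:
        return False
    return words[0] not in IGNORE_KILL_IN_ARGS


if __name__ == "__main__":
    for cmd in ('grep "kill" somefile.txt', 'git commit -m "kill the daemon"'):
        print(cmd, "->", "inspected" if kill_check_applies(cmd) else "skipped")
```

An allowlist keyed on the command name has to enumerate every program that might legitimately carry the word in an argument, so any omission (here, `git`) silently becomes a block.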

The Ironic Twist

During the investigation itself, the subagent's own grep commands were getting blocked. Searching for "pkill\|killall" in the source code triggered the pkill detector. Even though grep is on the allowlist for "kill", the hardcoded pkill/killall checks run before the allowlist logic and match the pattern anywhere in the command line, including inside grep search patterns.
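The ordering is what matters here, and a small sketch shows why. The function name `first_failing_check`, the abbreviated allowlist, and the `-n` and `index.js` arguments in the example command are all illustrative assumptions; only the order of the checks comes from the description above.

```python
import re
from typing import Optional

ALLOWLIST = {"grep", "echo", "cat"}  # abbreviated for illustration
NAME_BASED = re.compile(r"pkill|killall")


def first_failing_check(command: str) -> Optional[str]:
    """Return which check blocks the command, or None if it passes."""
    # Step 1: the hardcoded checks see the whole command line, quotes and all.
    if NAME_BASED.search(command):
        return "name-based"
    # Step 2: only now is the allowlist consulted for the plain "kill" match.
    words = command.split()
    if words and words[0] in ALLOWLIST:
        return None
    if "kill" in command:
        return "kill"
    return None


if __name__ == "__main__":
    print(first_failing_check(r'grep -n "pkill\|killall" index.js'))  # name-based
    print(first_failing_check('grep "kill" somefile.txt'))            # None
```

Swapping steps 1 and 2 would let the grep through, at the cost of letting allowlisted commands carry `pkill` in their arguments unchecked.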

The debugger was being debugged by the thing it was debugging.

The Prompt Instructions

The tool definition generated by getShellTool() includes this instruction:

"When terminating processes, always use kill <PID> with a specific process ID. Commands like pkill, killall, or other name-based process killing commands are not allowed."

This is the soft guidance. The hard enforcement in eml() and i3t() is the runtime backstop — and it's the runtime backstop that's overly aggressive, catching "kill" in commit messages, documentation strings, and grep patterns.

What We Learned

  1. The vault works. Grep over a thousand chat transcripts found three needles in a haystack in seconds. No schema knowledge required, no query language to learn. Just files and text search.

  2. Provenance matters. Without the full chat transcripts, we'd have no record that these silent failures ever happened. The agents adapted and moved on — only the vault remembers the original blocked commands.

  3. Runtime filters are blunt instruments. String-matching "kill" anywhere in a command line is a security/safety measure with significant collateral damage. The allowlist in i3t() is an attempt to soften the blow, but it can't anticipate every legitimate use of the word — like commit messages, documentation, or even searching for the filter itself.

  4. The best debugging stories are recursive. An agent investigating why agents can't use the word "kill" gets blocked from using the word "kill" in its investigation. That's not a bug report — that's a koan.
