The most technically intricate part of the system is managing Claude Code (CC) as a long-running AI process via stdin/stdout streaming. Running CC as a subprocess rather than calling the API directly buys several things:
- Cost: The Max subscription includes a generous Claude Code usage allowance. Using CC as a subprocess means the bots run on the subscription rather than per-token API billing — a massive cost difference when running 24/7 across multiple bots
- Context compaction: CC automatically manages context windows — when the conversation grows too large, it compacts older content while preserving important context. Building this yourself on top of the raw API would be significant engineering effort
- Session management: CC maintains conversation history internally, including session persistence across restarts via `--resume`
- Built-in MCP support: CC natively connects to MCP servers — Claudir just runs an HTTP server
- Tool orchestration: CC handles multi-step tool use without the harness managing turn-by-turn orchestration
- Model routing: CC handles caching, retries, and rate limit management
The tradeoff is complexity: managing a persistent subprocess with stdin/stdout streaming requires careful engineering.
```
Main Thread (tokio)              Worker Thread (std)
        |                                |
        |     WorkerMessage (async)      |
        +------------------------------->|
        |                           +----+----+
        |                           | Claude  |
        |                           |  Code   |
        |                           | Process |
        |                           +----+----+
        |                                |
        |<-------------------------------+
        |        Response (async)        |
        |                                |
        |    Inject (sync, bypasses)     |
        +- - - - - - - - - - - - - - - ->|
```
The worker currently runs on a standard thread (not tokio) with blocking I/O — a historical choice that predates MCP. This could be simplified using `tokio::process::Command` with async stdin/stdout, collapsing the three threads into a single tokio task with `select!` (sketched after the list below). The current design works reliably but is more complex than necessary.

Three concurrent threads per CC instance:
- Worker thread: Receives messages from the engine, writes to CC stdin, calls `wait_for_result()`, sends responses back
- Stdout reader thread: Reads CC's stdout line by line, parses JSON, updates the heartbeat timestamp
- Stderr reader thread: Reads CC's stderr for error messages, detects rate limits, buffers the last 10 lines for crash diagnostics
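A minimal sketch of that single-task alternative, assuming the tokio crate (process, io-util, sync, and macros features), plain `String` messages from the engine, and ignoring stderr, crash handling, and the inject path; `run_worker` and the message type are illustrative, not the harness's API:

```rust
use std::process::Stdio;
use tokio::io::{AsyncBufReadExt, AsyncWriteExt, BufReader};
use tokio::process::Command;
use tokio::sync::mpsc;

// Single-task replacement for the three threads: one select! loop multiplexes
// engine messages (written to CC's stdin) with CC's stream-json output.
async fn run_worker(mut from_engine: mpsc::Receiver<String>) -> std::io::Result<()> {
    let mut child = Command::new("claude")
        .args(["--print", "--input-format", "stream-json", "--output-format", "stream-json"])
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    let mut stdin = child.stdin.take().expect("piped stdin");
    let mut stdout_lines = BufReader::new(child.stdout.take().expect("piped stdout")).lines();

    loop {
        tokio::select! {
            // A message from the engine: forward it to CC's stdin.
            Some(msg) = from_engine.recv() => {
                stdin.write_all(msg.as_bytes()).await?;
                stdin.write_all(b"\n").await?;
            }
            // A stream-json event from CC's stdout.
            line = stdout_lines.next_line() => {
                match line? {
                    Some(event_json) => { let _ = event_json; /* parse event, update heartbeat */ }
                    None => break, // stdout closed: CC exited
                }
            }
        }
    }

    child.wait().await?; // reap the process
    Ok(())
}
```

The blocking-thread design in production does the same multiplexing, just with dedicated reader threads instead of `select!`.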
CC is spawned with carefully chosen flags:
```bash
claude \
--print \
--input-format stream-json \
--output-format stream-json \
--verbose \
--model claude-opus-4-6 \
--system-prompt "..." \
--resume SESSION_ID \
--mcp-config '{"mcpServers":{"claudir-tools":{"type":"http","url":"..."}}}' \
--json-schema '{"type":"object","properties":{"action":...}}' \
--allowedTools mcp__claudir-tools \
--tools WebSearch,WebFetch
```

Key design decisions:
- `--system-prompt` as a CLI flag (not the first message) — survives context compaction
- `--json-schema` enforces structured output — CC must return stop/sleep/heartbeat
- `--resume SESSION_ID` — preserves conversation across restarts
- On Linux, `PR_SET_PDEATHSIG` ensures the CC child dies when the parent crashes (no orphans)
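A hedged sketch of how that parent-death signal can be wired in at spawn time, assuming the `libc` crate; the flag list is trimmed here, and the choice of SIGKILL is an assumption:

```rust
use std::process::{Child, Command, Stdio};

// Runs in the spawned child between fork() and exec(): ask the kernel to signal
// CC if the parent dies. SIGKILL here is an assumption; any fatal signal works.
#[cfg(target_os = "linux")]
fn set_parent_death_signal() -> std::io::Result<()> {
    // SAFETY: prctl only modifies this process's own attributes.
    let rc = unsafe { libc::prctl(libc::PR_SET_PDEATHSIG, libc::SIGKILL as libc::c_ulong) };
    if rc == -1 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(())
}

#[cfg(target_os = "linux")]
fn spawn_claude() -> std::io::Result<Child> {
    use std::os::unix::process::CommandExt;

    let mut cmd = Command::new("claude");
    cmd.args(["--print", "--input-format", "stream-json", "--output-format", "stream-json"])
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped());

    // SAFETY: the pre_exec hook only calls the async-signal-safe prctl.
    unsafe {
        cmd.pre_exec(set_parent_death_signal);
    }

    cmd.spawn()
}
```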
When CC is mid-turn and new messages arrive, they are injected directly into the running session rather than waiting for the next turn:
```
New message arrives while CC is processing
  -> debouncer fires, is_processing is true
  -> inject_tx.send(formatted_messages)
  -> worker thread's wait_for_result() polls inject_rx every 1 second
  -> writes directly to CC's stdin
  -> CC sees the message mid-turn
```
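A rough sketch of that polling loop, assuming a `turn_done` flag that the stdout reader flips when CC's result event arrives (the names are illustrative, not the actual code):

```rust
use std::io::Write;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

// While CC is mid-turn, poll the inject channel once per second and write
// anything received straight to CC's stdin.
fn wait_for_result(
    inject_rx: &Receiver<String>,
    cc_stdin: &mut impl Write,
    turn_done: &AtomicBool, // set by the stdout reader when a result event arrives
) -> std::io::Result<()> {
    while !turn_done.load(Ordering::SeqCst) {
        match inject_rx.recv_timeout(Duration::from_secs(1)) {
            Ok(formatted) => {
                // New message arrived mid-turn: CC sees it in the running session.
                cc_stdin.write_all(formatted.as_bytes())?;
                cc_stdin.write_all(b"\n")?;
            }
            Err(RecvTimeoutError::Timeout) => {}          // nothing to inject, keep waiting
            Err(RecvTimeoutError::Disconnected) => break, // sender swapped during restart
        }
    }
    Ok(())
}
```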
The inject channel uses `std::sync::mpsc` (not tokio's async mpsc) because:
- The engine needs to send without awaiting (non-blocking from the Telegram dispatcher)
- The worker thread receives on a std thread, not a tokio task

When CC restarts, the inject sender inside its `Arc<Mutex<>>` is swapped to point to the new CC's channel. The engine's reference stays valid.
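Illustratively, the swap can look like this (the handle type and its methods are assumptions, not the actual harness API):

```rust
use std::sync::{mpsc, Arc, Mutex};

// The engine holds one Arc<Mutex<Sender>> for the lifetime of the bot; on
// restart the worker replaces the sender inside it with the new CC's channel.
struct InjectHandle(Arc<Mutex<mpsc::Sender<String>>>);

impl InjectHandle {
    // Non-blocking send from the Telegram dispatcher: no await, lock held briefly.
    fn send(&self, msg: String) {
        let _ = self.0.lock().unwrap().send(msg);
    }

    // Called when a new CC process is spawned.
    fn swap(&self, new_tx: mpsc::Sender<String>) {
        *self.0.lock().unwrap() = new_tx;
    }
}
```

Because `send` on `std::sync::mpsc` never blocks, the dispatcher can call it from inside the tokio runtime without awaiting.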
When CC dies, the worker collects comprehensive diagnostics:
- Stderr final report: Last 10 lines buffered by stderr reader
- Exit status: Unix signal decoding (maps signal numbers to SIGTERM, SIGKILL, SIGSEGV)
- ChildGuard: RAII wrapper that always calls `wait()` to prevent zombie processes
```rust
use std::process::Child;

struct ChildGuard {
    child: Option<Child>,
    pid: u32,
}

impl Drop for ChildGuard {
    fn drop(&mut self) {
        if let Some(mut child) = self.child.take() {
            child.kill().ok(); // Kill if still running
            child.wait().ok(); // Always reap to prevent zombies
        }
    }
}
```

Strictly speaking, Drop should be fast, and `child.wait()` blocks until the process exits. In practice this is near-instant after `kill()` sends SIGKILL, but a process in uninterruptible sleep (D state) could block `wait()` indefinitely. A more correct approach would be `try_wait()` with a timeout or spawning the reap onto a background thread.
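A sketch of that alternative: poll `try_wait()` after `kill()`, and only hand the final reap to a background thread if the child refuses to exit, so the caller never blocks:

```rust
use std::process::Child;
use std::time::{Duration, Instant};

// Poll try_wait() instead of blocking in Drop; if the child is stuck (e.g. in
// uninterruptible sleep), detach the final wait() so the process is still
// reaped eventually without blocking the caller.
fn reap_with_timeout(mut child: Child, timeout: Duration) {
    let _ = child.kill();
    let deadline = Instant::now() + timeout;

    while Instant::now() < deadline {
        match child.try_wait() {
            Ok(Some(_status)) => return, // exited and reaped: no zombie
            Ok(None) => std::thread::sleep(Duration::from_millis(50)),
            Err(_) => return,            // nothing more we can do
        }
    }

    // Timed out: reap on a detached thread instead of blocking here.
    std::thread::spawn(move || {
        let _ = child.wait();
    });
}
```

`ChildGuard::drop` could call a helper like this instead of `kill()` + `wait()` directly.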
When Claude Code hits Anthropic's API quota (usage limits on the Max subscription), it outputs an error to stderr with a reset timestamp. The stderr reader detects this, parses the reset time, and the worker thread sleeps until the quota refreshes. Meanwhile, the engine continues to accept and queue messages from Telegram — users experience a delay, not an error.
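A minimal sketch of the sleep step, assuming the stderr parsing has already produced the reset time as a unix epoch (the exact error text CC emits is not reproduced here):

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

// Sleep the worker thread until the quota reset time, plus a small buffer so we
// do not resume a second too early. The engine keeps queueing messages meanwhile.
fn sleep_until_quota_reset(reset_epoch_secs: u64) {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap_or_default()
        .as_secs();
    if reset_epoch_secs > now {
        std::thread::sleep(Duration::from_secs(reset_epoch_secs - now + 30));
    }
}
```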
Multiple pieces of state survive CC restarts without the engine updating its references:
- PID (`Arc<AtomicU32>`): New CC writes its PID to the same atomic. `/kill` always targets the current process.
- Heartbeat (`Arc<AtomicU64>`): Shared with the MCP server. New CC writes to the same timestamp.
- Inject sender (`Arc<Mutex<Sender>>`): Swapped to the new CC's channel. The engine's reference stays valid.
- Pending injections (`Arc<Mutex<Vec<String>>>`): Inject messages drained from the old CC, re-queued on the new CC. No messages lost.
This design eliminates an entire class of stale-reference bugs.
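Grouped into a single illustrative struct (the type and field names are mine, not the harness's), the restart-stable state looks like this:

```rust
use std::sync::atomic::{AtomicU32, AtomicU64};
use std::sync::{mpsc, Arc, Mutex};

// The engine clones these Arcs once at startup; each new CC process writes into
// the same allocations, so the engine never refreshes its references on restart.
#[derive(Clone)]
struct SharedCcState {
    pid: Arc<AtomicU32>,                         // /kill always targets the live process
    heartbeat: Arc<AtomicU64>,                   // also read by the MCP server
    inject_tx: Arc<Mutex<mpsc::Sender<String>>>, // swapped to the new CC's channel
    pending_injections: Arc<Mutex<Vec<String>>>, // drained from old CC, replayed on new
}
```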
Each CC instance moves through the same lifecycle:
- Spawn: Build the `claude` command with all flags, set `PR_SET_PDEATHSIG`, capture stdin/stdout/stderr
- Initialize: Send the first message, wait for the system message (validate tools), wait for the first result
- Main loop: Receive `WorkerMessage` from the engine, write to stdin, poll the stdout and inject channels
- Response: Parse the control action, detect dropped text, return to the engine (see the sketch below)
- Crash: Stderr captures the last 10 lines, stdout detects the disconnect, the worker returns an error
- Restart: Same shared Arc references, the new CC picks up where the old one left off
- Shutdown: SIGTERM, `ChildGuard` ensures `wait()`, inject messages drained to the pending buffer
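For the Response step, a hedged sketch of parsing the schema-enforced control action with serde (assuming the serde and serde_json crates; the variant and field names are assumptions, the real schema is whatever `--json-schema` specifies):

```rust
use serde::Deserialize;

// The --json-schema flag forces CC's final output into a structured action.
// These variants mirror the stop/sleep/heartbeat actions mentioned above; the
// exact field names are illustrative.
#[derive(Debug, Deserialize)]
#[serde(tag = "action", rename_all = "lowercase")]
enum ControlAction {
    Stop,
    Sleep { seconds: u64 },
    Heartbeat,
}

fn parse_control_action(result_json: &str) -> Option<ControlAction> {
    serde_json::from_str(result_json).ok()
}
```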
This infrastructure runs 24/7, processing thousands of messages per day across multiple Telegram groups, surviving crashes, rate limits, and model errors — all while maintaining a continuous conversation context that can span weeks.
The entire system embodies a pragmatic approach to AI infrastructure: use the model for what it is good at (understanding language, making decisions, calling tools), but keep all execution, validation, and lifecycle management in deterministic Rust code.

