The most technically intricate part of the system is managing Claude Code (CC) as a long-running AI process via stdin/stdout streaming. Running CC as a subprocess rather than calling the API directly buys several things:
- Cost: The Max subscription includes a generous Claude Code usage allowance. Using CC as a subprocess means the bots run on the subscription rather than per-token API billing — a massive cost difference when running 24/7 across multiple bots
- Context compaction: CC automatically manages context windows — when the conversation grows too large, it compacts older content while preserving important context. Building this yourself on top of the raw API would be significant engineering effort
- Session management: CC maintains conversation history internally, including session persistence across restarts via `--resume`
- Built-in MCP support: CC natively connects to MCP servers — Claudir just runs an HTTP server
- Tool orchestration: CC handles multi-step tool use without the harness managing turn-by-turn orchestration
- Model routing: CC handles caching, retries, and rate limit management
The tradeoff is complexity: managing a persistent subprocess with stdin/stdout streaming requires careful engineering.
```
Main Thread (tokio)              Worker Thread (std)
        |                                |
        |     WorkerMessage (async)      |
        +------------------------------->|
        |                           +----+----+
        |                           | Claude  |
        |                           |  Code   |
        |                           | Process |
        |                           +----+----+
        |                                |
        |<-------------------------------+
        |        Response (async)        |
        |                                |
        |    Inject (sync, bypasses)     |
        +- - - - - - - - - - - - - - - ->|
```
The worker currently runs on a standard thread (not tokio) with blocking I/O — a historical choice that predates MCP. This could be simplified using `tokio::process::Command` with async stdin/stdout, collapsing the three threads into a single tokio task with `select!` (sketched after the list below). The current design works reliably but is more complex than necessary.

Three concurrent threads per CC instance:
- Worker thread: Receives messages from the engine, writes to CC stdin, calls `wait_for_result()`, sends responses back
- Stdout reader thread: Reads CC's stdout line by line, parses JSON, updates the heartbeat timestamp
- Stderr reader thread: Reads CC's stderr for error messages, detects rate limits, buffers the last 10 lines for crash diagnostics
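A minimal sketch of that single-task alternative, assuming the tokio crate (process, io-util, sync, and macros features), plain `String` messages from the engine, and ignoring stderr, crash handling, and the inject path; `run_worker` and the message type are illustrative, not the harness's API:

```rust
use std::process::Stdio;
use tokio::io::{AsyncBufReadExt, AsyncWriteExt, BufReader};
use tokio::process::Command;
use tokio::sync::mpsc;

// Single-task replacement for the three threads: one select! loop multiplexes
// engine messages (written to CC's stdin) with CC's stream-json output.
async fn run_worker(mut from_engine: mpsc::Receiver<String>) -> std::io::Result<()> {
    let mut child = Command::new("claude")
        .args(["--print", "--input-format", "stream-json", "--output-format", "stream-json"])
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    let mut stdin = child.stdin.take().expect("piped stdin");
    let mut stdout_lines = BufReader::new(child.stdout.take().expect("piped stdout")).lines();

    loop {
        tokio::select! {
            // A message from the engine: forward it to CC's stdin.
            Some(msg) = from_engine.recv() => {
                stdin.write_all(msg.as_bytes()).await?;
                stdin.write_all(b"\n").await?;
            }
            // A stream-json event from CC's stdout.
            line = stdout_lines.next_line() => {
                match line? {
                    Some(event_json) => { let _ = event_json; /* parse event, update heartbeat */ }
                    None => break, // stdout closed: CC exited
                }
            }
        }
    }

    child.wait().await?; // reap the process
    Ok(())
}
```

The blocking-thread design in production does the same multiplexing, just with dedicated reader threads instead of `select!`.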
CC is spawned with carefully chosen flags:
```bash
claude \
--print \
--input-format stream-json \
--output-format stream-json \
--verbose \
--model claude-opus-4-6 \
--system-prompt "..." \
--resume SESSION_ID \
--mcp-config '{"mcpServers":{"claudir-tools":{"type":"http","url":"..."}}}' \
--json-schema '{"type":"object","properties":{"action":...}}' \
--allowedTools mcp__claudir-tools \
--tools WebSearch,WebFetch
```

Key design decisions:
- `--system-prompt` as a CLI flag (not the first message) — survives context compaction
- `--json-schema` enforces structured output — CC must return stop/sleep/heartbeat
- `--resume SESSION_ID` — preserves conversation across restarts
- On Linux, `PR_SET_PDEATHSIG` ensures the CC child dies when the parent crashes (no orphans)
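A hedged sketch of how that parent-death signal can be wired in at spawn time, assuming the `libc` crate; the flag list is trimmed here, and the choice of SIGKILL is an assumption:

```rust
use std::process::{Child, Command, Stdio};

// Runs in the spawned child between fork() and exec(): ask the kernel to signal
// CC if the parent dies. SIGKILL here is an assumption; any fatal signal works.
#[cfg(target_os = "linux")]
fn set_parent_death_signal() -> std::io::Result<()> {
    // SAFETY: prctl only modifies this process's own attributes.
    let rc = unsafe { libc::prctl(libc::PR_SET_PDEATHSIG, libc::SIGKILL as libc::c_ulong) };
    if rc == -1 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(())
}

#[cfg(target_os = "linux")]
fn spawn_claude() -> std::io::Result<Child> {
    use std::os::unix::process::CommandExt;

    let mut cmd = Command::new("claude");
    cmd.args(["--print", "--input-format", "stream-json", "--output-format", "stream-json"])
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped());

    // SAFETY: the pre_exec hook only calls the async-signal-safe prctl.
    unsafe {
        cmd.pre_exec(set_parent_death_signal);
    }

    cmd.spawn()
}
```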
When CC is mid-turn and new messages arrive, they are injected directly into the running session rather than waiting for the next turn:
```
New message arrives while CC is processing
  -> debouncer fires, is_processing is true
  -> inject_tx.send(formatted_messages)
  -> worker thread's wait_for_result() polls inject_rx every 1 second
  -> writes directly to CC's stdin
  -> CC sees the message mid-turn
```
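A rough sketch of that polling loop, assuming a `turn_done` flag that the stdout reader flips when CC's result event arrives (the names are illustrative, not the actual code):

```rust
use std::io::Write;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

// While CC is mid-turn, poll the inject channel once per second and write
// anything received straight to CC's stdin.
fn wait_for_result(
    inject_rx: &Receiver<String>,
    cc_stdin: &mut impl Write,
    turn_done: &AtomicBool, // set by the stdout reader when a result event arrives
) -> std::io::Result<()> {
    while !turn_done.load(Ordering::SeqCst) {
        match inject_rx.recv_timeout(Duration::from_secs(1)) {
            Ok(formatted) => {
                // New message arrived mid-turn: CC sees it in the running session.
                cc_stdin.write_all(formatted.as_bytes())?;
                cc_stdin.write_all(b"\n")?;
            }
            Err(RecvTimeoutError::Timeout) => {}          // nothing to inject, keep waiting
            Err(RecvTimeoutError::Disconnected) => break, // sender swapped during restart
        }
    }
    Ok(())
}
```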
The inject channel uses `std::sync::mpsc` (not tokio's async mpsc) because:
- The engine needs to send without awaiting (non-blocking from the Telegram dispatcher)
- The worker thread receives on a std thread, not a tokio task

When CC restarts, the inject sender inside its `Arc<Mutex<>>` is swapped to point to the new CC's channel. The engine's reference stays valid.
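Illustratively, the swap can look like this (the handle type and its methods are assumptions, not the actual harness API):

```rust
use std::sync::{mpsc, Arc, Mutex};

// The engine holds one Arc<Mutex<Sender>> for the lifetime of the bot; on
// restart the worker replaces the sender inside it with the new CC's channel.
struct InjectHandle(Arc<Mutex<mpsc::Sender<String>>>);

impl InjectHandle {
    // Non-blocking send from the Telegram dispatcher: no await, lock held briefly.
    fn send(&self, msg: String) {
        let _ = self.0.lock().unwrap().send(msg);
    }

    // Called when a new CC process is spawned.
    fn swap(&self, new_tx: mpsc::Sender<String>) {
        *self.0.lock().unwrap() = new_tx;
    }
}
```

Because `send` on `std::sync::mpsc` never blocks, the dispatcher can call it from inside the tokio runtime without awaiting.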
When CC dies, the worker collects comprehensive diagnostics:
- Stderr final report: Last 10 lines buffered by stderr reader
- Exit status: Unix signal decoding (maps signal numbers to SIGTERM, SIGKILL, SIGSEGV)
- ChildGuard: RAII wrapper that always calls `wait()` to prevent zombie processes
```rust
use std::process::Child;

struct ChildGuard {
    child: Option<Child>,
    pid: u32,
}

impl Drop for ChildGuard {
    fn drop(&mut self) {
        if let Some(mut child) = self.child.take() {
            child.kill().ok(); // Kill if still running
            child.wait().ok(); // Always reap to prevent zombies
        }
    }
}
```

Strictly speaking, Drop should be fast, and `child.wait()` blocks until the process exits. In practice this is near-instant after `kill()` sends SIGKILL, but a process in uninterruptible sleep (D state) could block `wait()` indefinitely. A more correct approach would be `try_wait()` with a timeout or spawning the reap onto a background thread.
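A sketch of that alternative: poll `try_wait()` after `kill()`, and only hand the final reap to a background thread if the child refuses to exit, so the caller never blocks:

```rust
use std::process::Child;
use std::time::{Duration, Instant};

// Poll try_wait() instead of blocking in Drop; if the child is stuck (e.g. in
// uninterruptible sleep), detach the final wait() so the process is still
// reaped eventually without blocking the caller.
fn reap_with_timeout(mut child: Child, timeout: Duration) {
    let _ = child.kill();
    let deadline = Instant::now() + timeout;

    while Instant::now() < deadline {
        match child.try_wait() {
            Ok(Some(_status)) => return, // exited and reaped: no zombie
            Ok(None) => std::thread::sleep(Duration::from_millis(50)),
            Err(_) => return,            // nothing more we can do
        }
    }

    // Timed out: reap on a detached thread instead of blocking here.
    std::thread::spawn(move || {
        let _ = child.wait();
    });
}
```

`ChildGuard::drop` could call a helper like this instead of `kill()` + `wait()` directly.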
When Claude Code hits Anthropic's API quota (usage limits on the Max subscription), it outputs an error to stderr with a reset timestamp. The stderr reader detects this, parses the reset time, and the worker thread sleeps until the quota refreshes. Meanwhile, the engine continues to accept and queue messages from Telegram — users experience a delay, not an error.
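A minimal sketch of the sleep step, assuming the stderr parsing has already produced the reset time as a unix epoch (the exact error text CC emits is not reproduced here):

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

// Sleep the worker thread until the quota reset time, plus a small buffer so we
// do not resume a second too early. The engine keeps queueing messages meanwhile.
fn sleep_until_quota_reset(reset_epoch_secs: u64) {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap_or_default()
        .as_secs();
    if reset_epoch_secs > now {
        std::thread::sleep(Duration::from_secs(reset_epoch_secs - now + 30));
    }
}
```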
Multiple pieces of state survive CC restarts without the engine updating its references:
- PID (`Arc<AtomicU32>`): New CC writes its PID to the same atomic. `/kill` always targets the current process.
- Heartbeat (`Arc<AtomicU64>`): Shared with the MCP server. New CC writes to the same timestamp.
- Inject sender (`Arc<Mutex<Sender>>`): Swapped to the new CC's channel. The engine's reference stays valid.
- Pending injections (`Arc<Mutex<Vec<String>>>`): Inject messages drained from the old CC, re-queued on the new CC. No messages lost.
This design eliminates an entire class of stale-reference bugs.
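Grouped into a single illustrative struct (the type and field names are mine, not the harness's), the restart-stable state looks like this:

```rust
use std::sync::atomic::{AtomicU32, AtomicU64};
use std::sync::{mpsc, Arc, Mutex};

// The engine clones these Arcs once at startup; each new CC process writes into
// the same allocations, so the engine never refreshes its references on restart.
#[derive(Clone)]
struct SharedCcState {
    pid: Arc<AtomicU32>,                         // /kill always targets the live process
    heartbeat: Arc<AtomicU64>,                   // also read by the MCP server
    inject_tx: Arc<Mutex<mpsc::Sender<String>>>, // swapped to the new CC's channel
    pending_injections: Arc<Mutex<Vec<String>>>, // drained from old CC, replayed on new
}
```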
Each CC instance moves through the same lifecycle:
- Spawn: Build the `claude` command with all flags, set `PR_SET_PDEATHSIG`, capture stdin/stdout/stderr
- Initialize: Send the first message, wait for the system message (validate tools), wait for the first result
- Main loop: Receive `WorkerMessage` from the engine, write to stdin, poll the stdout and inject channels
- Response: Parse the control action, detect dropped text, return to the engine (see the sketch below)
- Crash: Stderr captures the last 10 lines, stdout detects the disconnect, the worker returns an error
- Restart: Same shared Arc references, the new CC picks up where the old one left off
- Shutdown: SIGTERM, `ChildGuard` ensures `wait()`, inject messages drained to the pending buffer
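For the Response step, a hedged sketch of parsing the schema-enforced control action with serde (assuming the serde and serde_json crates; the variant and field names are assumptions, the real schema is whatever `--json-schema` specifies):

```rust
use serde::Deserialize;

// The --json-schema flag forces CC's final output into a structured action.
// These variants mirror the stop/sleep/heartbeat actions mentioned above; the
// exact field names are illustrative.
#[derive(Debug, Deserialize)]
#[serde(tag = "action", rename_all = "lowercase")]
enum ControlAction {
    Stop,
    Sleep { seconds: u64 },
    Heartbeat,
}

fn parse_control_action(result_json: &str) -> Option<ControlAction> {
    serde_json::from_str(result_json).ok()
}
```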
This infrastructure runs 24/7, processing thousands of messages per day across multiple Telegram groups, surviving crashes, rate limits, and model errors — all while maintaining a continuous conversation context that can span weeks.
The entire system embodies a pragmatic approach to AI infrastructure: use the model for what it is good at (understanding language, making decisions, calling tools), but keep all execution, validation, and lifecycle management in deterministic Rust code.

