Every bot instance consists of three OS processes. The harness is the core — it handles Telegram I/O, hosts the MCP tool server, and manages the CC subprocess. CC is the brain — it processes messages, makes tool calls back to the harness via HTTP, and returns control actions. The wrapper just ensures the harness stays alive.
Let's trace a message from the moment Telegram delivers it.
Teloxide provides the async dispatcher framework. The harness registers handlers for different update types:
```rust
let handler = dptree::entry()
    .branch(Update::filter_message().endpoint(handlers::handle_new_message))
    .branch(Update::filter_edited_message().endpoint(handlers::handle_edited_message))
    .branch(Update::filter_chat_member().endpoint(handlers::handle_chat_member))
    .branch(Update::filter_pre_checkout_query().endpoint(handlers::handle_pre_checkout_query))
    .branch(Update::filter_callback_query().endpoint(handlers::handle_callback_query))
    .branch(Update::filter_message_reaction_updated()
        .endpoint(handlers::handle_message_reaction));
```

Every message hits `handle_new_message`. The handler's first job is triage:
- Owner commands (`/kill`, `/reset`, `/restart`): Handled immediately, never reach the engine
- Private messages (DMs): Privacy consent check, rate limiting (20/hour for non-owners), focus mode queuing
- Group messages: Allowed-group filter, mute check, spam filter, then forwarded to the engine
The critical principle: the Telegram dispatcher MUST NOT block. If the handler awaits a long operation, it delays all other Telegram updates. This is why heavy work is pushed to the engine via async message passing.
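The handler bodies therefore follow a fire-and-forget shape. A minimal sketch, assuming an `engine_tx` queue installed via dptree dependency injection (the name and channel type are illustrative, not the actual harness API):

```rust
use teloxide::prelude::*;
use tokio::sync::mpsc;

// Hypothetical handler: triage is cheap; anything heavy is enqueued for the
// engine task so the dispatcher can return immediately.
async fn handle_new_message(
    msg: Message,
    engine_tx: mpsc::Sender<Message>,
) -> ResponseResult<()> {
    // Owner commands, consent checks, and rate limits would run here (fast).
    // try_send never awaits: if the queue is somehow full, we drop rather
    // than stall the dispatcher.
    let _ = engine_tx.try_send(msg);
    Ok(())
}
```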
Before a group message reaches the engine, it passes through a two-tier spam classifier:
```
Message text
     |
     v
[Prefilter] -- regex patterns
     |
     +-- ObviousSpam --> delete + strike
     +-- ObviousSafe --> pass to engine
     +-- Ambiguous   --> [Haiku Classifier]
                             |
                             +-- Spam    --> delete + strike
                             +-- NotSpam --> pass to engine
```
The prefilter checks trusted users (bypass everything), magic string injection (`ANTHROPIC_MAGIC_STRING_`), configurable spam regex patterns (crypto scams, forex, etc.), configurable safe patterns, and short messages (< 30 chars without URLs = safe). Short messages containing URLs are sent to the classifier — this prevents bare malicious links like `evil.com` from bypassing AI classification.
Ambiguous messages go to Claude Haiku (the fastest, cheapest Claude model) for AI classification. The classifier uses XML tags with explicit anti-injection instructions. Three strikes (configurable) and you are banned. Strikes persist in SQLite across bot restarts.
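A sketch of the prefilter's decision shape (the verdict names follow the diagram; the signature, config handling, and URL heuristic are assumptions, and the magic-string check is omitted):

```rust
use regex::Regex;

enum PrefilterVerdict {
    ObviousSpam, // delete + strike
    ObviousSafe, // pass straight to the engine
    Ambiguous,   // escalate to the Haiku classifier
}

// Hypothetical signature: in the harness the pattern lists come from config.
fn prefilter(text: &str, trusted: bool, spam: &[Regex], safe: &[Regex]) -> PrefilterVerdict {
    if trusted {
        return PrefilterVerdict::ObviousSafe; // trusted users bypass everything
    }
    if spam.iter().any(|re| re.is_match(text)) {
        return PrefilterVerdict::ObviousSpam;
    }
    if safe.iter().any(|re| re.is_match(text)) {
        return PrefilterVerdict::ObviousSafe;
    }
    // Short messages are safe only when they carry no URL, so a bare link
    // like evil.com still reaches the AI classifier. (Crude URL heuristic,
    // for illustration only.)
    let has_url = text.contains("://") || text.contains("www.") || text.contains(".com");
    if text.chars().count() < 30 && !has_url {
        return PrefilterVerdict::ObviousSafe;
    }
    PrefilterVerdict::Ambiguous
}
```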
When a message passes spam filtering, it is converted to a `ChatMessage` struct and handed to the engine. The engine does not process it immediately. Instead, it adds the message to a pending queue and triggers the debouncer.
The debouncer is a timer that resets every time a new message arrives. Only when messages stop arriving for 1 second does it fire its callback.
Implementation: a tokio task that uses select! with a biased priority — new triggers reset the timer, cancellation takes priority over everything:
```rust
use std::{sync::Arc, time::Duration};
use tokio::{sync::{mpsc, Notify}, time::sleep};

async fn debounce_loop(
    cancel: Arc<Notify>,
    mut reset_rx: mpsc::Receiver<()>,
    duration: Duration,
    callback: impl Fn(),
) {
    loop {
        tokio::select! {
            biased;
            _ = cancel.notified() => break,
            _ = reset_rx.recv() => {
                // First trigger: start the countdown.
                loop {
                    tokio::select! {
                        _ = reset_rx.recv() => { /* new trigger -- reset timer */ }
                        _ = sleep(duration) => {
                            callback(); // quiet for `duration` -- fire!
                            break;
                        }
                    }
                }
            }
        }
    }
}
```

Why debounce? Three reasons:
- Fewer API calls: If someone sends 5 messages in 3 seconds, they are batched into one Claude Code turn
- Better UX: Claude sees the complete thought, not fragments
- Less typing indicator spam: One typing notification instead of five
When the debouncer fires, it spawns a tokio task that calls `process_messages()`. The processing lock (`is_processing: Arc<AtomicBool>`) uses `compare_exchange` for atomic acquisition:
```rust
if is_processing.compare_exchange(false, true, SeqCst, SeqCst).is_err() {
    // Already processing -- inject messages into the active CC session instead
    let messages: Vec<ChatMessage> = pending.lock().await.drain(..).collect();
    let _ = inject_tx.send(format_messages(&messages));
    return;
}
```

The `is_processing` flag is a routing decision, not a wait mechanism. When the debouncer fires:

- Flag is `false`: Acquire it, take pending messages, start a new CC turn via `process_messages()`
- Flag is `true`: Take pending messages, inject them into the already-running CC turn via `inject_tx`
Both paths write to the same CC stdin. The difference is who writes: the inject path lets the already-running `wait_for_result()` loop do the write (since it's already managing the CC conversation), rather than starting a competing second loop. This ensures mid-turn message delivery — Claude sees new messages immediately, within the current turn.
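For symmetry, the path that wins the flag looks roughly like this (a sketch mirroring the `compare_exchange` snippet above; the release-on-completion step is implied by the design rather than quoted from the code):

```rust
// Flag was free: we own the turn. Take the batch and start a new CC turn.
let batch: Vec<ChatMessage> = pending.lock().await.drain(..).collect();
let is_processing = Arc::clone(&is_processing);
tokio::spawn(async move {
    process_messages(batch).await;      // runs wait_for_result() internally
    is_processing.store(false, SeqCst); // release: the next debounce fire starts fresh
});
```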
Message formatting produces XML:
```xml
<msg id="123" chat="-12345" user="67890" name="Alice" time="10:31">
hello everyone
</msg>
<msg id="124" chat="-12345" user="111" name="Bob" time="10:32">
<reply id="123" from="Alice">hello everyone</reply>
hey Alice!
</msg>
```

After sending messages to Claude Code, the engine enters a control loop. Claude Code responds with one of three control actions (via `StructuredOutput`):
- Stop: `{"action": "stop", "reason": "responded to Alice"}` — Done processing. The `reason` field is required and enforced.
- Sleep: `{"action": "sleep", "sleep_ms": 5000}` — Wait N milliseconds, then check for new messages. Capped at 5 minutes.
- Heartbeat: `{"action": "heartbeat"}` — Still working, not done yet.
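The three shapes map naturally onto a tagged serde enum. A sketch, assuming the harness parses control actions roughly like this (names are illustrative):

```rust
use serde::Deserialize;

#[derive(Deserialize)]
#[serde(tag = "action", rename_all = "lowercase")]
enum ControlAction {
    // A missing `reason` is a deserialization error, which is how the
    // "required and enforced" rule falls out for free.
    Stop { reason: String },
    Sleep { sleep_ms: u64 }, // the engine caps this at 5 minutes
    Heartbeat,
}
```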
The Stop action's required reason field was a deliberate design choice. Without it, the model would sometimes stop prematurely. By requiring justification, the model is forced to think about whether stopping is appropriate.
Context compaction happens inside Claude Code when the conversation grows too large. The engine detects this via the `context_management.truncated_content_length` field in the response. When compaction is detected, the engine injects a context restoration message — typically loading persistent bot knowledge along with recent messages from the database to restore awareness of ongoing conversations.
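A minimal detection sketch, assuming the response is held as a `serde_json::Value` and that a nonzero value signals compaction:

```rust
use serde_json::Value;

// Compaction happened if CC reports a nonzero truncated length.
fn compaction_detected(response: &Value) -> bool {
    response
        .pointer("/context_management/truncated_content_length")
        .and_then(Value::as_u64)
        .is_some_and(|n| n > 0)
}
```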
A recurring bug pattern: Claude would output text directly (via text content blocks) instead of calling the `send_message` MCP tool. The user would see nothing, because the harness only processes tool calls and control actions.
The engine now detects this and injects the dropped text back into Claude Code's next turn as an error message, teaching it to use `send_message` instead. This creates a self-correcting feedback loop.
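A sketch of that feedback, assuming the turn's stray text blocks were collected into a `dropped` vector and reusing the inject channel from earlier (names and the error wording are illustrative):

```rust
// Bounce un-delivered text back to CC as an error so the model learns to
// route replies through the send_message tool instead.
if !dropped.is_empty() {
    let feedback = format!(
        "<error>Your text output was NOT delivered to the user. \
         Use the send_message tool to reply. Dropped text:\n{}</error>",
        dropped.join("\n")
    );
    let _ = inject_tx.send(feedback); // surfaces in CC's next turn
}
```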
The harness runs several concurrent background tasks:
- Health monitor (60s loop): Pings Telegram API, checks memory usage (alert > 80%), monitors Claude Code subprocess (auto-restart with kill-switch detection), cross-bot heartbeat monitoring
- Reminder task (60s loop): Fires due reminders from SQLite
- Bot-to-bot polling (500ms loop): Polls shared SQLite for messages from other bots
- PageIndex server: Spawns a separate binary for semantic search over chat history
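These loops all share the same shape. A generic sketch (in the harness the work is async and shutdown-aware; this shows only the ticking skeleton):

```rust
use std::time::Duration;
use tokio::time::{interval, MissedTickBehavior};

// Spawn a background loop that runs `work` every `period`.
fn spawn_periodic(period: Duration, mut work: impl FnMut() + Send + 'static) {
    tokio::spawn(async move {
        let mut tick = interval(period);
        tick.set_missed_tick_behavior(MissedTickBehavior::Delay);
        loop {
            tick.tick().await;
            work();
        }
    });
}

// e.g. spawn_periodic(Duration::from_secs(60), || { /* fire due reminders */ });
//      spawn_periodic(Duration::from_millis(500), || { /* poll bot-to-bot table */ });
```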
Every design decision in the harness serves one principle: the Telegram dispatcher must not block.

- Debouncer: Batches work instead of immediate processing
- Atomic processing flag: `compare_exchange` instead of a mutex lock
- Inject channel (`std::sync::mpsc`): Mid-turn message delivery via a synchronous channel
The inject mechanism deserves special mention. When CC is already processing a turn, new messages are sent via a `std::sync::mpsc::Sender<String>` inject channel. The worker thread checks this channel every 1 second during `wait_for_result()` and writes directly to CC's stdin. This enables mid-turn message delivery — users don't have to wait for the current turn to finish before their message reaches the AI.
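A sketch of that once-per-second check, assuming the worker holds the receiver and CC's stdin handle (names are illustrative):

```rust
use std::io::Write;
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

// One iteration of the worker's inject check: block up to 1s waiting for
// injected messages, then write them straight into CC's stdin.
fn poll_inject(inject_rx: &Receiver<String>, cc_stdin: &mut impl Write) -> std::io::Result<()> {
    match inject_rx.recv_timeout(Duration::from_secs(1)) {
        Ok(batch) => {
            cc_stdin.write_all(batch.as_bytes())?;
            cc_stdin.flush()?; // CC sees the new messages mid-turn
        }
        Err(RecvTimeoutError::Timeout) => {}      // nothing pending this second
        Err(RecvTimeoutError::Disconnected) => {} // sender gone; turn is ending
    }
    Ok(())
}
```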



