Analysis of the llm-durable-messages branch: what works today and what's missing to enable an LLM agent to work on a task over hours/days with human-in-the-loop approval.
Human                            Daemon                           LLM Agent
  |                                |                                |
  |-- endo send llm "build X" ---->|-- deliver message ------------>|
  |                                |                                |-- calls Anthropic API
  |                                |                                |-- gets tool_use: define_code
  |                                |<- E(powers).define(src,slots) -|
  |<- endo inbox shows defn -------|                                |
  |-- endo endow 0 --bind ... ---->|-- eval formula created ------->|
  |                                |-- result resolves ------------>|
  |                                |                                |-- sends response to HOST
  |<- endo inbox shows result -----|                                |
Works today. The happy path functions end to end: the LLM proposes code via define, the host endows, the code runs, and the LLM gets the result back.
Breaks at: The LLM receives the result as the resolution of the define() promise inside executeTool. The result is JSON-stringified and fed back into the tool call loop, and the LLM then produces a text response sent to HOST via send. But the human has no way to send a follow-up message about the result that is scoped to the task -- the inbox is a flat list of unrelated messages.
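For concreteness, a minimal sketch of that loop shape, assuming the Anthropic SDK's messages.create call and the define(src, slots) verb from this branch. runToolLoop and guestPowers are illustrative names, the tool declarations are omitted for brevity, and the tool input fields (source, slots) are assumed from the define_code tool in the diagram above:

```js
import Anthropic from '@anthropic-ai/sdk';
import { E } from '@endo/far';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// One synchronous pass: call the API, execute any tool_use blocks, feed the
// stringified results back as tool_result blocks, repeat until plain text.
const runToolLoop = async (guestPowers, messages) => {
  for (;;) {
    const response = await client.messages.create({
      model: 'claude-3-5-sonnet-latest', // model id illustrative
      max_tokens: 4096,
      messages,
    });
    messages.push({ role: 'assistant', content: response.content });
    const toolUses = response.content.filter(block => block.type === 'tool_use');
    if (toolUses.length === 0) {
      return response; // final text answer, to be sent to HOST
    }
    const results = [];
    for (const use of toolUses) {
      // define() resolves only once the host endows -- the loop blocks here,
      // which is the Flow 5 problem discussed below.
      const value = await E(guestPowers).define(use.input.source, use.input.slots);
      results.push({
        type: 'tool_result',
        tool_use_id: use.id,
        content: JSON.stringify(value), // stringified back into the loop
      });
    }
    messages.push({ role: 'user', content: results });
  }
};
```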
Human: "Analyze my data, build a model, then test it"
Step 1: LLM defines code to load data
-> Host endows with data-source
-> Result stored as "loaded-data"
Step 2: LLM defines code to build model, needs "loaded-data"
-> ??? How does LLM reference "loaded-data" in a define?
Breaks at: define() creates slots that the host fills. The LLM can't say "use the result from my last step" -- it can only describe capability slots for the host to bind. There's no mechanism for the agent to chain its own prior results into the next step without the host re-binding them each time.
With requestEvaluation, the LLM could reference last-result by pet name. But with define, that authority is deliberately removed. The gap: there's no way for an agent to build on its own prior results without host intervention at every step.
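To make the gap concrete, a sketch under the define(src, slots) shape above; powers stands for the guest's powers object, and the endow command and pet names are illustrative:

```js
import { E } from '@endo/far';

// Step 1: the agent proposes code with a capability slot; the host binds it.
// Host side:  endo endow 0 --bind dataSource=my-csv --as loaded-data
const loaded = await E(powers).define(
  'E(dataSource).read()', // proposed source
  ['dataSource'],         // slots only the HOST can fill
);

// Step 2: the agent wants to build on that result, but all it can express
// is another slot for the host to bind by hand:
const model = await E(powers).define(
  'buildModel(loadedData)',
  ['loadedData'], // the host must manually re-bind "loaded-data"
);
// There is no form that lets the agent pass its own step-1 result, so
// every chained step costs the host another endow.
```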
Before restart:
- LLM has 15-message conversation history in memory
- LLM was mid-way through a tool call loop
- Host had approved 3 define requests
After restart:
- Durable messages: OK (inbox messages survive)
- Formula graph: OK (counter objects, eval results survive)
- LLM conversation history: LOST (in-memory array in anthropic-backend.js:141)
- LLM agent process: KILLED (unconfined worker dies)
- Tool call loop state: LOST
Breaks at: The make-unconfined formula for the LLM agent will re-evaluate on restart (re-run make(powers)), creating a fresh agent with empty conversation history. It will see its old durable messages via followMessages(), but it has no way to reconstruct the conversation context from those messages. The Anthropic API needs the full messages array to maintain coherence.
Human: "What's the LLM working on? How far along is it?"
endo inbox
> 0. "llm-handle" sent "Llamadrome ready for work." at ...
> 1. you sent "build a counter" at ...
> 2. "llm-handle" proposed code (slots: counter) at ...
> 3. "llm-handle" sent "Here's your counter [result: 42]" at ...
Partially works. The inbox shows messages chronologically. But there's no concept of a task grouping these messages. If you give the LLM two tasks, messages interleave. There's no progress indicator, no "step 3 of 7", no way to see what the LLM is currently thinking about vs. waiting for.
10:00am Human sends task to LLM
10:01am LLM proposes define, waits for host endow
... human goes to lunch ...
2:00pm Human runs "endo inbox"
-> Sees the pending definition
-> Endows it
-> LLM gets result... but the Anthropic API call timed out hours ago
Breaks at: The define() call in executeTool is an await that blocks the tool call loop. If the host doesn't endow promptly, the Anthropic backend is sitting on a hanging promise. The Ollama backend doesn't even try -- it fire-and-forgets code blocks. There's no mechanism to park a pending request and resume the conversation when the approval arrives.
| Gap | What it blocks | Difficulty |
|---|---|---|
| No conversation persistence | Flow 3 -- LLM loses all context on restart | Medium |
| No self-referencing results | Flow 2 -- LLM can't chain steps without host re-binding every time | Medium |
| No async approval handling | Flow 5 -- define() blocks the tool loop; human can't take time to review | High |
| No task concept | Flow 4 -- messages are a flat stream, no grouping or progress | Medium |
| No conversation resumption | Flow 3 -- even if messages are durable, LLM can't rebuild its API context from them | High |
The Anthropic backend holds the full conversation in messages: Array<{role, content}> (anthropic-backend.js:141). The Ollama backend holds transcript in memory (ollama-backend.js:34-36). Both are lost on daemon restart or worker termination.
What's needed: Persist conversation turns to the guest's directory (via storeValue or a dedicated conversation store). On restart, reconstruct the messages array from stored turns before resuming the followMessages loop.
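A minimal sketch of that persistence, assuming the storeValue mentioned above and a lookup-style retrieval (illustrative); harden is the SES global, and the 'conversation' pet name is a placeholder:

```js
import { E } from '@endo/far';

// Persist the full Anthropic messages array after every completed turn.
const saveConversation = async (powers, messages) => {
  await E(powers).storeValue(harden(messages), 'conversation');
};

// On restart, rebuild the array before re-entering the followMessages loop.
const restoreConversation = async powers => {
  try {
    return [...(await E(powers).lookup('conversation'))];
  } catch {
    return []; // nothing stored yet: fresh agent, fresh conversation
  }
};
```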
With define(), the agent proposes code with named slots and the host binds capabilities. This is good for security (the agent can't grab capabilities by name). But it means the agent can't say "use the result I got from my last define" without the host manually binding it.
What's needed: Either (a) an autoEndow variant of define where the agent can specify which of its own prior results to bind (host still approves the code), or (b) a task-scoped workspace where evaluation results are automatically available to subsequent steps.
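Option (a) as a hypothetical API sketch -- defineWithResults does not exist on this branch; it only illustrates the authority split where the agent binds its own prior results while capability slots stay host-bound:

```js
import { E } from '@endo/far';

// Step 1 as today: the host binds the capability slot and saves the
// result under the pet name 'loaded-data'.
const data = await E(powers).define('loadData(source)', ['source']);

// Hypothetical step 2: the agent names its own prior result directly.
const model = await E(powers).defineWithResults(
  'buildModel(loadedData)',
  { loadedData: 'loaded-data' }, // agent-owned prior results, auto-bound
  [],                            // capability slots remain host-bound (none here)
);
// The host still reviews and approves the code; it just no longer has to
// re-bind values the agent already legitimately produced.
```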
The tool call loop in anthropic-backend.js:158-192 is synchronous: it awaits each tool result before continuing. When define() is called, it blocks until the host endows. If the host takes hours to review, the Anthropic API connection may time out, or the LLM context may grow stale.
What's needed: The agent needs to be event-driven rather than synchronous. When a define() is pending approval, the agent should go idle. When the approval arrives (as a message or event), the agent wakes up, feeds the result into a new API call with the conversation history, and continues.
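A sketch of that shape, assuming followMessages() can be consumed as an async iterable here and that each define can be correlated by a request number; the parked map and the continueConversation helper are hypothetical:

```js
import { E } from '@endo/far';

const parked = new Map(); // request number -> conversation snapshot

// Propose a define WITHOUT awaiting its result: snapshot enough state to
// resume later, attach a continuation, and return so the agent goes idle.
const proposeDefine = (powers, number, source, slots, messages) => {
  parked.set(number, [...messages]);
  E(powers).define(source, slots).then(async result => {
    const history = parked.get(number);
    parked.delete(number);
    // Wake up: feed the result into a new API call with the saved history.
    await continueConversation(history, number, result); // hypothetical helper
  });
};

// The main loop keeps draining messages instead of blocking on approvals.
const runAgent = async (powers, handleTurn) => {
  for await (const message of await E(powers).followMessages()) {
    await handleTurn(message);
  }
};
```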
Messages in the inbox are a flat chronological stream. There's no way to group related messages into a task, track progress across steps, or distinguish between concurrent tasks.
What's needed: A task envelope or thread ID that groups related messages. The define → endow → result chain should be visible as a single task with steps. The host should be able to run endo tasks to see active task threads and their status.
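A sketch of the envelope as plain data -- nothing like this exists on the branch, and every field name is illustrative:

```js
// harden is the SES global; the shape is purely illustrative.
const task = harden({
  taskId: 'task-7f3a',       // stable thread id stamped on every message
  title: 'build a counter',
  steps: [
    { n: 1, kind: 'request', messageNumber: 1, status: 'done' },
    { n: 2, kind: 'define',  messageNumber: 2, status: 'awaiting-endow' },
    { n: 3, kind: 'result',  messageNumber: 3, status: 'pending' },
  ],
});
// `endo tasks` could then render:  task-7f3a  "build a counter"  step 2/3
```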
Even if messages are durable and the agent restarts, there's no way to reconstruct a valid Anthropic/Ollama conversation from the durable messages. The message format (strings + edge names + pet names) doesn't capture the LLM-specific structure (role, tool_use blocks, tool_result blocks).
What's needed: Either (a) persist the raw LLM API messages alongside the Endo messages, or (b) define a reconstruction protocol where the agent replays durable messages into a fresh conversation context on startup.
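A sketch of option (b)'s replay, which also shows why it is lossy today: plain text turns map to user/assistant roles, but define/endow/result exchanges have no faithful tool_use/tool_result reconstruction. The type and field names follow the strings + edge names + pet names description above but are otherwise assumptions:

```js
// Replay durable Endo messages into a fresh Anthropic messages array.
const rebuildContext = (durableMessages, hostName) => {
  const messages = [];
  for (const m of durableMessages) {
    if (m.type !== 'package') {
      // define/endow/result messages carry no role or tool_use/tool_result
      // structure -- those turns are dropped, which is exactly the gap.
      continue;
    }
    const role = m.from === hostName ? 'user' : 'assistant';
    messages.push({ role, content: m.strings.join(' ') });
  }
  return messages;
};
```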
The branch has solid foundations to build on:
- Durable messages -- fully implemented, persisted to disk, survive restarts
- Formula persistence -- core daemon handles formulation and reincarnation
- define/endow/form verbs -- authority-separating message protocol
- Guest/host distinction -- clear capability boundaries
- followMessages() -- async iterable that yields existing messages first, then new ones
- Tool calling (Anthropic backend) -- LLM can propose tools with structured args
The most impactful gap is async approval handling + conversation persistence. Today the LLM agent is fundamentally synchronous: receive message, call API, maybe call tools, send response. Long-running tasks need the agent to be event-driven: propose something, go idle, wake up when the approval arrives, continue from where it left off.
The durable messages infrastructure already provides the wake-up mechanism (followMessages yields new messages as they arrive). What's missing is the agent-side state machine that can park a pending step, serialize its state, and resume when the relevant message arrives.