Date: 2026-02-10
Context: Conversation between Oliver and Pi about solving agent memory drift
Pi (running on OpenClaw) loses working context when the LLM context window fills up and gets compacted. This is not an OpenClaw-specific bug — it's a fundamental limitation of all LLM agents. Context windows are finite; when they overflow, conversational state is summarized and the active working details are lost.
Real example: Pi was building a Python rewrite of GoProX (fxstein/goprox-python). After compaction, when Oliver asked about unimplemented functions and Python 3.12 (should be 3.14), Pi had no memory of working on GoProX at all — and went searching in the wrong repo (pion-mcp) instead.
Root cause: Pi's memory architecture relies on free-text daily journals (memory/YYYY-MM-DD.md) and a curated long-term file (MEMORY.md). These capture what happened but not what I'm currently doing and what's next. Post-compaction, there's no structured anchor to resume from.
Oliver created ai-todo — an AI-native task management system originally built for Cursor AI agents. It enforces structured task tracking via a markdown TODO.md file managed through an MCP server or CLI.
The key realization: the same structured task tracking that keeps Cursor agents on-rail could solve Pi's memory drift problem on OpenClaw.
The pattern ai-todo enforces:
- Plan before executing — Break work into numbered, discrete steps
- Track state explicitly — `[ ]` pending, `[x]` done, `#inprogress` tag for active work
- Check off as you go — Creates a clear resumption point after any interruption
- Notes capture context — Decisions, discoveries, and blockers are attached to tasks
- Git-tracked history — Every state change is committed, creating an audit trail
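As a concrete illustration, a TODO.md following these conventions might look like the sketch below. The task names are invented for this example, and the exact syntax ai-todo emits may differ:

```markdown
# TODO

1. [x] Analyze the existing GoProX shell implementation
2. [ ] Implement the media import pipeline #inprogress
   - [x] 2.1 Enumerate SD card contents
   - [ ] 2.2 Copy files with checksum verification
   - Note: target Python is 3.14, not 3.12
3. [ ] Write integration tests
4. [ ] Document the import workflow
```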
Oliver's deliberate design decision: TODO.md is plain markdown, not a database. This seems primitive but is actually the correct architecture for agent memory:
- Survives everything — If the MCP server is down, the agent can still `cat TODO.md` and understand the full state
- Human readable — Oliver can review what Pi is working on without any tooling
- Git-native — Tracked alongside code, diffable, branchable
- Token-efficient — Structured markdown is compact compared to free-text journals
- No dependencies — No database, no API, no external service to fail
| Feature | Benefit for Agent Memory |
|---|---|
| Structured task decomposition | Post-compaction, Pi reads the checklist and knows exactly where to resume |
| `#inprogress` tag | Instant answer to "what was I doing?" |
| Analysis → Design → Implement → Test → Verify → Document workflow | Forces planning before execution (prevents the Chezmoi-style "jumped into execution" failures) |
| Tamper detection (checksums) | Catches if TODO.md is edited outside the tool — integrity guarantee |
| Archive + prune lifecycle | Completed work moves out of active view but remains recoverable |
| Notes per task | Decisions and context survive attached to the relevant task |
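The "Instant answer" row above is cheap to implement precisely because TODO.md is plain markdown. A minimal sketch of the post-compaction lookup, assuming checklist-style task lines — this is illustrative stdlib Python, not ai-todo's actual parser:

```python
import re

def find_in_progress(todo_text: str) -> list[str]:
    """Return the task lines tagged #inprogress from a TODO.md body."""
    tasks = []
    for line in todo_text.splitlines():
        # Match markdown checklist items: "- [ ] ..." or "2. [x] ..."
        if re.match(r"\s*(?:[-*]|\d+\.)\s*\[[ x]\]", line) and "#inprogress" in line:
            tasks.append(line.strip())
    return tasks

todo = """\
1. [x] Analyze GoProX shell implementation
2. [ ] Implement media import pipeline #inprogress
3. [ ] Write integration tests
"""
print(find_in_progress(todo))
# → ['2. [ ] Implement media import pipeline #inprogress']
```

An agent waking up after compaction can run this over the raw file with no server, no database, and no tooling beyond the standard library.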
1. No "working context" concept

ai-todo tracks what to do and what's done, but not the richer context of where I am mid-task. When debugging and discovering "Python version is wrong," that insight needs to live somewhere more structured than a note.
Potential enhancement: A context or journal field per task, or a companion scratchpad linked to active tasks.
2. Single-repo scope

ai-todo is designed as one TODO.md per repository. Pi's work spans multiple repos (GoProX, pion-mcp, pi-infra, pi-oliver). Need either a workspace-level meta-TODO or cross-repo awareness.
Potential enhancement: Support for a workspace-level TODO.md that can reference or aggregate repo-specific ones.
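One possible shape for that enhancement: treat the workspace as a directory of repos and aggregate each repo's open tasks into a single view. The directory layout and function below are hypothetical, not existing ai-todo behavior:

```python
from pathlib import Path

def aggregate_todos(workspace: Path) -> dict[str, list[str]]:
    """Map each repo name to its unchecked tasks, across a workspace.

    Assumes one TODO.md per repo directory, directly under the workspace root.
    """
    summary = {}
    for todo in workspace.glob("*/TODO.md"):
        open_tasks = [
            line.strip()
            for line in todo.read_text().splitlines()
            if "[ ]" in line  # unchecked checklist items only
        ]
        if open_tasks:
            summary[todo.parent.name] = open_tasks
    return summary
```

A workspace-level TODO.md could then be regenerated from this aggregate, keeping the per-repo files authoritative.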
3. No time/duration tracking
The start/stop commands exist but don't record elapsed time. For self-auditing ("you spent 2 minutes on it"), time data would be valuable.
Potential enhancement: Timestamps on start/stop transitions, with optional duration calculation.
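A sketch of what timestamped start/stop transitions could look like, with duration derived from the recorded intervals. The class and method names are invented for illustration; nothing here is current ai-todo behavior:

```python
from datetime import datetime, timezone

class TaskTimer:
    """Record start/stop transitions for a task and report total elapsed time."""

    def __init__(self):
        self.intervals = []   # completed (start, stop) datetime pairs
        self._started = None  # pending start, if the task is currently active

    def start(self, now=None):
        self._started = now or datetime.now(timezone.utc)

    def stop(self, now=None):
        if self._started is not None:
            self.intervals.append((self._started, now or datetime.now(timezone.utc)))
            self._started = None

    def elapsed_seconds(self) -> float:
        # Sum only closed intervals; an open interval is still running.
        return sum((stop - start).total_seconds() for start, stop in self.intervals)
```

Persisting the interval pairs as ISO timestamps in the task's notes would keep the feature inside the plain-markdown philosophy.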
4. Archive bloat in context window

ai-todo's own TODO.md has 270+ tasks (mostly completed). For an agent that needs to read this on session start, that's a lot of tokens consumed by historical data.
Potential enhancement: A "compact view" mode that returns only active + recently completed tasks. The prune command helps but could be more aggressive by default for agent consumption.
5. No compaction awareness

ai-todo doesn't know about LLM context windows or compaction events. A "checkpoint" concept — save minimum resumption state — triggered before compaction would be powerful.
Potential enhancement: A checkpoint command or tool that produces a minimal state summary optimized for post-compaction injection.
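Such a checkpoint might be little more than a filter: keep the in-progress task with its notes plus the next few pending tasks, and drop everything else. A hypothetical sketch — the task-dict shape and the output format are assumptions, not an ai-todo API:

```python
def checkpoint(tasks: list[dict], next_n: int = 3) -> str:
    """Produce a minimal resumption summary: current task plus upcoming work."""
    current = [t for t in tasks if t["status"] == "inprogress"]
    pending = [t for t in tasks if t["status"] == "pending"][:next_n]
    lines = ["# Checkpoint"]
    for t in current:
        lines.append(f"NOW: {t['title']}")
        for note in t.get("notes", []):
            lines.append(f"  note: {note}")
    for t in pending:
        lines.append(f"NEXT: {t['title']}")
    return "\n".join(lines)
```

Injected post-compaction, a summary like this answers "what was I doing and what's next?" in a handful of tokens, while completed history stays in the archive.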
Phase 1:
- Pi starts using a structured TODO.md in the workspace for all multi-step work
- Follow the ai-todo conventions: numbered tasks, subtasks, `#inprogress` tag, notes for context
- Manual management (no MCP server needed yet)
- Validate that the pattern actually prevents memory drift
Phase 2:
- Add ai-todo to Pi's OpenClaw container (via `uv tool install ai-todo`)
- Use the MCP tools for task management (if OpenClaw supports MCP tool integration)
- Alternatively, use the CLI directly via exec
Phase 3:
Based on Phase 1-2 learnings, potentially fork or extend ai-todo:
- Add workspace-level multi-repo support
- Add compaction-aware checkpointing
- Add richer working context per task
- Optimize token usage for large task histories
- Contribute improvements back upstream
Phase 4:
If the pattern proves valuable, propose OpenClaw enhancements:
- Pre-compaction hook that triggers a checkpoint save
- Post-compaction context injection from structured state files
- Potentially a PR to OpenClaw with structured task awareness
This isn't just about Pi's memory. Every LLM agent has this problem. The context window is the agent's working memory, and compaction is amnesia. Current mitigations (RAG, memory files, conversation summaries) are all free-text — they preserve information but not intent and state.
ai-todo's contribution is forcing structure: not "here's what happened" but "here's the plan, here's where I am in it, here's what's next." That structure is what makes resumption possible.
Oliver's experience with thousands of tasks in Cursor validates the pattern. The question is whether it can scale from a single-repo coding agent to a general-purpose personal assistant — and what enhancements that transition requires.
This document captures the conversation between Oliver and Pi on 2026-02-10 about leveraging ai-todo as a structured memory architecture for LLM agents.