Date: 2026-02-10
Context: Conversation between Oliver and Pi about solving agent memory drift
Pi (running on OpenClaw) loses working context when the LLM context window fills up and gets compacted. This is not an OpenClaw-specific bug — it's a fundamental limitation of all LLM agents. Context windows are finite; when they overflow, conversational state is summarized and the active working details are lost.
Real example: Pi was building a Python rewrite of GoProX (fxstein/goprox-python). After compaction, when Oliver asked about unimplemented functions and Python 3.12 (should be 3.14), Pi had no memory of working on GoProX at all — and went searching in the wrong repo (pion-mcp) instead.
Root cause: Pi's memory architecture relies on free-text daily journals (memory/YYYY-MM-DD.md) and a curated long-term file (MEMORY.md). These capture what happened but not what I'm currently doing and what's next. Post-compaction, there's no structured anchor to resume from.
Oliver created ai-todo — an AI-native task management system originally built for Cursor AI agents. It enforces structured task tracking via a markdown TODO.md file managed through an MCP server or CLI.
The key realization: the same structured task tracking that keeps Cursor agents on-rail could solve Pi's memory drift problem on OpenClaw.
The pattern ai-todo enforces:
- Plan before executing — Break work into numbered, discrete steps
- Track state explicitly — `[ ]` pending, `[x]` done, `#inprogress` tag for active work
- Check off as you go — Creates a clear resumption point after any interruption
- Notes capture context — Decisions, discoveries, and blockers are attached to tasks
- Git-tracked history — Every state change is committed, creating an audit trail
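As a concrete illustration, a TODO.md following these conventions might look like the sketch below. The task names are invented for this example, and the exact syntax ai-todo emits may differ:

```markdown
# TODO

1. [x] Analyze the existing GoProX shell implementation
2. [ ] Implement the media import pipeline #inprogress
   - [x] 2.1 Enumerate SD card contents
   - [ ] 2.2 Copy files with checksum verification
   - Note: target Python is 3.14, not 3.12
3. [ ] Write integration tests
4. [ ] Document the import workflow
```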
Oliver's deliberate design decision: TODO.md is plain markdown, not a database. This seems primitive but is actually the correct architecture for agent memory:
- Survives everything — If the MCP server is down, the agent can still `cat TODO.md` and understand the full state
- Human readable — Oliver can review what Pi is working on without any tooling
- Git-native — Tracked alongside code, diffable, branchable
- Token-efficient — Structured markdown is compact compared to free-text journals
- No dependencies — No database, no API, no external service to fail
| Feature | Benefit for Agent Memory |
|---|---|
| Structured task decomposition | Post-compaction, Pi reads the checklist and knows exactly where to resume |
| `#inprogress` tag | Instant answer to "what was I doing?" |
| Analysis → Design → Implement → Test → Verify → Document workflow | Forces planning before execution (prevents the Chezmoi-style "jumped into execution" failures) |
| Tamper detection (checksums) | Catches if TODO.md is edited outside the tool — integrity guarantee |
| Archive + prune lifecycle | Completed work moves out of active view but remains recoverable |
| Notes per task | Decisions and context survive attached to the relevant task |
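The "Instant answer" row above is cheap to implement precisely because TODO.md is plain markdown. A minimal sketch of the post-compaction lookup, assuming checklist-style task lines — this is illustrative stdlib Python, not ai-todo's actual parser:

```python
import re

def find_in_progress(todo_text: str) -> list[str]:
    """Return the task lines tagged #inprogress from a TODO.md body."""
    tasks = []
    for line in todo_text.splitlines():
        # Match markdown checklist items: "- [ ] ..." or "2. [x] ..."
        if re.match(r"\s*(?:[-*]|\d+\.)\s*\[[ x]\]", line) and "#inprogress" in line:
            tasks.append(line.strip())
    return tasks

todo = """\
1. [x] Analyze GoProX shell implementation
2. [ ] Implement media import pipeline #inprogress
3. [ ] Write integration tests
"""
print(find_in_progress(todo))
# → ['2. [ ] Implement media import pipeline #inprogress']
```

An agent waking up after compaction can run this over the raw file with no server, no database, and no tooling beyond the standard library.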
1. No "working context" concept

ai-todo tracks what to do and what's done, but not the richer context of where I am mid-task. When debugging and discovering "Python version is wrong," that insight needs to live somewhere more structured than a note.
Potential enhancement: A context or journal field per task, or a companion scratchpad linked to active tasks.
2. Single-repo scope

ai-todo is designed as one TODO.md per repository. Pi's work spans multiple repos (GoProX, pion-mcp, pi-infra, pi-oliver). Need either a workspace-level meta-TODO or cross-repo awareness.
Potential enhancement: Support for a workspace-level TODO.md that can reference or aggregate repo-specific ones.
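One possible shape for that enhancement: treat the workspace as a directory of repos and aggregate each repo's open tasks into a single view. The directory layout and function below are hypothetical, not existing ai-todo behavior:

```python
from pathlib import Path

def aggregate_todos(workspace: Path) -> dict[str, list[str]]:
    """Map each repo name to its unchecked tasks, across a workspace.

    Assumes one TODO.md per repo directory, directly under the workspace root.
    """
    summary = {}
    for todo in workspace.glob("*/TODO.md"):
        open_tasks = [
            line.strip()
            for line in todo.read_text().splitlines()
            if "[ ]" in line  # unchecked checklist items only
        ]
        if open_tasks:
            summary[todo.parent.name] = open_tasks
    return summary
```

A workspace-level TODO.md could then be regenerated from this aggregate, keeping the per-repo files authoritative.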
3. No time/duration tracking
The start/stop commands exist but don't record elapsed time. For self-auditing ("you spent 2 minutes on it"), time data would be valuable.
Potential enhancement: Timestamps on start/stop transitions, with optional duration calculation.
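A sketch of what timestamped start/stop transitions could look like, with duration derived from the recorded intervals. The class and method names are invented for illustration; nothing here is current ai-todo behavior:

```python
from datetime import datetime, timezone

class TaskTimer:
    """Record start/stop transitions for a task and report total elapsed time."""

    def __init__(self):
        self.intervals = []   # completed (start, stop) datetime pairs
        self._started = None  # pending start, if the task is currently active

    def start(self, now=None):
        self._started = now or datetime.now(timezone.utc)

    def stop(self, now=None):
        if self._started is not None:
            self.intervals.append((self._started, now or datetime.now(timezone.utc)))
            self._started = None

    def elapsed_seconds(self) -> float:
        # Sum only closed intervals; an open interval is still running.
        return sum((stop - start).total_seconds() for start, stop in self.intervals)
```

Persisting the interval pairs as ISO timestamps in the task's notes would keep the feature inside the plain-markdown philosophy.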
4. Archive bloat in context window

ai-todo's own TODO.md has 270+ tasks (mostly completed). For an agent that needs to read this on session start, that's a lot of tokens consumed by historical data.
Potential enhancement: A "compact view" mode that returns only active + recently completed tasks. The prune command helps but could be more aggressive by default for agent consumption.
5. No compaction awareness

ai-todo doesn't know about LLM context windows or compaction events. A "checkpoint" concept — save minimum resumption state — triggered before compaction would be powerful.
Potential enhancement: A checkpoint command or tool that produces a minimal state summary optimized for post-compaction injection.
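Such a checkpoint might be little more than a filter: keep the in-progress task with its notes plus the next few pending tasks, and drop everything else. A hypothetical sketch — the task-dict shape and the output format are assumptions, not an ai-todo API:

```python
def checkpoint(tasks: list[dict], next_n: int = 3) -> str:
    """Produce a minimal resumption summary: current task plus upcoming work."""
    current = [t for t in tasks if t["status"] == "inprogress"]
    pending = [t for t in tasks if t["status"] == "pending"][:next_n]
    lines = ["# Checkpoint"]
    for t in current:
        lines.append(f"NOW: {t['title']}")
        for note in t.get("notes", []):
            lines.append(f"  note: {note}")
    for t in pending:
        lines.append(f"NEXT: {t['title']}")
    return "\n".join(lines)
```

Injected post-compaction, a summary like this answers "what was I doing and what's next?" in a handful of tokens, while completed history stays in the archive.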
Phase 1:
- Pi starts using a structured TODO.md in the workspace for all multi-step work
- Follow the ai-todo conventions: numbered tasks, subtasks, `#inprogress` tag, notes for context
- Manual management (no MCP server needed yet)
- Validate that the pattern actually prevents memory drift
Phase 2:
- Add ai-todo to Pi's OpenClaw container (via `uv tool install ai-todo`)
- Use the MCP tools for task management (if OpenClaw supports MCP tool integration)
- Alternatively, use the CLI directly via exec
Phase 3:
Based on Phase 1-2 learnings, potentially fork or extend ai-todo:
- Add workspace-level multi-repo support
- Add compaction-aware checkpointing
- Add richer working context per task
- Optimize token usage for large task histories
- Contribute improvements back upstream
Phase 4:
If the pattern proves valuable, propose OpenClaw enhancements:
- Pre-compaction hook that triggers a checkpoint save
- Post-compaction context injection from structured state files
- Potentially a PR to OpenClaw with structured task awareness
This isn't just about Pi's memory. Every LLM agent has this problem. The context window is the agent's working memory, and compaction is amnesia. Current mitigations (RAG, memory files, conversation summaries) are all free-text — they preserve information but not intent and state.
ai-todo's contribution is forcing structure: not "here's what happened" but "here's the plan, here's where I am in it, here's what's next." That structure is what makes resumption possible.
Oliver's experience with thousands of tasks in Cursor validates the pattern. The question is whether it can scale from a single-repo coding agent to a general-purpose personal assistant — and what enhancements that transition requires.
This document captures the conversation between Oliver and Pi on 2026-02-10 about leveraging ai-todo as a structured memory architecture for LLM agents.