This skill analyzes recent git history to find places where the project’s agent instructions file (AGENTS.md, CLAUDE.md, or equivalent) failed to prevent mistakes, caused confusion, or missed opportunities to guide the agent toward better outcomes. It then rewrites the file to close those gaps.
The name comes from the Japanese concept of kaizen (改善): continuous, incremental improvement.
Run this skill:

- On a regular cadence (e.g., weekly, after every N commits)
- After a painful debugging session or reverted commit
- When onboarding a new agent workflow or tool
- Whenever someone says “the agent keeps doing X wrong”
Inputs:

| Input | Required | Description |
|---|---|---|
| Agent instructions file | Yes | Path to AGENTS.md, CLAUDE.md, .claude/instructions.md, or equivalent |
| Git repository | Yes | Must have ≥5 recent commits to analyze (20 is ideal) |
| Number of commits | No | Default: 20. How far back to look. |
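To confirm the history requirement, count the commits on the branch you plan to analyze (this assumes that branch is currently checked out):

```bash
# Count commits reachable from HEAD; the skill wants at least 5, ideally 20+
git rev-list --count HEAD
```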
Search for the agent instructions file. Common names and locations:
```bash
# Find candidate files (order of precedence)
for f in AGENTS.md CLAUDE.md .claude/instructions.md .github/copilot-instructions.md CONVENTIONS.md; do
  if [ -f "$f" ]; then
    echo "Found: $f"
  fi
done
```

Read the file thoroughly. Before doing anything else, build a mental model of:
- Structure: How is it organized? What sections exist?
- Priorities: What does it emphasize? What’s listed first?
- Gaps: What categories of guidance are missing entirely?
- Staleness: Do any instructions reference obsolete tools, patterns, or file paths?
Pull the last N commits with full diffs and messages:
```bash
# Get commit log with stats
git log --oneline --stat -20

# For each commit, we'll want the full diff and message
git log -20 --format="===COMMIT %H===
Author: %an
Date: %ad
Subject: %s
Body: %b
===END MESSAGE===" --patch
```

If the repo is large and diffs are enormous, use a targeted approach:
```bash
# Just messages + file list (lighter weight first pass)
git log -20 --format="===COMMIT %h === %s" --name-status
```

Then selectively pull full diffs only for commits that look relevant.
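For any commit the first pass flags as interesting, pull its full message and patch individually; `<commit-hash>` below is a placeholder for a hash surfaced by the lighter pass:

```bash
# Full stats + patch for a single commit of interest
git show --stat --patch <commit-hash>
```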
For each of the last 20 commits, classify it into one or more of these categories:
Signs that the agent did something wrong that better instructions would have prevented:
- Commit messages containing: “fix”, “revert”, “oops”, “undo”, “wrong”, “broken”, “actually”, “should have”
- A commit immediately followed by a “fix” commit touching the same files
- Reverted commits (`git revert`)
- Commits that delete or substantially rewrite code added 1-3 commits ago
For each mistake, ask: Could clearer instructions have prevented this?
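A rough bash sketch for surfacing these signals; the keyword list mirrors the one above and will produce false positives worth eyeballing, and the parent-overlap check relies on bash process substitution:

```bash
# Commits whose subject line suggests a correction
git log -20 --format='%h %s' | grep -iE 'fix|revert|oops|undo|wrong|broken|actually|should have'

# Commits that touch the same files as their immediate parent:
# a crude proxy for "a fix commit landed right after the mistake"
git log -20 --format='%h' | while read -r c; do
  parent=$(git rev-parse --verify -q "$c^") || continue
  overlap=$(comm -12 \
    <(git diff-tree --no-commit-id --name-only -r "$c" | sort) \
    <(git diff-tree --no-commit-id --name-only -r "$parent" | sort))
  [ -n "$overlap" ] && echo "$c shares files with its parent: $overlap"
done
```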
Signs the agent deviated from project conventions:
- Inconsistent naming patterns across commits
- Mixed formatting styles (tabs vs spaces, quote styles, etc.)
- Test files placed in wrong directories
- Imports organized differently than existing code
- New files that don’t follow the project’s established patterns
For each drift, ask: Is this convention documented in the instructions? Is it documented clearly enough?
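Drift is harder to detect mechanically, but cheap heuristics catch the obvious cases. A sketch for one of them, mixed indentation, assuming the repo has at least 20 commits of history:

```bash
# Files touched in the last 20 commits that mix tab- and space-indented lines
tab=$(printf '\t')
git diff --name-only HEAD~20..HEAD | sort -u | while read -r f; do
  [ -f "$f" ] || continue
  if grep -q "^$tab" "$f" && grep -q '^    ' "$f"; then
    echo "Mixed indentation: $f"
  fi
done
```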
Commits where the agent did something well, especially if it was non-obvious:
- Clean, well-structured commits
- Correct handling of edge cases
- Proper test coverage
- Good file organization
For each success, ask: Is the pattern that made this successful encoded in the instructions, or did the agent get lucky?
Commits that don’t reveal anything about instruction quality:
- Merge commits
- Dependency updates
- Human-authored commits (if identifiable; see the filtering sketch below)
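If agent commits are identifiable by author name, you can split the history up front. The `claude` pattern below is an assumption; substitute whatever author name or trailer your tooling actually writes:

```bash
# Split the last 20 commits by author pattern (placeholder pattern: "claude")
git log -20 --format='%h %an %s' | grep -i  'claude' > agent_commits.txt
git log -20 --format='%h %an %s' | grep -iv 'claude' > human_commits.txt
```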
Based on the classification, compile a list of specific improvements. Each improvement should fit one of these types:
Things the agent should have been told but wasn’t:
- Undocumented conventions the agent violated
- Project-specific patterns that aren’t written down
- Error-prone areas that need explicit warnings
- Environment or tooling quirks
Things that ARE in the file but didn’t work:
- Instructions that were too vague (“write clean code”)
- Instructions that contradicted each other
- Instructions buried so deep the agent likely ignored them
- Instructions using ambiguous language
Instructions that are in the wrong place:
- Critical rules buried at the bottom
- Related instructions scattered across sections
- Prerequisites listed after the things that depend on them
- High-frequency guidance placed after rarely-needed guidance
Things that are no longer accurate:
- References to renamed files, functions, or directories
- Deprecated tools or workflows
- Patterns that the codebase has moved away from
- Version-specific instructions for outdated versions
Places where a concrete example would have prevented a mistake:
- “Do X, not Y” pairs showing correct vs incorrect
- File path examples for where things belong
- Code snippets showing the preferred pattern
- Command examples for common workflows
Now rewrite the file. Follow these principles:
- Critical constraints first. Things that cause immediate breakage, data loss, or security issues. (“Never commit API keys.” “Always run migrations before seeding.”)
- High-frequency guidance second. Things the agent will encounter on almost every task. (Code style, file organization, test patterns.)
- Project architecture third. Mental model of the codebase, key abstractions, directory structure.
- Workflow / process fourth. How to run tests, how to submit changes, CI expectations.
- Edge cases and gotchas fifth. Things that only matter sometimes but cause real pain when missed.
- Background context last. History, rationale, links to further reading.
- Be specific, not aspirational. ❌ “Write clean, maintainable code.” ✅ “Functions should be <40 lines. Extract helpers into `lib/helpers/`.”
- Use concrete examples. Every non-obvious rule should have a “Do this / Not this” pair.
- Front-load the important word. ❌ “When working with the database, always use parameterized queries.” ✅ “ALWAYS use parameterized queries for database access.”
- Group related rules. Don’t scatter testing rules across 4 sections.
- Mark severity. Use markers like `CRITICAL:`, `PREFER:`, `AVOID:`, `NEVER:` so the agent can triage.
- Keep it scannable. The file will be read by an LLM with a context window — dense prose is fine, but structure matters.
- Include the “why” briefly. A one-line rationale helps the agent generalize. (“Use `pnpm`, not `npm` — the lockfile is `pnpm-lock.yaml` and npm will create conflicts.”)
- Delete dead guidance. Instructions nobody follows (human or agent) create noise and erode trust in the file.
Use this as a starting skeleton, adapting sections to the project:
```markdown
# [Project Name] — Agent Instructions

## Critical Rules
<!-- Things that MUST be followed. Violations break the build, lose data, or create security issues. -->

## Code Style & Conventions
<!-- Formatting, naming, file organization, import ordering, etc. -->

## Architecture Overview
<!-- Key abstractions, directory structure, data flow. Keep it brief. -->

## Common Tasks
<!-- Step-by-step for frequent operations: adding a feature, fixing a bug, adding a test. -->

## Testing
<!-- How to run tests, where test files go, coverage expectations, mocking patterns. -->

## Gotchas & Known Issues
<!-- Non-obvious things that cause real problems. -->

## Background & Rationale
<!-- Why the project is structured this way. Links to ADRs, design docs, etc. -->
```

Before finalizing, run these checks:
- Diff the old and new file. Confirm every deletion was intentional.
- Check for contradictions. Search for conflicting instructions (e.g., “always use X” in one place and “prefer Y over X” in another). This is scriptable; see the sketch after this list.
- Verify file paths and names. Every path referenced in the instructions should actually exist:

  ```bash
  # Extract paths from the instructions and verify they exist
  grep -oE '`[^`]*\.(ts|js|py|rs|go|md|yaml|json|toml)`' AGENTS.md | tr -d '`' | while read -r p; do
    [ ! -e "$p" ] && echo "⚠ Referenced path does not exist: $p"
  done
  ```

- Check length. If the file exceeds ~3000 words, consider whether any sections can be moved to linked docs (also scriptable; see below). Agent instruction files should be dense but not overwhelming.
- Smoke test. Re-read the file from the perspective of an agent seeing this project for the first time. Does it give you enough to start working confidently?
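The contradiction and length checks are easy to script. A rough sketch; the keyword list is a heuristic, and the output still needs an actual read:

```bash
# Surface absolute-sounding rules so potential conflicts sit side by side
grep -niE 'always|never|must|prefer|avoid' AGENTS.md

# Word count against the ~3000-word guideline
wc -w AGENTS.md
```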
Add a brief summary of what changed and why, either as a commit message or in a changelog section at the bottom of the file:
```markdown
<!-- Kaizen Log
- 2025-02-08: Reordered testing section above architecture (agents touch tests more often).
  Added explicit "never mock the database" rule after 3 commits required revert.
  Removed reference to deprecated /scripts/legacy_build.sh.
-->
```

This log is valuable for the next kaizen pass — it shows what was tried and why.
The skill produces:
- The rewritten instructions file — drop-in replacement at the original path
- A summary of changes — what was changed and which commits motivated each change, formatted as a brief report
| Commit Signal | Finding | Fix Applied |
|---|---|---|
| `fix: use correct import path` followed by `fix: actually fix import` | Agent didn’t know the project uses path aliases | Added `CRITICAL:` This project uses TypeScript path aliases. Import from `@/lib/*`, never `../../../lib/*` |
| Three commits all restructured the same component differently | No guidance on component file structure | Added `## React Components` section with canonical file layout example |
| Agent added tests in `__tests__/` but project uses `*.test.ts` co-location | Test location convention undocumented | Added “Tests live next to source files as `*.test.ts`, not in a `__tests__/` directory” |
| `revert: undo database migration changes` | Agent modified migration files instead of creating new ones | Added `NEVER:` Do not modify existing migration files. Always create a new migration. |
| Agent used `console.log` for debugging in 4 of 20 commits | Logging convention unclear | Added “Use the project logger (`import { log } from '@/lib/logger'`), never `console.log`” |
| Commit adds feature but no test, next commit adds test as afterthought | Testing expectations not prominent enough | Moved testing section to position #2, added “Every feature commit MUST include tests” to Critical Rules |
- Run this skill regularly. The value compounds. Each pass makes the next one smaller.
- Don’t aim for perfection. A 10% better instructions file after each pass beats a rewrite you never do.
- Include human commits too. If humans are also committing, their fixes to agent mistakes are extremely high signal — they show exactly what the agent got wrong.
- Track improvements over time. If the same category of mistake keeps appearing across kaizen passes, the instructions aren’t the problem — the agent may need a different approach (pre-commit hooks, linting rules, etc.; see the sketch after this list).
- Keep the file honest. If a rule is consistently violated and nothing breaks, maybe it shouldn’t be a rule.
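When documentation keeps losing to habit, move the rule into enforcement. A hypothetical pre-commit hook sketch; the `console.log` rule and the TypeScript globs are illustrative, not prescriptive:

```bash
#!/bin/sh
# .git/hooks/pre-commit: block staged TypeScript changes that add console.log
if git diff --cached -U0 -- '*.ts' '*.tsx' \
   | grep '^+' | grep -v '^+++' | grep -q 'console\.log'; then
  echo "Blocked: use the project logger instead of console.log" >&2
  exit 1
fi
```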