This skill analyzes recent git history to find places where the project’s agent instructions file (AGENTS.md, CLAUDE.md, or equivalent) failed to prevent mistakes, caused confusion, or missed opportunities to guide the agent toward better outcomes. It then rewrites the file to close those gaps.
The name comes from the Japanese concept of kaizen (改善): continuous, incremental improvement.
Run this skill:

- On a regular cadence (e.g., weekly, after every N commits)
- After a painful debugging session or reverted commit
- When onboarding a new agent workflow or tool
- Whenever someone says “the agent keeps doing X wrong”
Inputs:

| Input | Required | Description |
|---|---|---|
| Agent instructions file | Yes | Path to AGENTS.md, CLAUDE.md, .claude/instructions.md, or equivalent |
| Git repository | Yes | Must have ≥5 recent commits to analyze (20 is ideal) |
| Number of commits | No | Default: 20. How far back to look. |
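To confirm the history requirement, count the commits on the branch you plan to analyze (this assumes that branch is currently checked out):

```bash
# Count commits reachable from HEAD; the skill wants at least 5, ideally 20+
git rev-list --count HEAD
```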
Search for the agent instructions file. Common names and locations:
```bash
# Find candidate files (order of precedence)
for f in AGENTS.md CLAUDE.md .claude/instructions.md .github/copilot-instructions.md CONVENTIONS.md; do
  if [ -f "$f" ]; then
    echo "Found: $f"
  fi
done
```

Read the file thoroughly. Before doing anything else, build a mental model of:
- Structure: How is it organized? What sections exist?
- Priorities: What does it emphasize? What’s listed first?
- Gaps: What categories of guidance are missing entirely?
- Staleness: Do any instructions reference obsolete tools, patterns, or file paths?
Pull the last N commits with full diffs and messages:
```bash
# Get commit log with stats
git log --oneline --stat -20

# For each commit, we'll want the full diff and message
git log -20 --format="===COMMIT %H===
Author: %an
Date: %ad
Subject: %s
Body: %b
===END MESSAGE===" --patch
```

If the repo is large and diffs are enormous, use a targeted approach:
```bash
# Just messages + file list (lighter weight first pass)
git log -20 --format="===COMMIT %h === %s" --name-status
```

Then selectively pull full diffs only for commits that look relevant.
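For any commit the first pass flags as interesting, pull its full message and patch individually; `<commit-hash>` below is a placeholder for a hash surfaced by the lighter pass:

```bash
# Full stats + patch for a single commit of interest
git show --stat --patch <commit-hash>
```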
For each of the last 20 commits, classify it into one or more of these categories:
Signs that the agent did something wrong that better instructions would have prevented:
- Commit messages containing: “fix”, “revert”, “oops”, “undo”, “wrong”, “broken”, “actually”, “should have”
- A commit immediately followed by a “fix” commit touching the same files
- Reverted commits (`git revert`)
- Commits that delete or substantially rewrite code added 1-3 commits ago
For each mistake, ask: Could clearer instructions have prevented this?
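A rough bash sketch for surfacing these signals; the keyword list mirrors the one above and will produce false positives worth eyeballing, and the parent-overlap check relies on bash process substitution:

```bash
# Commits whose subject line suggests a correction
git log -20 --format='%h %s' | grep -iE 'fix|revert|oops|undo|wrong|broken|actually|should have'

# Commits that touch the same files as their immediate parent:
# a crude proxy for "a fix commit landed right after the mistake"
git log -20 --format='%h' | while read -r c; do
  parent=$(git rev-parse --verify -q "$c^") || continue
  overlap=$(comm -12 \
    <(git diff-tree --no-commit-id --name-only -r "$c" | sort) \
    <(git diff-tree --no-commit-id --name-only -r "$parent" | sort))
  [ -n "$overlap" ] && echo "$c shares files with its parent: $overlap"
done
```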
Signs the agent deviated from project conventions:
- Inconsistent naming patterns across commits
- Mixed formatting styles (tabs vs spaces, quote styles, etc.)
- Test files placed in wrong directories
- Imports organized differently than existing code
- New files that don’t follow the project’s established patterns
For each drift, ask: Is this convention documented in the instructions? Is it documented clearly enough?
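Drift is harder to detect mechanically, but cheap heuristics catch the obvious cases. A sketch for one of them, mixed indentation, assuming the repo has at least 20 commits of history:

```bash
# Files touched in the last 20 commits that mix tab- and space-indented lines
tab=$(printf '\t')
git diff --name-only HEAD~20..HEAD | sort -u | while read -r f; do
  [ -f "$f" ] || continue
  if grep -q "^$tab" "$f" && grep -q '^    ' "$f"; then
    echo "Mixed indentation: $f"
  fi
done
```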
Commits where the agent did something well, especially if it was non-obvious:
- Clean, well-structured commits
- Correct handling of edge cases
- Proper test coverage
- Good file organization
For each success, ask: Is the pattern that made this successful encoded in the instructions, or did the agent get lucky?
Commits that don’t reveal anything about instruction quality:
- Merge commits
- Dependency updates
- Human-authored commits (if identifiable; see the filtering sketch below)
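If agent commits are identifiable by author name, you can split the history up front. The `claude` pattern below is an assumption; substitute whatever author name or trailer your tooling actually writes:

```bash
# Split the last 20 commits by author pattern (placeholder pattern: "claude")
git log -20 --format='%h %an %s' | grep -i  'claude' > agent_commits.txt
git log -20 --format='%h %an %s' | grep -iv 'claude' > human_commits.txt
```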
Based on the classification, compile a list of specific improvements. Each improvement should fit one of these types:
Things the agent should have been told but wasn’t:
- Undocumented conventions the agent violated
- Project-specific patterns that aren’t written down
- Error-prone areas that need explicit warnings
- Environment or tooling quirks
Things that ARE in the file but didn’t work:
- Instructions that were too vague (“write clean code”)
- Instructions that contradicted each other
- Instructions buried so deep the agent likely ignored them
- Instructions using ambiguous language
Instructions that are in the wrong place:
- Critical rules buried at the bottom
- Related instructions scattered across sections
- Prerequisites listed after the things that depend on them
- High-frequency guidance placed after rarely-needed guidance
Things that are no longer accurate:
- References to renamed files, functions, or directories
- Deprecated tools or workflows
- Patterns that the codebase has moved away from
- Version-specific instructions for outdated versions
Places where a concrete example would have prevented a mistake:
- “Do X, not Y” pairs showing correct vs incorrect
- File path examples for where things belong
- Code snippets showing the preferred pattern
- Command examples for common workflows
Now rewrite the file. Follow these principles:
- Critical constraints first. Things that cause immediate breakage, data loss, or security issues. (“Never commit API keys.” “Always run migrations before seeding.”)
- High-frequency guidance second. Things the agent will encounter on almost every task. (Code style, file organization, test patterns.)
- Project architecture third. Mental model of the codebase, key abstractions, directory structure.
- Workflow / process fourth. How to run tests, how to submit changes, CI expectations.
- Edge cases and gotchas fifth. Things that only matter sometimes but cause real pain when missed.
- Background context last. History, rationale, links to further reading.
- Be specific, not aspirational. ❌ “Write clean, maintainable code.” ✅ “Functions should be <40 lines. Extract helpers into `lib/helpers/`.”
- Use concrete examples. Every non-obvious rule should have a “Do this / Not this” pair.
- Front-load the important word. ❌ “When working with the database, always use parameterized queries.” ✅ “ALWAYS use parameterized queries for database access.”
- Group related rules. Don’t scatter testing rules across 4 sections.
- Mark severity. Use markers like `CRITICAL:`, `PREFER:`, `AVOID:`, `NEVER:` so the agent can triage.
- Keep it scannable. The file will be read by an LLM with a context window — dense prose is fine, but structure matters.
- Include the “why” briefly. A one-line rationale helps the agent generalize. (“Use `pnpm`, not `npm` — the lockfile is `pnpm-lock.yaml` and npm will create conflicts.”)
- Delete dead guidance. Instructions nobody follows (human or agent) create noise and erode trust in the file.
Use this as a starting skeleton, adapting sections to the project:
```markdown
# [Project Name] — Agent Instructions

## Critical Rules
<!-- Things that MUST be followed. Violations break the build, lose data, or create security issues. -->

## Code Style & Conventions
<!-- Formatting, naming, file organization, import ordering, etc. -->

## Architecture Overview
<!-- Key abstractions, directory structure, data flow. Keep it brief. -->

## Common Tasks
<!-- Step-by-step for frequent operations: adding a feature, fixing a bug, adding a test. -->

## Testing
<!-- How to run tests, where test files go, coverage expectations, mocking patterns. -->

## Gotchas & Known Issues
<!-- Non-obvious things that cause real problems. -->

## Background & Rationale
<!-- Why the project is structured this way. Links to ADRs, design docs, etc. -->
```

Before finalizing, run these checks:
- Diff the old and new file. Confirm every deletion was intentional.
- Check for contradictions. Search for conflicting instructions (e.g., “always use X” in one place and “prefer Y over X” in another). This is scriptable; see the sketch after this list.
- Verify file paths and names. Every path referenced in the instructions should actually exist:

  ```bash
  # Extract paths from the instructions and verify they exist
  grep -oE '`[^`]*\.(ts|js|py|rs|go|md|yaml|json|toml)`' AGENTS.md | tr -d '`' | while read -r p; do
    [ ! -e "$p" ] && echo "⚠ Referenced path does not exist: $p"
  done
  ```

- Check length. If the file exceeds ~3000 words, consider whether any sections can be moved to linked docs (also scriptable; see below). Agent instruction files should be dense but not overwhelming.
- Smoke test. Re-read the file from the perspective of an agent seeing this project for the first time. Does it give you enough to start working confidently?
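The contradiction and length checks are easy to script. A rough sketch; the keyword list is a heuristic, and the output still needs an actual read:

```bash
# Surface absolute-sounding rules so potential conflicts sit side by side
grep -niE 'always|never|must|prefer|avoid' AGENTS.md

# Word count against the ~3000-word guideline
wc -w AGENTS.md
```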
Add a brief summary of what changed and why, either as a commit message or in a changelog section at the bottom of the file:
```markdown
<!-- Kaizen Log
- 2025-02-08: Reordered testing section above architecture (agents touch tests more often).
  Added explicit "never mock the database" rule after 3 commits required revert.
  Removed reference to deprecated /scripts/legacy_build.sh.
-->
```

This log is valuable for the next kaizen pass — it shows what was tried and why.
The skill produces:
- The rewritten instructions file — drop-in replacement at the original path
- A summary of changes — what was changed and which commits motivated each change, formatted as a brief report
| Commit Signal | Finding | Fix Applied |
|---|---|---|
| `fix: use correct import path` followed by `fix: actually fix import` | Agent didn’t know the project uses path aliases | Added `CRITICAL:` This project uses TypeScript path aliases. Import from `@/lib/*`, never `../../../lib/*` |
| Three commits all restructured the same component differently | No guidance on component file structure | Added `## React Components` section with canonical file layout example |
| Agent added tests in `__tests__/` but project uses `*.test.ts` co-location | Test location convention undocumented | Added “Tests live next to source files as `*.test.ts`, not in a `__tests__/` directory” |
| `revert: undo database migration changes` | Agent modified migration files instead of creating new ones | Added `NEVER:` Do not modify existing migration files. Always create a new migration. |
| Agent used `console.log` for debugging in 4 of 20 commits | Logging convention unclear | Added “Use the project logger (`import { log } from '@/lib/logger'`), never `console.log`” |
| Commit adds feature but no test, next commit adds test as afterthought | Testing expectations not prominent enough | Moved testing section to position #2, added “Every feature commit MUST include tests” to Critical Rules |
- Run this skill regularly. The value compounds. Each pass makes the next one smaller.
- Don’t aim for perfection. A 10% better instructions file after each pass beats a rewrite you never do.
- Include human commits too. If humans are also committing, their fixes to agent mistakes are extremely high signal — they show exactly what the agent got wrong.
- Track improvements over time. If the same category of mistake keeps appearing across kaizen passes, the instructions aren’t the problem — the agent may need a different approach (pre-commit hooks, linting rules, etc.; see the sketch after this list).
- Keep the file honest. If a rule is consistently violated and nothing breaks, maybe it shouldn’t be a rule.
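When documentation keeps losing to habit, move the rule into enforcement. A hypothetical pre-commit hook sketch; the `console.log` rule and the TypeScript globs are illustrative, not prescriptive:

```bash
#!/bin/sh
# .git/hooks/pre-commit: block staged TypeScript changes that add console.log
if git diff --cached -U0 -- '*.ts' '*.tsx' \
   | grep '^+' | grep -v '^+++' | grep -q 'console\.log'; then
  echo "Blocked: use the project logger instead of console.log" >&2
  exit 1
fi
```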