
Agent Kaizen — Continuous Improvement for AGENTS.md / CLAUDE.md

Purpose

This skill analyzes recent git history to find places where the project’s agent instructions file (AGENTS.md, CLAUDE.md, or equivalent) failed to prevent mistakes, caused confusion, or missed opportunities to guide the agent toward better outcomes. It then rewrites the file to close those gaps.

The name comes from the Japanese concept of kaizen (改善): continuous, incremental improvement.


When to Use

  • On a regular cadence (e.g., weekly, after every N commits)
  • After a painful debugging session or reverted commit
  • When onboarding a new agent workflow or tool
  • Whenever someone says “the agent keeps doing X wrong”

Inputs

| Input | Required | Description |
| --- | --- | --- |
| Agent instructions file | Yes | Path to AGENTS.md, CLAUDE.md, .claude/instructions.md, or equivalent |
| Git repository | Yes | Must have ≥5 recent commits to analyze (20 is ideal) |
| Number of commits | No | Default: 20. How far back to look. |

Process

Step 1 — Locate and Read the Current Instructions File

Search for the agent instructions file. Common names and locations:

# Find candidate files (order of precedence)
for f in AGENTS.md CLAUDE.md .claude/instructions.md .github/copilot-instructions.md CONVENTIONS.md; do
  if [ -f "$f" ]; then
    echo "Found: $f"
  fi
done

Read the file thoroughly. Before doing anything else, build a mental model of:

  1. Structure: How is it organized? What sections exist?
  2. Priorities: What does it emphasize? What’s listed first?
  3. Gaps: What categories of guidance are missing entirely?
  4. Staleness: Do any instructions reference obsolete tools, patterns, or file paths?

Step 2 — Gather Git History

Pull the last N commits with full diffs and messages:

# Get commit log with stats
git log --oneline --stat -20

# For each commit, we'll want the full diff and message
git log -20 --format="===COMMIT %H===
Author: %an
Date: %ad
Subject: %s
Body: %b
===END MESSAGE===" --patch

If the repo is large and diffs are enormous, use a targeted approach:

# Just messages + file list (lighter weight first pass)
git log -20 --format="===COMMIT %h === %s" --name-status

Then selectively pull full diffs only for commits that look relevant.
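
For example, to pull the full diff and message for a single commit flagged in the light pass (`<commit-hash>` is a placeholder):

# Full diff plus message for one commit of interest
git show --patch <commit-hash>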

Step 3 — Classify Each Commit

For each of the last N commits (default: 20), classify it into one or more of these categories:

🔴 Agent Mistakes (High Signal)

Signs that the agent did something wrong that better instructions would have prevented:

  • Commit messages containing: “fix”, “revert”, “oops”, “undo”, “wrong”, “broken”, “actually”, “should have”
  • A commit immediately followed by a “fix” commit touching the same files
  • Reverted commits (git revert)
  • Commits that delete or substantially rewrite code added 1-3 commits ago

For each mistake, ask: Could clearer instructions have prevented this?
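
A quick heuristic pass can surface candidates before reading any diffs. This is a rough sketch; tune the keyword list to the project’s commit style:

# Flag commits whose messages hint at a correction or reversal
git log -20 --format='%h %s' | grep -iE 'fix|revert|oops|undo|wrong|broken|actually|should have'

# Show per-commit file lists so back-to-back commits touching the same files stand out
git log -20 --format='--- %h %s' --name-only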

🟡 Style/Convention Drift (Medium Signal)

Signs the agent deviated from project conventions:

  • Inconsistent naming patterns across commits
  • Mixed formatting styles (tabs vs spaces, quote styles, etc.)
  • Test files placed in wrong directories
  • Imports organized differently than existing code
  • New files that don’t follow the project’s established patterns

For each drift, ask: Is this convention documented in the instructions? Is it documented clearly enough?
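
One illustrative drift check (a heuristic sketch, not an exhaustive detector): flag recently added files whose indentation mixes tabs and spaces.

# Heuristic: files added in the last 20 commits that mix tab and space indentation
git log -20 --diff-filter=A --name-only --format= | sort -u | while read -r f; do
  [ -f "$f" ] || continue
  if grep -q "^$(printf '\t')" "$f" && grep -q '^ ' "$f"; then
    echo "Mixed indentation: $f"
  fi
done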

🟢 Successful Patterns (Positive Signal)

Commits where the agent did something well, especially if it was non-obvious:

  • Clean, well-structured commits
  • Correct handling of edge cases
  • Proper test coverage
  • Good file organization

For each success, ask: Is the pattern that made this successful encoded in the instructions, or did the agent get lucky?

⚪ Neutral / Not Applicable

Commits that don’t reveal anything about instruction quality:

  • Merge commits
  • Dependency updates
  • Human-authored commits (if identifiable)
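
A quick way to set these aside: drop merge commits and print the author so human-authored commits are easy to spot.

# Exclude merges; the author column helps separate human from agent commits
git log -20 --no-merges --format='%h  %an  %s'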

Step 4 — Identify Improvements

Based on the classification, compile a list of specific improvements. Each improvement should fit one of these types:

A. Missing Instructions

Things the agent should have been told but wasn’t:

  • Undocumented conventions the agent violated
  • Project-specific patterns that aren’t written down
  • Error-prone areas that need explicit warnings
  • Environment or tooling quirks

B. Unclear Instructions

Things that ARE in the file but didn’t work:

  • Instructions that were too vague (“write clean code”)
  • Instructions that contradicted each other
  • Instructions buried so deep the agent likely ignored them
  • Instructions using ambiguous language

C. Ordering Problems

Instructions that are in the wrong place:

  • Critical rules buried at the bottom
  • Related instructions scattered across sections
  • Prerequisites listed after the things that depend on them
  • High-frequency guidance placed after rarely-needed guidance

D. Stale Instructions

Things that are no longer accurate:

  • References to renamed files, functions, or directories
  • Deprecated tools or workflows
  • Patterns that the codebase has moved away from
  • Version-specific instructions for outdated versions

E. Missing Examples

Places where a concrete example would have prevented a mistake:

  • “Do X, not Y” pairs showing correct vs incorrect
  • File path examples for where things belong
  • Code snippets showing the preferred pattern
  • Command examples for common workflows

Step 5 — Rewrite the Instructions File

Now rewrite the file. Follow these principles:

Ordering Principles (Most to Least Important)

  1. Critical constraints first. Things that cause immediate breakage, data loss, or security issues. (“Never commit API keys.” “Always run migrations before seeding.”)
  2. High-frequency guidance second. Things the agent will encounter on almost every task. (Code style, file organization, test patterns.)
  3. Project architecture third. Mental model of the codebase, key abstractions, directory structure.
  4. Workflow / process fourth. How to run tests, how to submit changes, CI expectations.
  5. Edge cases and gotchas fifth. Things that only matter sometimes but cause real pain when missed.
  6. Background context last. History, rationale, links to further reading.

Writing Principles

  • Be specific, not aspirational. ❌ “Write clean, maintainable code.” ✅ “Functions should be <40 lines. Extract helpers into lib/helpers/.”
  • Use concrete examples. Every non-obvious rule should have a “Do this / Not this” pair.
  • Front-load the important word. ❌ “When working with the database, always use parameterized queries.” ✅ “ALWAYS use parameterized queries for database access.”
  • Group related rules. Don’t scatter testing rules across 4 sections.
  • Mark severity. Use markers like CRITICAL:, PREFER:, AVOID:, NEVER: so the agent can triage.
  • Keep it scannable. The file will be read by an LLM with a context window — dense prose is fine, but structure matters.
  • Include the “why” briefly. A one-line rationale helps the agent generalize. (“Use pnpm not npm — the lockfile is pnpm-lock.yaml and npm will create conflicts.”)
  • Delete dead guidance. Instructions nobody follows (human or agent) create noise and erode trust in the file.
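
For instance, a severity-marked rule block might look like this (the rules themselves are illustrative placeholders, not recommendations):

CRITICAL: Never commit .env files. Secrets must be rotated by hand.
PREFER: Named exports over default exports (easier to grep and refactor).
AVOID: Adding new top-level directories without discussion.
NEVER: Edit files under src/gen/. Regenerate them instead.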

Structural Template

Use this as a starting skeleton, adapting sections to the project:

# [Project Name] — Agent Instructions

## Critical Rules
<!-- Things that MUST be followed. Violations break the build, lose data, or create security issues. -->

## Code Style & Conventions
<!-- Formatting, naming, file organization, import ordering, etc. -->

## Architecture Overview
<!-- Key abstractions, directory structure, data flow. Keep it brief. -->

## Common Tasks
<!-- Step-by-step for frequent operations: adding a feature, fixing a bug, adding a test. -->

## Testing
<!-- How to run tests, where test files go, coverage expectations, mocking patterns. -->

## Gotchas & Known Issues
<!-- Non-obvious things that cause real problems. -->

## Background & Rationale
<!-- Why the project is structured this way. Links to ADRs, design docs, etc. -->

Step 6 — Validate the Rewrite

Before finalizing, run these checks:

  1. Diff the old and new file. Confirm every deletion was intentional.
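
    If the rewrite replaced the tracked file in place, the working-tree diff is exactly this comparison:

    # Old (committed) vs new (working tree) version of the instructions file
    git diff -- AGENTS.md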

  2. Check for contradictions. Search for conflicting instructions (e.g., “always use X” in one place and “prefer Y over X” in another).
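
    A crude but useful sketch: pull out every hard directive and scan them side by side.

    # Surface hard directives so conflicts are easy to spot by eye
    grep -inE 'always|never|must|do not' AGENTS.md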

  3. Verify file paths and names. Every path referenced in the instructions should actually exist:

    # Extract paths from the instructions and verify they exist
    grep -oE '`[^`]*\.(ts|js|py|rs|go|md|yaml|json|toml)`' AGENTS.md | tr -d '`' | while read -r p; do
      [ ! -e "$p" ] && echo "⚠ Referenced path does not exist: $p"
    done
  4. Check length. If the file exceeds ~3000 words, consider whether any sections can be moved to linked docs. Agent instruction files should be dense but not overwhelming.
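
    A quick approximation, assuming the file lives at AGENTS.md:

    # Rough size check against the ~3000-word guideline
    wc -w AGENTS.md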

  5. Smoke test. Re-read the file from the perspective of an agent seeing this project for the first time. Does it give you enough to start working confidently?

Step 7 — Write a Changelog Entry

Add a brief summary of what changed and why, either as a commit message or in a changelog section at the bottom of the file:

<!-- Kaizen Log
- 2025-02-08: Reordered testing section above architecture (agents touch tests more often).
  Added explicit "never mock the database" rule after 3 commits required revert.
  Removed reference to deprecated /scripts/legacy_build.sh.
-->

This log is valuable for the next kaizen pass — it shows what was tried and why.


Output

The skill produces:

  1. The rewritten instructions file — drop-in replacement at the original path
  2. A summary of changes — what was changed and which commits motivated each change, formatted as a brief report

Example Findings → Fixes

| Commit Signal | Finding | Fix Applied |
| --- | --- | --- |
| `fix: use correct import path` followed by `fix: actually fix import` | Agent didn’t know the project uses path aliases | Added `CRITICAL: This project uses TypeScript path aliases. Import from @/lib/*, never ../../../lib/*` |
| Three commits all restructured the same component differently | No guidance on component file structure | Added `## React Components` section with canonical file layout example |
| Agent added tests in `__tests__/` but project uses `*.test.ts` co-location | Test location convention undocumented | Added “Tests live next to source files as `*.test.ts`, not in a `__tests__/` directory” |
| `revert: undo database migration changes` | Agent modified migration files instead of creating new ones | Added `NEVER: Do not modify existing migration files. Always create a new migration.` |
| Agent used `console.log` for debugging in 4 of 20 commits | Logging convention unclear | Added “Use the project logger (`import { log } from '@/lib/logger'`), never `console.log`” |
| Commit adds a feature but no test; the next commit adds the test as an afterthought | Testing expectations not prominent enough | Moved testing section to position #2, added “Every feature commit MUST include tests” to Critical Rules |

Tips for Best Results

  • Run this skill regularly. The value compounds. Each pass makes the next one smaller.
  • Don’t aim for perfection. A 10% better instructions file after each pass beats a rewrite you never do.
  • Include human commits too. If humans are also committing, their fixes to agent mistakes are extremely high signal — they show exactly what the agent got wrong.
  • Track improvements over time. If the same category of mistake keeps appearing across kaizen passes, the instructions aren’t the problem — the agent may need a different approach (pre-commit hooks, linting rules, etc.).
  • Keep the file honest. If a rule is consistently violated and nothing breaks, maybe it shouldn’t be a rule.