Plan: Codex CLI as Global Review Agent for Claude Code

Context

Problem: Claude Code edits code without independent validation. A second AI reviewer (OpenAI Codex) can catch bugs, security issues, and style problems before they're committed — acting as an automated "second pair of eyes."

Goal: Integrate Codex CLI (v0.99.0, already installed) into every Claude Code session as a global review agent with three trigger points: task completion, pre-commit blocking, and on-demand /codex-review command.

Outcome: Claude's changes get automatically reviewed by Codex, and Claude fixes issues before they reach git history.


Lessons Learned (implementation errata)

These are the specific mistakes and inaccuracies discovered during the first implementation attempt. Every section below incorporates these fixes.

1. PreToolUse deny JSON format is nested, not flat

Wrong: {"decision":"deny","reason":"..."}

Correct: The output must be wrapped in hookSpecificOutput:

{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "deny",
    "permissionDecisionReason": "your reason here"
  }
}

Exit code must be 0 when outputting this JSON — the decision is in the payload, not the exit code.
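For illustration, a minimal Python sketch of emitting this payload, assuming the hook logic is written in Python rather than shell. The helper name `deny_payload` is hypothetical; only the JSON keys come from the format above.

```python
import json


def deny_payload(reason: str) -> str:
    """Return the nested PreToolUse deny JSON as a single-line string."""
    return json.dumps(
        {
            "hookSpecificOutput": {
                "hookEventName": "PreToolUse",
                "permissionDecision": "deny",
                "permissionDecisionReason": reason,
            }
        }
    )


# A hook would print this on stdout and then exit with status 0.
print(deny_payload("Codex review found issues"))
```

The flat `decision`/`reason` keys never appear; the decision lives entirely under `hookSpecificOutput`.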

2. MCP server config lives in ~/.claude.json, not ~/.claude/settings.json

~/.claude/settings.json does NOT have an mcpServers field (schema validation rejects it). MCP servers are registered in ~/.claude.json under the top-level mcpServers key:

{
  "mcpServers": {
    "codex-review": {
      "type": "stdio",
      "command": "/absolute/path/to/codex-mcp-server.py",
      "args": [],
      "env": {}
    }
  }
}

3. codex exec review --uncommitted is mutually exclusive with [PROMPT]

Codex CLI rejects any positional prompt argument (including - for stdin) when --uncommitted is used:

error: the argument '--uncommitted' cannot be used with '[PROMPT]'

Consequence: --focus, --files, and --prompt wrapper flags have NO effect for uncommitted reviews. Custom prompts only work with --base <BRANCH> scope. The core script must silently skip prompt construction when scope == "uncommitted".
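The constraint above could be handled when the command line is assembled. This is a sketch only: `build_codex_argv` is a hypothetical helper, and the flag ordering is an assumption rather than something the Codex CLI documents.

```python
from typing import List, Optional


def build_codex_argv(scope: str, base: Optional[str] = None,
                     prompt: Optional[str] = None) -> List[str]:
    """Build the argv for `codex exec review` for the given scope."""
    argv = ["codex", "exec", "review", "--json", "--ephemeral"]
    if scope == "uncommitted":
        # --uncommitted and [PROMPT] are mutually exclusive, so any
        # prompt passed in is silently dropped here.
        argv.append("--uncommitted")
    elif scope == "branch":
        if not base:
            raise ValueError("branch scope requires a base branch")
        argv += ["--base", base]
        if prompt:
            argv.append(prompt)  # positional [PROMPT] is allowed with --base
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    return argv
```

Dropping the prompt silently, instead of raising, keeps the hooks from failing when a caller passes `--focus` out of habit.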

4. Codex JSONL event structure differs from assumed format

Wrong assumption: events are {"type":"message","role":"assistant","content":[{"type":"text","text":"..."}]}

Actual format: Codex outputs these event types:

  • {"type":"thread.started","thread_id":"..."} — session start
  • {"type":"turn.started"} — turn boundary
  • {"type":"item.completed","item":{"id":"...","type":"reasoning","text":"..."}} — reasoning steps
  • {"type":"item.completed","item":{"id":"...","type":"command_execution","command":"...","aggregated_output":"...","exit_code":0,"status":"completed"}} — tool calls
  • {"type":"item.completed","item":{"id":"...","type":"agent_message","text":"..."}} — the actual review text
  • {"type":"turn.completed","usage":{...}} — turn end

The review text is in: event.item.text where event.type == "item.completed" and event.item.type == "agent_message".
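A minimal sketch of that extraction rule, assuming the JSONL arrives as an iterable of lines. The function name `extract_review_text` is hypothetical, and skipping non-JSON lines is a defensive assumption, not documented Codex behavior.

```python
import json
from typing import Iterable, List


def extract_review_text(lines: Iterable[str]) -> str:
    """Join the text of every agent_message item in a Codex JSONL stream."""
    messages: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON noise on stdout
        if not isinstance(event, dict) or event.get("type") != "item.completed":
            continue
        item = event.get("item")
        if isinstance(item, dict) and item.get("type") == "agent_message":
            messages.append(str(item.get("text") or ""))
    return "\n".join(messages)
```

Reasoning and command_execution items are ignored on purpose, so keywords inside command output never reach the severity classifier through this path.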

5. Pre-commit hook must check untracked files, not just diffs

git diff --cached --quiet && git diff --quiet only checks tracked files. Untracked files (the common case when Claude creates new files) are missed entirely. Must also check: git ls-files --others --exclude-standard.
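The three checks could be combined as in the sketch below. `has_uncommitted_changes` and the injectable `run` parameter are hypothetical names, added so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, List


def _git(args: List[str], cwd: str) -> subprocess.CompletedProcess:
    """Run a git subcommand in cwd and capture its output."""
    return subprocess.run(["git", *args], cwd=cwd, capture_output=True, text=True)


def has_uncommitted_changes(cwd: str, run: Callable = _git) -> bool:
    """True if the repo at cwd has staged, unstaged, or untracked changes."""
    # `git diff --quiet` exits non-zero when differences exist. A non-zero
    # status from a git error (e.g. not a repository) is also treated as
    # "changes" here, which is an assumption of this sketch.
    if run(["diff", "--cached", "--quiet"], cwd).returncode != 0:
        return True
    if run(["diff", "--quiet"], cwd).returncode != 0:
        return True
    # Untracked files never show up in `git diff`, so list them explicitly.
    untracked = run(["ls-files", "--others", "--exclude-standard"], cwd)
    return bool(untracked.stdout.strip())
```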

6. Severity heuristic produces false positives

Keywords like "error", "bug" appear in benign Codex output (e.g., "SyntaxError" in command output, or "no bugs found"). The heuristic classifier should:

  • Run AFTER "no issues" pattern detection, not before
  • Use Codex's own priority markers ([P1], [P2], [P3], [P4]) when present, which are more reliable than keyword matching
  • Ignore keywords that appear inside quoted command output or code blocks
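One way to order those three rules is sketched below. The "no issues" phrases and the keyword lists are hypothetical examples rather than the original script's patterns; only the [P1] to [P4] mapping comes from this plan.

```python
import re

# Hypothetical "no issues" phrases; extend to match real Codex output.
NO_ISSUES = re.compile(
    r"\bno (?:issues|bugs|problems)(?: were)? found\b|\blooks good\b",
    re.IGNORECASE,
)
PRIORITY = re.compile(r"\[P([1-4])\]")
LEVELS = {1: "critical", 2: "high", 3: "medium", 4: "low"}
# Hypothetical fallback keywords, checked from most to least severe.
KEYWORDS = [("high", ("vulnerability", "bug", "error")), ("medium", ("warning",))]
FENCED = re.compile(r"`{3}.*?`{3}", re.DOTALL)  # fenced code blocks
INLINE = re.compile(r"`[^`\n]*`")               # inline code spans


def classify_severity(review: str) -> str:
    """Return 'none', 'low', 'medium', 'high', or 'critical'."""
    # (a) An explicit "no issues" statement wins before anything else.
    if NO_ISSUES.search(review):
        return "none"
    # (b) Codex priority markers; the lowest number is the most severe.
    markers = [int(n) for n in PRIORITY.findall(review)]
    if markers:
        return LEVELS[min(markers)]
    # (c) Keyword fallback on prose only, with code stripped out first.
    prose = INLINE.sub(" ", FENCED.sub(" ", review)).lower()
    for level, words in KEYWORDS:
        if any(re.search(rf"\b{re.escape(w)}\b", prose) for w in words):
            return level
    return "none"
```

Because rule (a) runs first, a review that says "no bugs found" is classified as clean even though it contains the word "bugs".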

7. Hook stdin JSON has more fields than assumed

Assumed: {"tool_name":"Bash","tool_input":{"command":"..."},"cwd":"..."}

Actual PreToolUse input:

{
  "session_id": "abc123",
  "transcript_path": "/path/to/transcript.jsonl",
  "cwd": "/current/working/dir",
  "permission_mode": "default",
  "hook_event_name": "PreToolUse",
  "tool_name": "Bash",
  "tool_input": {"command": "git commit -m \"test\""}
}

Actual TaskCompleted input:

{
  "session_id": "abc123",
  "transcript_path": "/path/to/transcript.jsonl",
  "cwd": "/current/working/dir",
  "permission_mode": "default",
  "hook_event_name": "TaskCompleted",
  "task_id": "task-001",
  "task_subject": "...",
  "task_description": "..."
}

The hook scripts work because they use jq with // empty fallback, but tests should use the full payload structure.

8. Pre-commit hook must only block on critical/high severity, not all issues

The original implementation blocked on ANY has_issues == true, regardless of severity. This caused low-severity findings (e.g., README wording, metadata-only changes) to block commits — clearly wrong for a development workflow. Both hooks (pre-commit AND task completion) must apply the same severity threshold: only block on critical or high. Medium and low findings should pass through.


Architecture

Trigger                    →  Core Script        →  Codex CLI                    →  Feedback Loop
───────────────────────────────────────────────────────────────────────────────────────────────────
PreToolUse (Bash matcher)  →  codex-review.py    →  codex exec review           →  hookSpecificOutput deny → Claude fixes → retry
                              (grep for git commit)  --uncommitted --json          (JSON on stdout, exit 0)
                                                     --ephemeral

TaskCompleted              →  codex-review.py    →  codex exec review           →  exit 2 + stderr → Claude sees feedback
                              (check git changes)    --uncommitted --json
                                                     --ephemeral

/codex-review skill        →  MCP codex_review   →  codex exec review           →  result in context → Claude fixes
                              (codex-mcp-server)     --uncommitted OR --base
                                                     --json --ephemeral

Components & Files

1. Core Review Script

File: ~/.claude/hooks/scripts/codex-review.py

  • Shebang: #!/usr/bin/env -S uv run --quiet
  • PEP 723 inline metadata: requires-python = ">=3.10", zero dependencies
  • Wraps codex exec review --json --ephemeral
  • Adds --uncommitted OR --base <branch> based on --scope flag
  • CRITICAL: When scope == "uncommitted", does NOT append any prompt arguments (mutually exclusive in Codex CLI)
  • When scope == "branch", may append focus/custom prompt as positional argument
  • Parses JSONL stdout: iterates events, extracts text from item.completed where item.type == "agent_message"
  • Severity classification order: (a) check for "no issues" patterns first, (b) look for Codex [P1]-[P4] markers, (c) fall back to keyword heuristic
  • Outputs structured JSON: {"has_issues": bool, "max_severity": str, "summary": str, "details": str, "error": str|null}
  • Accepts: --scope {uncommitted,branch}, --focus, --format {json,text}, --timeout, --blocking, --base, --files, --prompt, --min-severity
  • All error paths return valid JSON with has_issues: false — never crashes, never blocks on failure
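The "never crashes, never blocks on failure" rule could look like the sketch below. `safe_review` and `empty_result` are hypothetical names, `parse` stands in for the JSONL parsing and severity steps, and using "none" as the empty `max_severity` value is an assumption.

```python
import json
import subprocess
from typing import Callable, List, Optional


def empty_result(error: Optional[str] = None) -> dict:
    """The non-blocking result shape used on every failure path."""
    return {"has_issues": False, "max_severity": "none",
            "summary": "", "details": "", "error": error}


def safe_review(argv: List[str], parse: Callable[[str], dict],
                timeout: float = 120.0) -> str:
    """Run the review command; always return a valid JSON string."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        if proc.returncode != 0:
            return json.dumps(empty_result(f"codex exited with status {proc.returncode}"))
        result = empty_result()
        result.update(parse(proc.stdout))  # caller-supplied parsing step
        return json.dumps(result)
    except subprocess.TimeoutExpired:
        return json.dumps(empty_result(f"codex timed out after {timeout}s"))
    except Exception as exc:  # missing binary, parse failure, and so on
        return json.dumps(empty_result(f"{type(exc).__name__}: {exc}"))
```

Every failure keeps `has_issues` false, so a broken or missing Codex install degrades to "allow" instead of blocking work.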

2. Pre-Commit Hook (blocking)

File: ~/.claude/hooks/codex-pre-commit.sh

  • Event: PreToolUse with matcher: "Bash"
  • Reads full hook input from stdin (JSON with session_id, cwd, tool_name, tool_input, etc.)
  • Extracts tool_input.command via jq
  • Only triggers when command matches regex (^|\s|&&|\|)git\s+commit(\s|$)
  • Checks for changes: staged (git diff --cached), unstaged (git diff), AND untracked (git ls-files --others --exclude-standard)
  • Runs codex-review.py --scope uncommitted --format json --timeout 120
  • Only blocks on critical or high severity — medium/low pass through (same threshold as task completion hook)
  • If critical/high issues found: outputs the correct nested deny JSON on stdout and exits 0:
    {
      "hookSpecificOutput": {
        "hookEventName": "PreToolUse",
        "permissionDecision": "deny",
        "permissionDecisionReason": "Codex review found issues..."
      }
    }
  • If medium/low, clean, or Codex fails: exit 0 with no stdout (allow)
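The trigger check in the bullets above, written in Python for illustration (the hook itself is a shell script using jq). `is_git_commit` is a hypothetical helper; the regex is the one given in the list.

```python
import json
import re

GIT_COMMIT = re.compile(r"(^|\s|&&|\|)git\s+commit(\s|$)")


def is_git_commit(hook_input_json: str) -> bool:
    """True if the PreToolUse payload is a Bash call that runs `git commit`."""
    try:
        payload = json.loads(hook_input_json)
    except json.JSONDecodeError:
        return False  # malformed input: let the tool call through
    if not isinstance(payload, dict) or payload.get("tool_name") != "Bash":
        return False
    tool_input = payload.get("tool_input")
    command = tool_input.get("command") if isinstance(tool_input, dict) else None
    if not isinstance(command, str):
        return False
    return bool(GIT_COMMIT.search(command))
```

One gap worth knowing about: the pattern as written does not match a command like `echo done;git commit`, because `;` is not among the accepted separators.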

3. Task Completion Hook (blocking on critical/high)

File: ~/.claude/hooks/codex-task-review.sh

  • Event: TaskCompleted (no matcher support — always fires)
  • Reads stdin, extracts cwd
  • Checks: git repo? + has changes (staged/unstaged/untracked)?
  • Runs codex-review.py --scope uncommitted --format json --timeout 120
  • NOTE: --focus flag is NOT passed because it's ignored for uncommitted scope anyway
  • If critical/high severity: writes review to stderr, exit 2 — Claude receives feedback
  • If medium/low or clean: exit 0
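The exit-code decision can be sketched as a pure function over the core script's JSON result. `task_hook_outcome` is a hypothetical name and the stderr message layout is an assumption.

```python
from typing import Tuple

BLOCKING = {"critical", "high"}


def task_hook_outcome(result: dict) -> Tuple[int, str]:
    """Map a review result to (exit_code, stderr_text) for TaskCompleted."""
    # Errors and clean reviews never block.
    if result.get("error") or not result.get("has_issues"):
        return 0, ""
    severity = str(result.get("max_severity", "")).lower()
    if severity not in BLOCKING:
        return 0, ""  # medium and low findings pass through
    message = (
        f"Codex review found {severity} issues: {result.get('summary', '')}\n"
        f"{result.get('details', '')}"
    )
    return 2, message  # exit 2 feeds the stderr text back to Claude
```

A caller would write the second element to stderr and exit with the first.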

4. MCP Server

File: ~/.claude/hooks/scripts/codex-mcp-server.py

  • Shebang: #!/usr/bin/env -S uv run --quiet
  • PEP 723: dependencies = ["mcp>=1.0"]
  • Uses FastMCP from mcp.server.fastmcp
  • Exposes codex_review tool with params: scope, focus, base_branch, files, custom_prompt, timeout
  • Delegates to codex-review.py subprocess
  • Returns JSON string result to Claude's context
  • Registered in: ~/.claude.json under mcpServers (NOT in settings.json)
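A rough sketch of the server's shape, under several assumptions: `CORE_SCRIPT` is a placeholder path, `core_script_argv` and `build_server` are hypothetical names, the `files` and `custom_prompt` parameters are omitted for brevity, and the 30-second subprocess margin is arbitrary. The `mcp` import sits inside `build_server` so the argv helper can be tested without the package installed.

```python
import json
import subprocess
from typing import List, Optional

CORE_SCRIPT = "/absolute/path/to/codex-review.py"  # placeholder path


def core_script_argv(scope: str = "uncommitted", focus: Optional[str] = None,
                     base_branch: Optional[str] = None, timeout: int = 120) -> List[str]:
    """Build the codex-review.py command line for one tool call."""
    argv = [CORE_SCRIPT, "--scope", scope, "--format", "json", "--timeout", str(timeout)]
    if scope == "branch":  # focus has no effect for uncommitted scope
        if base_branch:
            argv += ["--base", base_branch]
        if focus:
            argv += ["--focus", focus]
    return argv


def build_server():
    """Create the FastMCP server exposing the codex_review tool."""
    from mcp.server.fastmcp import FastMCP  # requires the `mcp` package

    server = FastMCP("codex-review")

    @server.tool()
    def codex_review(scope: str = "uncommitted", focus: str = "",
                     base_branch: str = "", timeout: int = 120) -> str:
        """Run a Codex review and return the core script's JSON output."""
        argv = core_script_argv(scope, focus or None, base_branch or None, timeout)
        try:
            proc = subprocess.run(argv, capture_output=True, text=True,
                                  timeout=timeout + 30)
        except (OSError, subprocess.TimeoutExpired) as exc:
            return json.dumps({"has_issues": False, "error": str(exc)})
        return proc.stdout

    return server


# Script entry point: build_server().run()  (stdio transport by default)
```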

5. /codex-review Skill

File: ~/.claude/commands/codex-review.md

  • Usage: /codex-review [scope] [--focus areas]
  • Instructs Claude to use the MCP codex_review tool (or fall back to direct Bash)
  • Supports auto-fix loop: review → fix → re-review (max 3 iterations)

6. Settings & Configuration (TWO files)

~/.claude/settings.json — hooks only:

  • Add hooks.PreToolUse array with matcher "Bash" → command hook
  • Add hooks.TaskCompleted array → command hook
  • Each hook entry requires: type: "command", command: "/absolute/path/...", optionally timeout and statusMessage
  • Use absolute paths (not ~), because hooks may not expand tilde

~/.claude.json — MCP server only:

  • Add mcpServers.codex-review with type: "stdio", command: "/absolute/path/...", args: [], env: {}

Implementation Steps

Phase 1: Core Script

  1. Create directory: mkdir -p ~/.claude/hooks/scripts/
  2. Write codex-review.py with these critical behaviors:
    • Parse JSONL: extract text from item.completed events where item.type == "agent_message"
    • Skip prompt args when scope == "uncommitted" (Codex CLI constraint)
    • Severity: check "no issues" patterns first, then [P1]-[P4] markers, then keyword heuristic
  3. chmod +x ~/.claude/hooks/scripts/codex-review.py
  4. Test in a git repo with a known-bad file:
    cd /tmp && mkdir test-repo && cd test-repo && git init
    echo "buggy code" > bug.py && git add . && git commit -m "init"
    echo "more bugs" >> bug.py
    ~/.claude/hooks/scripts/codex-review.py --scope uncommitted --format json
    Expected: JSON with has_issues: true, extracted review text in details

Phase 2: Hook Scripts

  1. Write codex-pre-commit.sh with correct deny JSON format (nested hookSpecificOutput)
  2. Write codex-task-review.sh — exit 2 with stderr on critical/high, NO --focus flag
  3. chmod +x both scripts
  4. Test pre-commit hook (must use git repo with actual changes):
    cd /tmp/test-repo
    echo '{"session_id":"test","cwd":"/tmp/test-repo","hook_event_name":"PreToolUse","tool_name":"Bash","tool_input":{"command":"git commit -m test"}}' \
      | /path/to/codex-pre-commit.sh
    Expected: JSON with hookSpecificOutput.permissionDecision: "deny" on stdout
  5. Test non-commit commands pass through:
    echo '{"session_id":"test","cwd":"/tmp","hook_event_name":"PreToolUse","tool_name":"Bash","tool_input":{"command":"ls -la"}}' | /path/to/codex-pre-commit.sh
    Expected: no stdout, exit 0
  6. Test task completion hook:
    bash -c 'echo "{\"session_id\":\"test\",\"cwd\":\"/tmp/test-repo\",\"hook_event_name\":\"TaskCompleted\",\"task_id\":\"task-001\"}" | /path/to/codex-task-review.sh >/dev/null 2>&1; echo "EXIT=$?"'
    Expected: EXIT=2 if issues found, EXIT=0 if clean

Phase 3: Configuration

  1. Edit ~/.claude/settings.json — add hooks key with PreToolUse and TaskCompleted arrays
    • Use absolute paths in the command field, NOT tilde-prefixed paths (~/...)
    • Set timeout: 180 (Codex inference can take 30-90s)
  2. Edit ~/.claude.json — add mcpServers.codex-review entry
    • Script: use python3 -c "import json; ..." to safely merge into existing JSON without destroying other fields
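The merge in step 2 could be done with a small helper like the sketch below instead of a one-liner. `merge_mcp_server` is a hypothetical name; the write-then-rename step is a design choice of this sketch, and it gives the rewritten file the temp file's default permissions.

```python
import json
import os
import tempfile


def merge_mcp_server(config_path: str, name: str, entry: dict) -> dict:
    """Add or replace mcpServers[name] in config_path, keeping all other keys."""
    config = {}
    if os.path.exists(config_path):
        with open(config_path, encoding="utf-8") as fh:
            config = json.load(fh)
    if not isinstance(config, dict):
        raise ValueError(f"{config_path} does not contain a JSON object")
    config.setdefault("mcpServers", {})[name] = entry

    # Write to a temp file in the same directory, then rename over the
    # original so a crash cannot leave a half-written config behind.
    directory = os.path.dirname(os.path.abspath(config_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
        fh.write("\n")
    os.replace(tmp_path, config_path)
    return config
```

A call might look like `merge_mcp_server(os.path.expanduser("~/.claude.json"), "codex-review", {"type": "stdio", "command": "/absolute/path/to/codex-mcp-server.py", "args": [], "env": {}})`.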

Phase 4: MCP Server

  1. Write codex-mcp-server.py
  2. chmod +x
  3. Test MCP server starts without import errors:
    timeout 10 /path/to/codex-mcp-server.py < /dev/null; echo "EXIT=$?"
    Expected: EXIT=0 (clean exit on EOF)

Phase 5: Skill

  1. Write ~/.claude/commands/codex-review.md

Phase 6: Integration Testing

  1. Create a fresh test repo, introduce a deliberate bug (invalid Python syntax)
  2. Test core script produces correct JSON with has_issues: true
  3. Test pre-commit hook outputs correct deny JSON format
  4. Test task completion hook returns exit code 2 on high severity
  5. Test with clean code — verify both hooks allow (exit 0, no deny output)
  6. Restart Claude Code session to pick up new hooks and MCP server
  7. In live session: make a buggy change, attempt commit → should be blocked
  8. In live session: run /codex-review → verify review output appears in context

Codex CLI Reference (v0.99.0)

codex exec review [OPTIONS] [PROMPT]

  --uncommitted     Review staged, unstaged, and untracked changes
                    MUTUALLY EXCLUSIVE with [PROMPT] — cannot pass both
  --base <BRANCH>   Review changes against base branch (can combine with [PROMPT])
  --commit <SHA>    Review changes from a specific commit
  --json            Output JSONL events to stdout
  --ephemeral       Don't persist session files
  --full-auto       Run with automatic sandboxed execution
  -m, --model       Override model
  --title <TITLE>   Commit title for review summary

JSONL Event Types (--json output)

thread.started      → {"type":"thread.started","thread_id":"..."}
turn.started        → {"type":"turn.started"}
item.completed      → {"type":"item.completed","item":{"type":"reasoning|command_execution|agent_message",...}}
turn.completed      → {"type":"turn.completed","usage":{...}}

Review text is in events where item.type == "agent_message" → item.text

Codex priority markers: [P1] (critical), [P2] (high), [P3] (medium), [P4] (low/style)


Claude Code Hooks Reference

PreToolUse Hook

stdin: {"session_id","transcript_path","cwd","permission_mode","hook_event_name":"PreToolUse","tool_name","tool_input"}

stdout to deny:

{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "deny",
    "permissionDecisionReason": "reason string"
  }
}

Exit code: always 0 (decision is in JSON, not exit code)

TaskCompleted Hook

stdin: {"session_id","transcript_path","cwd","permission_mode","hook_event_name":"TaskCompleted","task_id","task_subject","task_description"}

Block: exit 2 + stderr message (fed back to Claude)

Allow: exit 0

Settings Schema (hooks portion of ~/.claude/settings.json)

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {"type": "command", "command": "/absolute/path/to/script.sh", "timeout": 180, "statusMessage": "..."}
        ]
      }
    ],
    "TaskCompleted": [
      {
        "hooks": [
          {"type": "command", "command": "/absolute/path/to/script.sh", "timeout": 180, "statusMessage": "..."}
        ]
      }
    ]
  }
}
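If the hooks fragment is generated rather than hand-edited, the two constraints in this plan (absolute paths, hook timeout above the script timeout) can be enforced up front. `hook_entry` and `hooks_fragment` are hypothetical helpers, and the status messages are caller-supplied because the plan does not specify them.

```python
import os


def hook_entry(script_path: str, timeout: int, status_message: str) -> dict:
    """Build one command-hook entry with an absolute script path."""
    path = os.path.expanduser(script_path)  # resolve a leading ~ up front
    if not os.path.isabs(path):
        raise ValueError(f"hook command must be an absolute path: {script_path}")
    return {"type": "command", "command": path,
            "timeout": timeout, "statusMessage": status_message}


def hooks_fragment(pre_commit: str, task_review: str, status_message: str,
                   hook_timeout: int = 180, script_timeout: int = 120) -> dict:
    """Build the `hooks` portion of settings.json for both hook scripts."""
    if hook_timeout <= script_timeout:
        raise ValueError("hook timeout must exceed the script timeout")
    return {
        "hooks": {
            "PreToolUse": [
                {"matcher": "Bash",
                 "hooks": [hook_entry(pre_commit, hook_timeout, status_message)]}
            ],
            "TaskCompleted": [
                {"hooks": [hook_entry(task_review, hook_timeout, status_message)]}
            ],
        }
    }
```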

Key Design Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| Core script deps | Zero (stdlib only) | Fast hook startup, no uv resolution delay |
| Pre-commit blocking | Nested JSON hookSpecificOutput.permissionDecision: deny, only critical/high | Don't block commits on minor style/metadata issues |
| Task completion blocking | Only critical/high severity via exit 2 | Same threshold as pre-commit, for consistency |
| Severity threshold | Both hooks: critical + high only | Medium/low findings pass through; blocking on low breaks workflow for benign changes like README edits |
| Graceful degradation | Always exit 0 on errors (hooks), always return valid JSON (script) | Never break the development workflow |
| Codex flags | --json --ephemeral always; --uncommitted for working tree | Structured output, no session clutter |
| No prompt with --uncommitted | Skip --focus/--prompt args for uncommitted scope | Codex CLI rejects --uncommitted + [PROMPT] |
| Config split | Hooks in settings.json, MCP in ~/.claude.json | Schema validation requires this separation |
| Severity classification | "no issues" patterns first, then Codex [P1]-[P4] markers, then keyword heuristic | Explicit "no issues" statements short-circuit the check; Codex's own priority markers are more reliable than keyword matching |
| Timeout | 180s for hooks, 120s for script | Codex inference takes 30-90s; hook timeout must exceed script timeout |
| Absolute paths in hooks | Use /Users/<user>/... not ~/... | Hook commands may not expand tilde |

File Inventory

| File | Type | Config Location |
| --- | --- | --- |
| ~/.claude/hooks/scripts/codex-review.py | Core script | Referenced by hooks and MCP server |
| ~/.claude/hooks/codex-pre-commit.sh | Shell hook | ~/.claude/settings.json → hooks.PreToolUse |
| ~/.claude/hooks/codex-task-review.sh | Shell hook | ~/.claude/settings.json → hooks.TaskCompleted |
| ~/.claude/hooks/scripts/codex-mcp-server.py | MCP server | ~/.claude.json → mcpServers.codex-review |
| ~/.claude/commands/codex-review.md | Skill | Auto-discovered by Claude Code |
| ~/.claude/settings.json | Config | Hooks registration |
| ~/.claude.json | Config | MCP server registration |