Plan: Codex CLI as Global Review Agent for Claude Code

Context

Problem: Claude Code edits code without independent validation. A second AI reviewer (OpenAI Codex) can catch bugs, security issues, and style problems before they're committed — acting as an automated "second pair of eyes."

Goal: Integrate Codex CLI (v0.99.0, already installed) into every Claude Code session as a global review agent with three trigger points: task completion, pre-commit blocking, and on-demand /codex-review command.

Outcome: Claude's changes get automatically reviewed by Codex, and Claude fixes issues before they reach git history.

Lessons Learned (implementation errata)

These are the specific mistakes and inaccuracies discovered during the first implementation attempt. Every section below incorporates these fixes.

1. PreToolUse deny JSON format is nested, not flat

Wrong: {"decision":"deny","reason":"..."}

Correct: the output must be wrapped in hookSpecificOutput:

{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "deny",
    "permissionDecisionReason": "your reason here"
  }
}

Exit code must be 0 when outputting this JSON — the decision is in the payload, not the exit code.
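As a rough illustration of the nested shape, a hook written in Python could build the payload along these lines. The helper name `build_deny_payload` is hypothetical; only the JSON keys are taken from the format shown above.

```python
import json


def build_deny_payload(reason: str) -> str:
    """Return the nested PreToolUse deny payload as a JSON string."""
    payload = {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason": reason,
        }
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # The decision goes to stdout; the process then ends with exit status 0.
    print(build_deny_payload("Codex review found issues"))
```

Building the dictionary and serializing it with `json.dumps` avoids quoting mistakes when the reason string contains quotes or newlines.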

2. MCP server config lives in `~/.claude.json`, not `~/.claude/settings.json`

~/.claude/settings.json does NOT have an mcpServers field (schema validation rejects it). MCP servers are registered in ~/.claude.json under the top-level mcpServers key:

{
  "mcpServers": {
    "codex-review": {
      "type": "stdio",
      "command": "/absolute/path/to/codex-mcp-server.py",
      "args": [],
      "env": {}
    }
  }
}

3. `codex exec review --uncommitted` is mutually exclusive with `[PROMPT]`

Codex CLI rejects any positional prompt argument (including - for stdin) when --uncommitted is used:

error: the argument '--uncommitted' cannot be used with '[PROMPT]'

Consequence: --focus, --files, and --prompt wrapper flags have NO effect for uncommitted reviews. Custom prompts only work with --base <BRANCH> scope. The core script must silently skip prompt construction when scope == "uncommitted".
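A minimal sketch of how the core script might assemble the Codex command while respecting this constraint. The function name `build_codex_command` and its parameters are illustrative assumptions; the flags are the ones listed in this plan.

```python
from typing import List, Optional


def build_codex_command(scope: str, base: Optional[str] = None,
                        prompt: Optional[str] = None) -> List[str]:
    """Assemble the argv for `codex exec review` for a given scope."""
    cmd = ["codex", "exec", "review", "--json", "--ephemeral"]
    if scope == "uncommitted":
        # --uncommitted rejects a positional [PROMPT], so any prompt is dropped.
        cmd.append("--uncommitted")
        return cmd
    if scope == "branch":
        if not base:
            raise ValueError("scope 'branch' requires a base branch")
        cmd += ["--base", base]
        if prompt:
            cmd.append(prompt)  # a positional [PROMPT] is accepted with --base
        return cmd
    raise ValueError(f"unknown scope: {scope!r}")
```

For example, `build_codex_command("uncommitted", prompt="focus on security")` returns a command with no prompt at all, which is the silent-skip behavior described above.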

4. Codex JSONL event structure differs from assumed format

Wrong assumption: events are {"type":"message","role":"assistant","content":[{"type":"text","text":"..."}]}

Actual format: Codex outputs these event types:

{"type":"thread.started","thread_id":"..."} — session start
{"type":"turn.started"} — turn boundary
{"type":"item.completed","item":{"id":"...","type":"reasoning","text":"..."}} — reasoning steps
{"type":"item.completed","item":{"id":"...","type":"command_execution","command":"...","aggregated_output":"...","exit_code":0,"status":"completed"}} — tool calls
{"type":"item.completed","item":{"id":"...","type":"agent_message","text":"..."}} — the actual review text
{"type":"turn.completed","usage":{...}} — turn end

The review text is in: event.item.text where event.type == "item.completed" and event.item.type == "agent_message".
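The extraction rule can be sketched as a small parser. This is an illustration under the event shapes listed above, not the actual script; the function name `extract_review_text` is an assumption.

```python
import json
from typing import Iterable, List


def extract_review_text(lines: Iterable[str]) -> str:
    """Join the text of every agent_message item found in Codex JSONL output."""
    messages: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON noise on stdout
        if not isinstance(event, dict) or event.get("type") != "item.completed":
            continue
        item = event.get("item") or {}
        if isinstance(item, dict) and item.get("type") == "agent_message":
            messages.append(str(item.get("text", "")))
    return "\n".join(messages)
```

Reasoning and command_execution items are skipped on purpose, so command output that happens to contain words like "error" never reaches the severity classifier.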

5. Pre-commit hook must check untracked files, not just diffs

git diff --cached --quiet && git diff --quiet only checks tracked files. Untracked files (the common case when Claude creates new files) are missed entirely. The hook must also check the output of git ls-files --others --exclude-standard, which lists untracked files that are not ignored.
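A sketch of the combined check in Python, assuming the working directory is already known to be a git repository (outside a repository `git diff` also exits non-zero). The names `has_changes` and `_git` are hypothetical, and the `run` parameter exists only so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, List


def _git(args: List[str], cwd: str) -> subprocess.CompletedProcess:
    """Run a git command in cwd and capture its output as text."""
    return subprocess.run(["git", *args], cwd=cwd, capture_output=True, text=True)


def has_changes(
    cwd: str,
    run: Callable[[List[str], str], subprocess.CompletedProcess] = _git,
) -> bool:
    """True if the repo has staged, unstaged, or untracked changes."""
    # `git diff --quiet` exits with status 1 when differences exist.
    if run(["diff", "--cached", "--quiet"], cwd).returncode != 0:
        return True
    if run(["diff", "--quiet"], cwd).returncode != 0:
        return True
    # Untracked files are invisible to `git diff`, so list them explicitly.
    untracked = run(["ls-files", "--others", "--exclude-standard"], cwd)
    return bool(untracked.stdout.strip())
```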

6. Severity heuristic produces false positives

Keywords such as "error" and "bug" appear in benign Codex output (e.g., "SyntaxError" in command output, or "no bugs found"). The heuristic classifier should:

Run AFTER "no issues" pattern detection, not before
Use Codex's own priority markers ([P1], [P2], [P3], [P4]) when present, which are more reliable than keyword matching
Ignore keywords that appear inside quoted command output or code blocks
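The three rules above could be combined roughly as follows. This is a sketch: the "no issues" phrases and the keyword lists are illustrative assumptions rather than the actual patterns, and `classify_severity` is a hypothetical name. The mapping from `[P1]`-`[P4]` to severities follows the one given in this plan.

```python
import re
from typing import Optional

# Illustrative "clean review" phrases; the real pattern list is an assumption.
NO_ISSUE_PATTERNS = re.compile(
    r"\bno (issues|bugs|problems)( were)? (found|detected)\b|\blooks good\b",
    re.IGNORECASE,
)
PRIORITY_TO_SEVERITY = {"P1": "critical", "P2": "high", "P3": "medium", "P4": "low"}
# Illustrative keyword lists, checked from most to least severe.
KEYWORD_SEVERITY = [
    ("critical", ("vulnerability", "injection", "data loss")),
    ("high", ("bug", "error", "crash")),
    ("medium", ("warning", "inefficient")),
]
FENCED_BLOCK = re.compile(r"```.*?```", re.DOTALL)
INLINE_CODE = re.compile(r"`[^`\n]*`")


def classify_severity(text: str) -> Optional[str]:
    """Return 'critical', 'high', 'medium' or 'low', or None when clean."""
    # (a) An explicit "no issues" statement is checked before anything else.
    if NO_ISSUE_PATTERNS.search(text):
        return None
    # (b) Codex priority markers; the lowest number is the most severe.
    markers = re.findall(r"\[(P[1-4])\]", text)
    if markers:
        return PRIORITY_TO_SEVERITY[min(markers)]
    # (c) Keyword fallback on prose only, with code blocks and inline code removed.
    prose = INLINE_CODE.sub(" ", FENCED_BLOCK.sub(" ", text)).lower()
    for severity, words in KEYWORD_SEVERITY:
        if any(re.search(rf"\b{re.escape(word)}\b", prose) for word in words):
            return severity
    return None
```

Matching keywords on word boundaries also keeps identifiers such as `SyntaxError` from counting as the keyword "error" when they appear in plain prose.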

7. Hook stdin JSON has more fields than assumed

Assumed: {"tool_name":"Bash","tool_input":{"command":"..."},"cwd":"..."}

Actual PreToolUse input:

{
  "session_id": "abc123",
  "transcript_path": "/path/to/transcript.jsonl",
  "cwd": "/current/working/dir",
  "permission_mode": "default",
  "hook_event_name": "PreToolUse",
  "tool_name": "Bash",
  "tool_input": {"command": "git commit -m \"test\""}
}

Actual TaskCompleted input:

{
  "session_id": "abc123",
  "transcript_path": "/path/to/transcript.jsonl",
  "cwd": "/current/working/dir",
  "permission_mode": "default",
  "hook_event_name": "TaskCompleted",
  "task_id": "task-001",
  "task_subject": "...",
  "task_description": "..."
}

The hook scripts work because they use jq with // empty fallback, but tests should use the full payload structure.
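For comparison, the same tolerant extraction in Python might look like the sketch below: missing or malformed fields fall back to an empty string, much like the jq `// empty` fallback. The function name `parse_hook_input` and the returned key names are assumptions for illustration.

```python
import json
from typing import Any, Dict


def parse_hook_input(raw: str) -> Dict[str, str]:
    """Pull the fields the hooks need from stdin JSON, defaulting to ''."""
    try:
        data: Any = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    tool_input = data.get("tool_input")
    if not isinstance(tool_input, dict):
        tool_input = {}  # TaskCompleted payloads carry no tool_input
    return {
        "event": str(data.get("hook_event_name") or ""),
        "cwd": str(data.get("cwd") or ""),
        "tool_name": str(data.get("tool_name") or ""),
        "command": str(tool_input.get("command") or ""),
    }
```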

8. Pre-commit hook must only block on critical/high severity, not all issues

The original implementation blocked on ANY has_issues == true, regardless of severity. This caused low-severity findings (e.g., README wording, metadata-only changes) to block commits — clearly wrong for a development workflow. Both hooks (pre-commit AND task completion) must apply the same severity threshold: only block on critical or high. Medium and low findings should pass through.
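The shared threshold can be expressed as one small predicate that both hooks call. This is a sketch that assumes the result dictionary uses the `has_issues` and `max_severity` fields described in this plan; `should_block` and `SEVERITY_RANK` are hypothetical names.

```python
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def should_block(result: dict, min_severity: str = "high") -> bool:
    """Block only when the review found issues at or above min_severity."""
    if not result.get("has_issues"):
        return False
    # Unknown or missing severities rank as 0, so they never block.
    rank = SEVERITY_RANK.get(str(result.get("max_severity", "")).lower(), 0)
    return rank >= SEVERITY_RANK[min_severity]
```

Ranking unknown severities as 0 keeps the predicate consistent with the rule that failures should never block the workflow.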

Architecture

Trigger                    →  Core Script        →  Codex CLI                    →  Feedback Loop
───────────────────────────────────────────────────────────────────────────────────────────────────
PreToolUse (Bash matcher)  →  codex-review.py    →  codex exec review           →  hookSpecificOutput deny → Claude fixes → retry
                              (grep for git commit)  --uncommitted --json          (JSON on stdout, exit 0)
                                                     --ephemeral

TaskCompleted              →  codex-review.py    →  codex exec review           →  exit 2 + stderr → Claude sees feedback
                              (check git changes)    --uncommitted --json
                                                     --ephemeral

/codex-review skill        →  MCP codex_review   →  codex exec review           →  result in context → Claude fixes
                              (codex-mcp-server)     --uncommitted OR --base
                                                     --json --ephemeral

Components & Files

1. Core Review Script

File: ~/.claude/hooks/scripts/codex-review.py

Shebang: #!/usr/bin/env -S uv run --quiet
PEP 723 inline metadata: requires-python = ">=3.10", zero dependencies
Wraps codex exec review --json --ephemeral
Adds --uncommitted OR --base <branch> based on --scope flag
CRITICAL: When scope == "uncommitted", does NOT append any prompt arguments (mutually exclusive in Codex CLI)
When scope == "branch", may append focus/custom prompt as positional argument
Parses JSONL stdout: iterates events, extracts text from item.completed where item.type == "agent_message"
Severity classification order: (a) check for "no issues" patterns first, (b) look for Codex [P1]-[P4] markers, (c) fall back to keyword heuristic
Outputs structured JSON: {"has_issues": bool, "max_severity": str, "summary": str, "details": str, "error": str|null}
Accepts: --scope {uncommitted,branch}, --focus, --format {json,text}, --timeout, --blocking, --base, --files, --prompt, --min-severity
All error paths return valid JSON with has_issues: false — never crashes, never blocks on failure
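The "never crashes, never blocks on failure" behavior could be sketched as a wrapper around the subprocess call. This sketch covers only the error envelope: on success it returns raw stdout in `details` and leaves parsing and classification out. The names `run_review` and `error_result`, and the `"none"` value for `max_severity`, are assumptions.

```python
import subprocess
from typing import List


def error_result(message: str) -> dict:
    """Result envelope for any failure: valid structure, has_issues False."""
    return {
        "has_issues": False,
        "max_severity": "none",  # assumed placeholder for "nothing found"
        "summary": "",
        "details": "",
        "error": message,
    }


def run_review(cmd: List[str], timeout: float = 120.0) -> dict:
    """Run the review command and never raise, whatever goes wrong."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        return error_result(f"command not found: {cmd[0]}")
    except subprocess.TimeoutExpired:
        return error_result(f"timed out after {timeout}s")
    except OSError as exc:
        return error_result(str(exc))
    if proc.returncode != 0:
        return error_result(f"exit code {proc.returncode}: {proc.stderr.strip()[:200]}")
    # Parsing and severity classification would happen here in a full script.
    result = error_result("")
    result.update({"details": proc.stdout, "error": None})
    return result
```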

2. Pre-Commit Hook (blocking)

File: ~/.claude/hooks/codex-pre-commit.sh

Event: PreToolUse with matcher: "Bash"
Reads full hook input from stdin (JSON with session_id, cwd, tool_name, tool_input, etc.)
Extracts tool_input.command via jq
Only triggers when command matches regex (^|\s|&&|\|)git\s+commit(\s|$)
Checks for changes: staged (git diff --cached), unstaged (git diff), AND untracked (git ls-files --others --exclude-standard)
Runs codex-review.py --scope uncommitted --format json --timeout 120
Only blocks on critical or high severity — medium/low pass through (same threshold as task completion hook)

If critical/high issues found: outputs the correct nested deny JSON on stdout and exits 0:

{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "deny",
    "permissionDecisionReason": "Codex review found issues..."
  }
}

If medium/low, clean, or Codex fails: exit 0 with no stdout (allow)
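To see what the trigger regex does and does not match, the same pattern can be tried in Python. The pattern is copied from the list above; the helper name `is_git_commit` is hypothetical.

```python
import re

# Same pattern as the hook: git preceded by start, whitespace, && or |.
GIT_COMMIT_RE = re.compile(r"(^|\s|&&|\|)git\s+commit(\s|$)")


def is_git_commit(command: str) -> bool:
    """True when the shell command appears to invoke `git commit`."""
    return GIT_COMMIT_RE.search(command) is not None
```

With this pattern, `git add . && git commit -m x` matches and `git commit-tree abc` does not. A command chained with `;` and no space, such as `echo x;git commit`, is also not matched, which may be worth handling if that form is expected.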

3. Task Completion Hook (blocking on critical/high)

File: ~/.claude/hooks/codex-task-review.sh

Event: TaskCompleted (no matcher support — always fires)
Reads stdin, extracts cwd
Checks: git repo? + has changes (staged/unstaged/untracked)?
Runs codex-review.py --scope uncommitted --format json --timeout 120
NOTE: --focus flag is NOT passed because it's ignored for uncommitted scope anyway
If critical/high severity: writes review to stderr, exit 2 — Claude receives feedback
If medium/low or clean: exit 0

4. MCP Server

File: ~/.claude/hooks/scripts/codex-mcp-server.py

Shebang: #!/usr/bin/env -S uv run --quiet
PEP 723: dependencies = ["mcp>=1.0"]
Uses FastMCP from mcp.server.fastmcp
Exposes codex_review tool with params: scope, focus, base_branch, files, custom_prompt, timeout
Delegates to codex-review.py subprocess
Returns JSON string result to Claude's context
Registered in: ~/.claude.json under mcpServers (NOT in settings.json)

5. `/codex-review` Skill

File: ~/.claude/commands/codex-review.md

Usage: /codex-review [scope] [--focus areas]
Instructs Claude to use MCP codex_review tool (or fallback to direct Bash)
Supports auto-fix loop: review → fix → re-review (max 3 iterations)

6. Settings & Configuration (TWO files)

~/.claude/settings.json — hooks only:

Add hooks.PreToolUse array with matcher "Bash" → command hook
Add hooks.TaskCompleted array → command hook
Each hook entry requires: type: "command", command: "/absolute/path/...", optionally timeout and statusMessage
Use absolute paths (not ~), because hooks may not expand tilde

~/.claude.json — MCP server only:

Add mcpServers.codex-review with type: "stdio", command: "/absolute/path/...", args: [], env: {}

Implementation Steps

Phase 1: Core Script

Create directory: mkdir -p ~/.claude/hooks/scripts/
Write codex-review.py with these critical behaviors:
- Parse JSONL: extract text from item.completed events where item.type == "agent_message"
- Skip prompt args when scope == "uncommitted" (Codex CLI constraint)
- Severity: check "no issues" patterns first, then [P1]-[P4] markers, then keyword heuristic
chmod +x ~/.claude/hooks/scripts/codex-review.py

Test in a git repo with a known-bad file:

cd /tmp && mkdir test-repo && cd test-repo && git init
echo "buggy code" > bug.py && git add . && git commit -m "init"
echo "more bugs" >> bug.py
~/.claude/hooks/scripts/codex-review.py --scope uncommitted --format json

Expected: JSON with has_issues: true, extracted review text in details

Phase 2: Hook Scripts

Write codex-pre-commit.sh with correct deny JSON format (nested hookSpecificOutput)
Write codex-task-review.sh — exit 2 with stderr on critical/high, NO --focus flag
chmod +x both scripts

Test pre-commit hook (must use git repo with actual changes):

cd /tmp/test-repo
echo '{"session_id":"test","cwd":"/tmp/test-repo","hook_event_name":"PreToolUse","tool_name":"Bash","tool_input":{"command":"git commit -m test"}}' \
  | /path/to/codex-pre-commit.sh

Expected: JSON with hookSpecificOutput.permissionDecision: "deny" on stdout when Codex reports critical or high findings; no stdout and exit 0 otherwise

Test non-commit commands pass through:

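echo '{"session_id":"test","transcript_path":"/tmp/transcript.jsonl","cwd":"/tmp","permission_mode":"default","hook_event_name":"PreToolUse","tool_name":"Bash","tool_input":{"command":"ls -la"}}' \
  | /path/to/codex-pre-commit.sh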

Expected: no stdout, exit 0

Test task completion hook:

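echo '{"session_id":"test","transcript_path":"/tmp/transcript.jsonl","cwd":"/tmp/test-repo","permission_mode":"default","hook_event_name":"TaskCompleted","task_id":"task-001","task_subject":"test","task_description":"test"}' \
  | /path/to/codex-task-review.sh >/dev/null 2>&1; echo "EXIT=$?"

Expected: EXIT=2 if critical or high issues are found, EXIT=0 if the findings are medium/low or the code is clean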

Phase 3: Configuration

Edit ~/.claude/settings.json — add hooks key with PreToolUse and TaskCompleted arrays
- Use absolute paths in command field, NOT tilde-expanded paths
- Set timeout: 180 (Codex inference can take 30-90s)
Edit ~/.claude.json — add mcpServers.codex-review entry
- Script: use python3 -c "import json; ..." to safely merge into existing JSON without destroying other fields
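One possible shape for such a merge script is shown below as a sketch, not the exact command from the step above. The function name `merge_mcp_server` is hypothetical, and the entry fields mirror the `mcpServers` example earlier in this plan. Invalid JSON in the existing file raises an error instead of being overwritten.

```python
import json
import os
import tempfile
from pathlib import Path


def merge_mcp_server(config_path: Path, name: str, command: str) -> dict:
    """Add or update one mcpServers entry, preserving every other field."""
    config = {}
    if config_path.exists():
        # A JSONDecodeError here is deliberate: do not clobber a broken file.
        config = json.loads(config_path.read_text(encoding="utf-8") or "{}")
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"type": "stdio", "command": command, "args": [], "env": {}}
    # Write to a temp file in the same directory, then swap it into place.
    fd, tmp_name = tempfile.mkstemp(dir=str(config_path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
        handle.write("\n")
    os.replace(tmp_name, config_path)
    return config
```

A call such as `merge_mcp_server(Path.home() / ".claude.json", "codex-review", "/absolute/path/to/codex-mcp-server.py")` would register the server while leaving unrelated settings and other MCP servers untouched.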

Phase 4: MCP Server

Write codex-mcp-server.py
chmod +x ~/.claude/hooks/scripts/codex-mcp-server.py
Test MCP server starts without import errors:
```
timeout 10 /path/to/codex-mcp-server.py < /dev/null; echo "EXIT=$?"
```
Expected: EXIT=0 (clean exit on EOF)

Phase 5: Skill

Write ~/.claude/commands/codex-review.md

Phase 6: Integration Testing

Create a fresh test repo, introduce a deliberate bug (invalid Python syntax)
Test core script produces correct JSON with has_issues: true
Test pre-commit hook outputs correct deny JSON format
Test task completion hook returns exit code 2 on high severity
Test with clean code — verify both hooks allow (exit 0, no deny output)
Restart Claude Code session to pick up new hooks and MCP server
In live session: make a buggy change, attempt commit → should be blocked
In live session: run /codex-review → verify review output appears in context

Codex CLI Reference (v0.99.0)

codex exec review [OPTIONS] [PROMPT]

  --uncommitted     Review staged, unstaged, and untracked changes
                    MUTUALLY EXCLUSIVE with [PROMPT] — cannot pass both
  --base <BRANCH>   Review changes against base branch (can combine with [PROMPT])
  --commit <SHA>    Review changes from a specific commit
  --json            Output JSONL events to stdout
  --ephemeral       Don't persist session files
  --full-auto       Run with automatic sandboxed execution
  -m, --model       Override model
  --title <TITLE>   Commit title for review summary

JSONL Event Types (--json output)

thread.started      → {"type":"thread.started","thread_id":"..."}
turn.started        → {"type":"turn.started"}
item.completed      → {"type":"item.completed","item":{"type":"reasoning|command_execution|agent_message",...}}
turn.completed      → {"type":"turn.completed","usage":{...}}

Review text is in events where item.type == "agent_message" → item.text

Codex priority markers: [P1] (critical), [P2] (high), [P3] (medium), [P4] (low/style)

Claude Code Hooks Reference

PreToolUse Hook

stdin: {"session_id","transcript_path","cwd","permission_mode","hook_event_name":"PreToolUse","tool_name","tool_input"}

stdout to deny:

{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "deny",
    "permissionDecisionReason": "reason string"
  }
}

Exit code: always 0 (decision is in JSON, not exit code)

TaskCompleted Hook

stdin: {"session_id","transcript_path","cwd","permission_mode","hook_event_name":"TaskCompleted","task_id","task_subject","task_description"}

Block: exit 2 + stderr message (fed back to Claude)

Allow: exit 0

Settings Schema (hooks portion of `~/.claude/settings.json`)

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {"type": "command", "command": "/absolute/path/to/script.sh", "timeout": 180, "statusMessage": "..."}
        ]
      }
    ],
    "TaskCompleted": [
      {
        "hooks": [
          {"type": "command", "command": "/absolute/path/to/script.sh", "timeout": 180, "statusMessage": "..."}
        ]
      }
    ]
  }
}

Key Design Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| Core script deps | Zero (stdlib only) | Fast hook startup, no `uv` resolution delay |
| Pre-commit blocking | Nested JSON `hookSpecificOutput.permissionDecision: deny`, only critical/high | Don't block commits on minor style/metadata issues |
| Task completion blocking | Only critical/high severity via exit 2 | Same threshold as pre-commit, for consistency |
| Severity threshold | Both hooks: critical + high only | Medium/low findings pass through; blocking on low breaks workflow for benign changes like README edits |
| Graceful degradation | Always `exit 0` on errors (hooks), always return valid JSON (script) | Never break the development workflow |
| Codex flags | `--json --ephemeral` always; `--uncommitted` for working tree | Structured output, no session clutter |
| No prompt with --uncommitted | Skip `--focus`/`--prompt` args for uncommitted scope | Codex CLI rejects `--uncommitted` + `[PROMPT]` |
| Config split | Hooks in `settings.json`, MCP in `~/.claude.json` | Schema validation requires this separation |
| Severity classification | "No issues" patterns first, then Codex `[P1]`-`[P4]` markers, then keyword heuristic | Codex's own priority markers are more reliable than keyword matching; checking "no issues" first avoids false positives such as "no bugs found" |
| Timeout | 180s for hooks, 120s for script | Codex inference takes 30-90s; hook timeout must exceed script timeout |
| Absolute paths in hooks | Use `/Users/<user>/...` not `~/...` | Hook commands may not expand tilde |

File Inventory

| File | Type | Config Location |
| --- | --- | --- |
| `~/.claude/hooks/scripts/codex-review.py` | Core script | Referenced by hooks and MCP server |
| `~/.claude/hooks/codex-pre-commit.sh` | Shell hook | `~/.claude/settings.json` → `hooks.PreToolUse` |
| `~/.claude/hooks/codex-task-review.sh` | Shell hook | `~/.claude/settings.json` → `hooks.TaskCompleted` |
| `~/.claude/hooks/scripts/codex-mcp-server.py` | MCP server | `~/.claude.json` → `mcpServers.codex-review` |
| `~/.claude/commands/codex-review.md` | Skill | Auto-discovered by Claude Code |
| `~/.claude/settings.json` | Config | Hooks registration |
| `~/.claude.json` | Config | MCP server registration |

vshuraeff/codex-auto-review-for-claude-code-integration-plan.md
