Problem: Claude Code edits code without independent validation. A second AI reviewer (OpenAI Codex) can catch bugs, security issues, and style problems before they're committed — acting as an automated "second pair of eyes."
Goal: Integrate Codex CLI (v0.99.0, already installed) into every Claude Code session as a global review agent with three trigger points: task completion, pre-commit blocking, and an on-demand `/codex-review` command.
Outcome: Claude's changes get automatically reviewed by Codex, and Claude fixes issues before they reach git history.
These are the specific mistakes and inaccuracies discovered during the first implementation attempt. Every section below incorporates these fixes.
Wrong: `{"decision":"deny","reason":"..."}`

Correct: the output must be wrapped in `hookSpecificOutput`:

```json
{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "deny",
    "permissionDecisionReason": "your reason here"
  }
}
```

The exit code must be 0 when outputting this JSON; the decision is in the payload, not the exit code.
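A minimal Python sketch of producing that payload. The hooks in this plan are shell scripts using `jq`, so this only illustrates the shape; the helper name `deny_payload` is hypothetical, while the field names are the ones shown above.

```python
import json


def deny_payload(reason: str) -> str:
    """Serialize the nested PreToolUse deny decision (field names as documented above)."""
    return json.dumps({
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason": reason,
        }
    })


if __name__ == "__main__":
    # Print the decision on stdout and let the process end normally (exit status 0).
    print(deny_payload("Codex review found high-severity issues"))
```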
`~/.claude/settings.json` does NOT have an `mcpServers` field (schema validation rejects it).

MCP servers are registered in `~/.claude.json` under the top-level `mcpServers` key:

```json
{
  "mcpServers": {
    "codex-review": {
      "type": "stdio",
      "command": "/absolute/path/to/codex-mcp-server.py",
      "args": [],
      "env": {}
    }
  }
}
```

Codex CLI rejects any positional prompt argument (including `-` for stdin) when `--uncommitted` is used:
```text
error: the argument '--uncommitted' cannot be used with '[PROMPT]'
```

Consequence: the wrapper script's `--focus`, `--files`, and `--prompt` flags have NO effect for uncommitted reviews. Custom prompts only work with `--base <BRANCH>` scope. The core script must silently skip prompt construction when `scope == "uncommitted"`.
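A sketch of how the core script might assemble the Codex command so the conflict can never occur. The function name and the `"main"` default for the base branch are assumptions; the flags are the ones listed in this plan's Codex CLI reference.

```python
from __future__ import annotations


def build_codex_command(scope: str, base: str | None = None,
                        prompt: str | None = None) -> list[str]:
    """Assemble `codex exec review` argv without ever pairing --uncommitted with a prompt."""
    cmd = ["codex", "exec", "review", "--json", "--ephemeral"]
    if scope == "uncommitted":
        # Codex rejects any positional [PROMPT] here, so focus/prompt text is dropped.
        return cmd + ["--uncommitted"]
    if scope == "branch":
        cmd += ["--base", base or "main"]  # "main" default is an assumption
        if prompt:
            cmd.append(prompt)  # positional [PROMPT] is allowed with --base
        return cmd
    raise ValueError(f"unknown scope: {scope!r}")
```

For example, `build_codex_command("uncommitted", prompt="focus on security")` yields only the five base arguments plus `--uncommitted`; the prompt is silently discarded.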
Wrong assumption: events are `{"type":"message","role":"assistant","content":[{"type":"text","text":"..."}]}`

Actual format: Codex outputs these event types:

- `{"type":"thread.started","thread_id":"..."}`: session start
- `{"type":"turn.started"}`: turn boundary
- `{"type":"item.completed","item":{"id":"...","type":"reasoning","text":"..."}}`: reasoning steps
- `{"type":"item.completed","item":{"id":"...","type":"command_execution","command":"...","aggregated_output":"...","exit_code":0,"status":"completed"}}`: tool calls
- `{"type":"item.completed","item":{"id":"...","type":"agent_message","text":"..."}}`: the actual review text
- `{"type":"turn.completed","usage":{...}}`: turn end

The review text is in `event.item.text` where `event.type == "item.completed"` and `event.item.type == "agent_message"`.
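A stdlib-only sketch of that extraction, assuming one JSON object per line on stdout. Joining multiple `agent_message` items with blank lines and skipping non-JSON lines are illustrative choices, not documented Codex behavior.

```python
import json


def extract_review_text(jsonl: str) -> str:
    """Join the text of every item.completed/agent_message event in Codex --json output."""
    parts = []
    for line in jsonl.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON lines on stdout
        if not isinstance(event, dict) or event.get("type") != "item.completed":
            continue
        item = event.get("item")
        if isinstance(item, dict) and item.get("type") == "agent_message" and item.get("text"):
            parts.append(item["text"])
    return "\n\n".join(parts)
```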
`git diff --cached --quiet && git diff --quiet` only checks tracked files. Untracked files (the common case when Claude creates new files) are missed entirely, so the check must also run `git ls-files --others --exclude-standard`.
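A sketch of the three-part change check in Python (the hooks do the same with plain `git` calls; `has_changes` is a hypothetical name). It relies on `git diff --quiet` exiting 1 when differences exist, and treats any git failure as "no changes" so the hook fails open.

```python
import subprocess


def _git(cwd: str, *args: str) -> subprocess.CompletedProcess:
    """Run a git subcommand in `cwd`, capturing output instead of printing it."""
    return subprocess.run(["git", *args], cwd=cwd, capture_output=True, text=True)


def has_changes(cwd: str) -> bool:
    """True if the repo has staged, unstaged, or untracked (non-ignored) changes."""
    try:
        # `git diff --quiet` exits 1 when differences exist; other codes signal errors.
        if _git(cwd, "diff", "--cached", "--quiet").returncode == 1:
            return True
        if _git(cwd, "diff", "--quiet").returncode == 1:
            return True
        # Untracked files never appear in `git diff`, so list them explicitly.
        untracked = _git(cwd, "ls-files", "--others", "--exclude-standard")
        return untracked.returncode == 0 and bool(untracked.stdout.strip())
    except OSError:
        return False  # git missing or cwd invalid: fail open, skip the review
```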
Keywords like "error" and "bug" appear in benign Codex output (e.g., "SyntaxError" in command output, or "no bugs found"). The heuristic classifier should:

- Run AFTER "no issues" pattern detection, not before
- Use Codex's own priority markers (`[P1]`, `[P2]`, `[P3]`, `[P4]`) when present, since they are more reliable than keyword matching
- Ignore keywords that appear inside quoted command output or code blocks
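A sketch of a classifier following that order. The "no issues" phrases and the keyword list are hypothetical placeholders to be tuned against real Codex output, and capping the keyword fallback at `medium` (so unmarked prose can never block on its own) is an assumption of this sketch, not part of the plan. The marker-to-severity mapping follows the `[P1]`-`[P4]` reference in this plan.

```python
from __future__ import annotations

import re

# Hypothetical phrases; tune these against real Codex reviews.
NO_ISSUE_PATTERNS = [
    r"\bno (?:issues|bugs|problems) (?:found|detected)\b",
    r"\blooks good\b",
]
# [P1] critical, [P2] high, [P3] medium, [P4] low/style.
MARKER_SEVERITY = {"P1": "critical", "P2": "high", "P3": "medium", "P4": "low"}
RANK = ["none", "low", "medium", "high", "critical"]
# Hypothetical keyword list for the last-resort heuristic.
KEYWORD_RE = re.compile(r"\b(?:bug|error|crash|vulnerability|injection)\b")


def strip_code(text: str) -> str:
    """Remove fenced blocks and inline code so quoted output cannot trigger keywords."""
    text = re.sub(r"`{3}.*?`{3}", " ", text, flags=re.DOTALL)
    return re.sub(r"`[^`\n]*`", " ", text)


def classify_severity(review: str) -> str:
    prose = strip_code(review).lower()
    # (a) An explicit "no issues" statement wins before any keyword can fire.
    if any(re.search(p, prose) for p in NO_ISSUE_PATTERNS):
        return "none"
    # (b) Codex's own markers: report the most severe one present.
    markers = re.findall(r"\[(P[1-4])\]", review)
    if markers:
        return max((MARKER_SEVERITY[m] for m in markers), key=RANK.index)
    # (c) Keyword fallback on prose only, capped at "medium" (never blocks alone).
    return "medium" if KEYWORD_RE.search(prose) else "none"
```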
Assumed: `{"tool_name":"Bash","tool_input":{"command":"..."},"cwd":"..."}`

Actual PreToolUse input:

```json
{
  "session_id": "abc123",
  "transcript_path": "/path/to/transcript.jsonl",
  "cwd": "/current/working/dir",
  "permission_mode": "default",
  "hook_event_name": "PreToolUse",
  "tool_name": "Bash",
  "tool_input": {"command": "git commit -m \"test\""}
}
```

Actual TaskCompleted input:
```json
{
  "session_id": "abc123",
  "transcript_path": "/path/to/transcript.jsonl",
  "cwd": "/current/working/dir",
  "permission_mode": "default",
  "hook_event_name": "TaskCompleted",
  "task_id": "task-001",
  "task_subject": "...",
  "task_description": "..."
}
```

The hook scripts work because they use `jq` with the `// empty` fallback, but tests should use the full payload structure.
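For test harnesses written in Python, a helper that reads the same fields the hooks extract, defaulting missing values to an empty string the way `jq -r '.tool_input.command // empty'` does. The function name and returned keys are hypothetical.

```python
from __future__ import annotations

import json


def read_hook_fields(raw: str) -> dict[str, str]:
    """Extract the fields the hooks use, defaulting to "" like jq's `// empty`."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    tool_input = data.get("tool_input")
    if not isinstance(tool_input, dict):
        tool_input = {}  # TaskCompleted payloads have no tool_input at all
    return {
        "event": str(data.get("hook_event_name") or ""),
        "cwd": str(data.get("cwd") or ""),
        "tool_name": str(data.get("tool_name") or ""),
        "command": str(tool_input.get("command") or ""),
    }
```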
The original implementation blocked on ANY `has_issues == true`, regardless of severity. This caused low-severity findings (e.g., README wording, metadata-only changes) to block commits, which is clearly wrong for a development workflow. Both hooks (pre-commit AND task completion) must apply the same severity threshold: only block on critical or high. Medium and low findings should pass through.
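One way to keep the two hooks on the same threshold is a single ranked comparison. The rank table (including a `none` level) and the function name are assumptions; a helper like this could also back the core script's `--min-severity` flag.

```python
SEVERITY_RANK = {"none": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def meets_threshold(max_severity: str, min_severity: str = "high") -> bool:
    """True when the worst finding is at least as severe as the blocking threshold.

    Both hooks use the default "high", so critical/high block and medium/low
    pass through. Unknown labels rank as "none" and never block.
    """
    return SEVERITY_RANK.get(max_severity, 0) >= SEVERITY_RANK[min_severity]
```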
```text
Trigger                      → Core Script      → Codex CLI               → Feedback Loop
───────────────────────────────────────────────────────────────────────────────────────────────────
PreToolUse (Bash matcher)    → codex-review.py  → codex exec review       → hookSpecificOutput deny → Claude fixes → retry
  (grep for git commit)                           --uncommitted --json      (JSON on stdout, exit 0)
                                                  --ephemeral
TaskCompleted                → codex-review.py  → codex exec review       → exit 2 + stderr → Claude sees feedback
  (check git changes)                             --uncommitted --json
                                                  --ephemeral
/codex-review skill          → MCP codex_review → codex exec review       → result in context → Claude fixes
  (codex-mcp-server)                              --uncommitted OR --base
                                                  --json --ephemeral
```
File: `~/.claude/hooks/scripts/codex-review.py`

- Shebang: `#!/usr/bin/env -S uv run --quiet`
- PEP 723 inline metadata: `requires-python = ">=3.10"`, zero dependencies
- Wraps `codex exec review --json --ephemeral`
- Adds `--uncommitted` OR `--base <branch>` based on the `--scope` flag
- CRITICAL: when `scope == "uncommitted"`, does NOT append any prompt arguments (mutually exclusive in Codex CLI)
- When `scope == "branch"`, may append the focus/custom prompt as a positional argument
- Parses JSONL stdout: iterates events, extracts text from `item.completed` where `item.type == "agent_message"`
- Severity classification order: (a) check for "no issues" patterns first, (b) look for Codex `[P1]`-`[P4]` markers, (c) fall back to the keyword heuristic
- Outputs structured JSON: `{"has_issues": bool, "max_severity": str, "summary": str, "details": str, "error": str|null}`
- Accepts: `--scope {uncommitted,branch}`, `--focus`, `--format {json,text}`, `--timeout`, `--blocking`, `--base`, `--files`, `--prompt`, `--min-severity`
- All error paths return valid JSON with `has_issues: false`; the script never crashes and never blocks on failure
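A sketch of the fail-open execution path, assuming the result fields listed above. `error_result` and `run_codex` are hypothetical names, and reporting `max_severity` as `"none"` on errors is an assumption.

```python
from __future__ import annotations

import subprocess


def error_result(message: str) -> dict:
    """The fail-open shape every error path returns: valid JSON, never blocking."""
    return {"has_issues": False, "max_severity": "none", "summary": "",
            "details": "", "error": message}


def run_codex(cmd: list[str], timeout: int = 120) -> tuple[str | None, dict | None]:
    """Return (stdout, None) on success, or (None, error_result) on any failure."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        return None, error_result(f"command not found: {cmd[0]}")
    except subprocess.TimeoutExpired:
        return None, error_result(f"codex timed out after {timeout}s")
    except OSError as exc:
        return None, error_result(f"could not start codex: {exc}")
    if proc.returncode != 0:
        # Keep stderr short; it only needs to explain why the review was skipped.
        return None, error_result(proc.stderr.strip()[:500] or f"exit {proc.returncode}")
    return proc.stdout, None
```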
File: `~/.claude/hooks/codex-pre-commit.sh`

- Event: `PreToolUse` with `matcher: "Bash"`
- Reads the full hook input from stdin (JSON with `session_id`, `cwd`, `tool_name`, `tool_input`, etc.)
- Extracts `tool_input.command` via `jq`
- Only triggers when the command matches the regex `(^|\s|&&|\|)git\s+commit(\s|$)`
- Checks for changes: staged (`git diff --cached`), unstaged (`git diff`), AND untracked (`git ls-files --others --exclude-standard`)
- Runs `codex-review.py --scope uncommitted --format json --timeout 120`
- Only blocks on critical or high severity; medium/low pass through (same threshold as the task completion hook)
- If critical/high issues are found: outputs the nested deny JSON on stdout and exits 0:
  `{"hookSpecificOutput": {"hookEventName": "PreToolUse", "permissionDecision": "deny", "permissionDecisionReason": "Codex review found issues..."}}`
- If medium/low, clean, or Codex fails: `exit 0` with no stdout (allow)
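The trigger regex can be checked quickly in Python, since `re` accepts the same pattern as the hook's extended regex. `is_git_commit` is a hypothetical name.

```python
import re

# Same pattern as the hook: "git commit" at the start, after whitespace, "&&", or "|".
COMMIT_RE = re.compile(r"(^|\s|&&|\|)git\s+commit(\s|$)")


def is_git_commit(command: str) -> bool:
    """True if the Bash command looks like a `git commit` invocation."""
    return COMMIT_RE.search(command) is not None
```

Note what the pattern does not catch: `git -C repo commit` has an option between `git` and `commit`, so it slips through, while `echo git commit` is a false positive. Both may be acceptable for a best-effort gate.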
File: `~/.claude/hooks/codex-task-review.sh`

- Event: `TaskCompleted` (no matcher support; always fires)
- Reads stdin, extracts `cwd`
- Checks: is this a git repo, and does it have changes (staged/unstaged/untracked)?
- Runs `codex-review.py --scope uncommitted --format json --timeout 120`
- NOTE: the `--focus` flag is NOT passed because it is ignored for uncommitted scope anyway
- If critical/high severity: writes the review to stderr and exits 2, so Claude receives the feedback
- If medium/low or clean: `exit 0`
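The task hook's decision, expressed as a small Python sketch (the real hook is a shell script). `task_hook_outcome` and `main` are hypothetical names, and the stderr message wording is illustrative.

```python
from __future__ import annotations

import json
import sys

BLOCKING = {"critical", "high"}


def task_hook_outcome(review: dict) -> tuple[int, str]:
    """Map a codex-review.py result to (exit code, stderr text) for TaskCompleted."""
    if review.get("error") or not review.get("has_issues"):
        return 0, ""  # failures and clean reviews never block
    severity = review.get("max_severity", "")
    if severity not in BLOCKING:
        return 0, ""  # medium/low findings pass through
    message = (f"Codex review found {severity}-severity issues:\n"
               f"{review.get('summary', '')}\n\n{review.get('details', '')}")
    return 2, message.strip()


def main(review_json: str) -> int:
    """Print any blocking feedback to stderr and return the hook's exit code."""
    try:
        review = json.loads(review_json)
    except json.JSONDecodeError:
        return 0  # unparseable output: fail open
    if not isinstance(review, dict):
        return 0
    code, message = task_hook_outcome(review)
    if message:
        print(message, file=sys.stderr)  # exit 2 + stderr is what Claude sees
    return code
```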
File: `~/.claude/hooks/scripts/codex-mcp-server.py`

- Shebang: `#!/usr/bin/env -S uv run --quiet`
- PEP 723: `dependencies = ["mcp>=1.0"]`
- Uses `FastMCP` from `mcp.server.fastmcp`
- Exposes a `codex_review` tool with params: `scope`, `focus`, `base_branch`, `files`, `custom_prompt`, `timeout`
- Delegates to `codex-review.py` as a subprocess
- Returns the JSON string result to Claude's context
- Registered in `~/.claude.json` under `mcpServers` (NOT in `settings.json`)
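A minimal sketch of the server, assuming an `mcp` release that ships `mcp.server.fastmcp` and that the core script accepts the flags listed for it above. Typing `files` as a single string and adding 30 seconds of subprocess slack over the review timeout are assumptions. The `FastMCP` import sits inside `main()` so the helpers can be exercised without the package installed; the script's entry point would call `main()`.

```python
from __future__ import annotations

import json
import subprocess
from pathlib import Path

# Assumed location, matching this plan's file layout.
CORE_SCRIPT = Path.home() / ".claude" / "hooks" / "scripts" / "codex-review.py"


def build_review_args(scope: str = "uncommitted", focus: str | None = None,
                      base_branch: str | None = None, files: str | None = None,
                      custom_prompt: str | None = None, timeout: int = 120) -> list[str]:
    """Translate tool parameters into codex-review.py flags (flag names from this plan)."""
    args = [str(CORE_SCRIPT), "--scope", scope, "--format", "json", "--timeout", str(timeout)]
    for flag, value in (("--focus", focus), ("--base", base_branch),
                        ("--files", files), ("--prompt", custom_prompt)):
        if value:
            args += [flag, value]
    return args


def codex_review(scope: str = "uncommitted", focus: str | None = None,
                 base_branch: str | None = None, files: str | None = None,
                 custom_prompt: str | None = None, timeout: int = 120) -> str:
    """Run a Codex review of uncommitted changes or a branch and return the JSON result."""
    args = build_review_args(scope, focus, base_branch, files, custom_prompt, timeout)
    try:
        proc = subprocess.run(args, capture_output=True, text=True, timeout=timeout + 30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return json.dumps({"has_issues": False, "error": str(exc)})
    return proc.stdout or json.dumps({"has_issues": False, "error": proc.stderr.strip()})


def main() -> None:
    """Register the tool and serve it over stdio."""
    from mcp.server.fastmcp import FastMCP  # needs the "mcp" dependency

    server = FastMCP("codex-review")
    server.tool()(codex_review)  # equivalent to decorating with @server.tool()
    server.run()  # stdio transport by default
```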
File: `~/.claude/commands/codex-review.md`

- Usage: `/codex-review [scope] [--focus areas]`
- Instructs Claude to use the MCP `codex_review` tool (or fall back to direct Bash)
- Supports an auto-fix loop: review → fix → re-review (max 3 iterations)
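The review → fix → re-review loop the command file describes can be pinned down as follows. `review` and `fix` are hypothetical callables standing in for the `codex_review` tool call and Claude's edits, and counting one iteration as one fix plus one re-review is an interpretation of "max 3 iterations".

```python
from __future__ import annotations

from typing import Callable


def review_fix_loop(review: Callable[[], dict], fix: Callable[[dict], None],
                    max_iterations: int = 3) -> dict:
    """Review, then fix and re-review until clean or the iteration cap is hit."""
    result = review()
    for _ in range(max_iterations):
        if not result.get("has_issues"):
            break
        fix(result)        # in practice: Claude edits files based on result["details"]
        result = review()  # re-review the updated working tree
    return result
```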
`~/.claude/settings.json` (hooks only):

- Add a `hooks.PreToolUse` array with matcher `"Bash"` → command hook
- Add a `hooks.TaskCompleted` array → command hook
- Each hook entry requires `type: "command"` and `command: "/absolute/path/..."`, with optional `timeout` and `statusMessage`
- Use absolute paths (not `~`), because hooks may not expand tilde
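A sketch of registering both hooks from Python while preserving the rest of `settings.json`. The function name and `statusMessage` strings are hypothetical; `Path.expanduser()` produces the absolute path the hooks need, and the duplicate check keeps reruns from adding the same hook twice.

```python
from __future__ import annotations

import json
from pathlib import Path


def add_hooks(settings_path: Path, pre_commit: str, task_review: str,
              timeout: int = 180) -> None:
    """Register the Codex hooks in settings.json, preserving every other key."""
    settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    hooks = settings.setdefault("hooks", {})

    def entry(script: str, message: str) -> dict:
        # expanduser() turns "~/..." into an absolute path; hooks may not expand "~".
        return {"type": "command", "command": str(Path(script).expanduser()),
                "timeout": timeout, "statusMessage": message}

    def register(event: str, group: dict) -> None:
        groups = hooks.setdefault(event, [])
        command = group["hooks"][0]["command"]
        # Skip if the same command is already registered for this event.
        if not any(h.get("command") == command
                   for g in groups for h in g.get("hooks", [])):
            groups.append(group)

    register("PreToolUse", {"matcher": "Bash",
                            "hooks": [entry(pre_commit, "Codex pre-commit review...")]})
    register("TaskCompleted", {"hooks": [entry(task_review, "Codex task review...")]})
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
```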
`~/.claude.json` (MCP server only):

- Add `mcpServers.codex-review` with `type: "stdio"`, `command: "/absolute/path/..."`, `args: []`, `env: {}`
- Create the directory: `mkdir -p ~/.claude/hooks/scripts/`
- Write `codex-review.py` with these critical behaviors:
  - Parse JSONL: extract text from `item.completed` events where `item.type == "agent_message"`
  - Skip prompt args when `scope == "uncommitted"` (Codex CLI constraint)
  - Severity: check "no issues" patterns first, then `[P1]`-`[P4]` markers, then the keyword heuristic
- `chmod +x ~/.claude/hooks/scripts/codex-review.py`
- Test in a git repo with a known-bad file:
  ```bash
  cd /tmp && mkdir test-repo && cd test-repo && git init
  echo "buggy code" > bug.py && git add . && git commit -m "init"
  echo "more bugs" >> bug.py
  ~/.claude/hooks/scripts/codex-review.py --scope uncommitted --format json
  ```
  Expected: JSON with `has_issues: true` and the extracted review text in `details`
- Write `codex-pre-commit.sh` with the correct deny JSON format (nested `hookSpecificOutput`)
- Write `codex-task-review.sh`: exit 2 with stderr on critical/high, NO `--focus` flag
- `chmod +x` both scripts
- Test the pre-commit hook (must use a git repo with actual changes):
  ```bash
  cd /tmp/test-repo
  echo '{"session_id":"test","cwd":"/tmp/test-repo","hook_event_name":"PreToolUse","tool_name":"Bash","tool_input":{"command":"git commit -m test"}}' \
    | /path/to/codex-pre-commit.sh
  ```
  Expected: JSON with `hookSpecificOutput.permissionDecision: "deny"` on stdout
- Test that non-commit commands pass through:
  ```bash
  echo '{"session_id":"test","cwd":"/tmp","hook_event_name":"PreToolUse","tool_name":"Bash","tool_input":{"command":"ls -la"}}' \
    | /path/to/codex-pre-commit.sh
  ```
  Expected: no stdout, exit 0
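- Test the task completion hook:
  ```bash
  echo '{"session_id":"test","cwd":"/tmp/test-repo","hook_event_name":"TaskCompleted","task_id":"task-001"}' \
    | /path/to/codex-task-review.sh >/dev/null 2>&1; echo "EXIT=$?"
  ```
  Expected: `EXIT=2` if critical/high issues are found, `EXIT=0` if clean (or only medium/low findings)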
- Edit `~/.claude/settings.json`: add a `hooks` key with `PreToolUse` and `TaskCompleted` arrays
  - Use absolute paths in the `command` field, NOT `~`-prefixed paths
  - Set `timeout: 180` (Codex inference can take 30-90s)
- Edit `~/.claude.json`: add the `mcpServers.codex-review` entry
  - Script: use `python3 -c "import json; ..."` to safely merge into the existing JSON without destroying other fields
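A file-based version of that merge, as a sketch. The function name is hypothetical; writing to a temporary file and swapping it in with `os.replace` is an added precaution (not in the plan) so an interrupted write cannot leave `~/.claude.json` truncated.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path


def register_mcp_server(config_path: Path, server_script: str) -> None:
    """Add mcpServers.codex-review to ~/.claude.json without touching other fields."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})["codex-review"] = {
        "type": "stdio",
        "command": str(Path(server_script).expanduser()),  # absolute path
        "args": [],
        "env": {},
    }
    # Write next to the target, then atomically replace it.
    fd, tmp = tempfile.mkstemp(dir=config_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(config, f, indent=2)
        f.write("\n")
    os.replace(tmp, config_path)
```

Usage: `register_mcp_server(Path.home() / ".claude.json", "~/.claude/hooks/scripts/codex-mcp-server.py")`.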
- Write `codex-mcp-server.py`
- `chmod +x` it
- Test that the MCP server starts without import errors:
  ```bash
  timeout 10 /path/to/codex-mcp-server.py < /dev/null; echo "EXIT=$?"
  ```
  Expected: `EXIT=0` (clean exit on EOF)
- Write `~/.claude/commands/codex-review.md`
- Create a fresh test repo and introduce a deliberate bug (invalid Python syntax)
- Test that the core script produces correct JSON with `has_issues: true`
- Test that the pre-commit hook outputs the correct deny JSON format
- Test that the task completion hook returns exit code 2 on high severity
- Test with clean code: verify both hooks allow (exit 0, no deny output)
- Restart the Claude Code session to pick up the new hooks and MCP server
- In a live session: make a buggy change and attempt a commit; it should be blocked
- In a live session: run `/codex-review` and verify the review output appears in context
```text
codex exec review [OPTIONS] [PROMPT]
  --uncommitted       Review staged, unstaged, and untracked changes
                      MUTUALLY EXCLUSIVE with [PROMPT]; cannot pass both
  --base <BRANCH>     Review changes against base branch (can combine with [PROMPT])
  --commit <SHA>      Review changes from a specific commit
  --json              Output JSONL events to stdout
  --ephemeral         Don't persist session files
  --full-auto         Run with automatic sandboxed execution
  -m, --model         Override model
  --title <TITLE>     Commit title for review summary
```
```text
thread.started  → {"type":"thread.started","thread_id":"..."}
turn.started    → {"type":"turn.started"}
item.completed  → {"type":"item.completed","item":{"type":"reasoning|command_execution|agent_message",...}}
turn.completed  → {"type":"turn.completed","usage":{...}}
```

Review text is in events where `item.type == "agent_message"` → `item.text`.

Codex priority markers: `[P1]` (critical), `[P2]` (high), `[P3]` (medium), `[P4]` (low/style).
stdin: `{"session_id","transcript_path","cwd","permission_mode","hook_event_name":"PreToolUse","tool_name","tool_input"}`

stdout to deny:

```json
{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "deny",
    "permissionDecisionReason": "reason string"
  }
}
```

Exit code: always 0 (the decision is in the JSON, not the exit code)
stdin: `{"session_id","transcript_path","cwd","permission_mode","hook_event_name":"TaskCompleted","task_id","task_subject","task_description"}`

- Block: exit 2 + stderr message (fed back to Claude)
- Allow: exit 0
```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {"type": "command", "command": "/absolute/path/to/script.sh", "timeout": 180, "statusMessage": "..."}
        ]
      }
    ],
    "TaskCompleted": [
      {
        "hooks": [
          {"type": "command", "command": "/absolute/path/to/script.sh", "timeout": 180, "statusMessage": "..."}
        ]
      }
    ]
  }
}
```

| Decision | Choice | Rationale |
|---|---|---|
| Core script deps | Zero (stdlib only) | Fast hook startup, no uv resolution delay |
| Pre-commit blocking | Nested JSON `hookSpecificOutput.permissionDecision: deny`, only critical/high | Don't block commits on minor style/metadata issues |
| Task completion blocking | Only critical/high severity via exit 2 | Same threshold as pre-commit, for consistency |
| Severity threshold | Both hooks: critical + high only | Medium/low findings pass through; blocking on low breaks the workflow for benign changes like README edits |
| Graceful degradation | Always exit 0 on errors (hooks), always return valid JSON (script) | Never break the development workflow |
| Codex flags | `--json --ephemeral` always; `--uncommitted` for the working tree | Structured output, no session clutter |
| No prompt with `--uncommitted` | Skip `--focus`/`--prompt` args for uncommitted scope | Codex CLI rejects `--uncommitted` + `[PROMPT]` |
| Config split | Hooks in `settings.json`, MCP in `~/.claude.json` | Schema validation requires this separation |
| Severity classification | Check order: "no issues" patterns, then Codex `[P1]`-`[P4]` markers, then keyword heuristic | Codex's own priority markers are more reliable than keyword matching |
| Timeout | 180s for hooks, 120s for script | Codex inference takes 30-90s; hook timeout must exceed script timeout |
| Absolute paths in hooks | Use `/Users/<user>/...`, not `~/...` | Hook commands may not expand tilde |
| File | Type | Config Location |
|---|---|---|
| `~/.claude/hooks/scripts/codex-review.py` | Core script | Referenced by hooks and MCP server |
| `~/.claude/hooks/codex-pre-commit.sh` | Shell hook | `~/.claude/settings.json` → `hooks.PreToolUse` |
| `~/.claude/hooks/codex-task-review.sh` | Shell hook | `~/.claude/settings.json` → `hooks.TaskCompleted` |
| `~/.claude/hooks/scripts/codex-mcp-server.py` | MCP server | `~/.claude.json` → `mcpServers.codex-review` |
| `~/.claude/commands/codex-review.md` | Skill | Auto-discovered by Claude Code |
| `~/.claude/settings.json` | Config | Hooks registration |
| `~/.claude.json` | Config | MCP server registration |