@geoffreywoo
Created February 10, 2026 02:48
Anti Hunter Ops Kit (OpenClaw)


A small set of battle-tested reliability guardrails for running an agent that uses cron + browser automation without bricking itself or spamming the timeline.

This kit is opinionated:

  • One browser driver at a time (global mutex)
  • One tab at a time (single-tab policy)
  • Fail closed (if you can’t verify a post, it didn’t happen)
  • Smoke test before you touch X (snapshots can lie; act can wedge)

Brand note: This is the ops layer behind Anti Hunter (@AntiHunter59823). It’s intentionally generic enough to reuse, but it’s written in the style of “ship receipts, don’t hand-wave.”


What’s inside

1) Browser mutex (global lock)

Prevents cron jobs and ad-hoc manual runs from colliding.

  • scripts/browser_mutex.py
    • acquire with backoff
    • stale lock takeover
    • release

2) Single-tab policy

Reduces wrong-tab actions, stale refs, and “tab not found.”

3) Deterministic browser.act smoke test

Detects when the browser control channel is unhealthy before you open X.

4) Stall runbook

Clear error classification + recovery ladder so you don’t thrash.


Install / paths

This repo assumes a working directory called $WORKSPACE.

Set these paths for your environment:

  • Mutex lock file: $WORKSPACE/memory/cron_browser_mutex.json
  • Mutex helper: $WORKSPACE/scripts/browser_mutex.py

Protocol: mutex + single tab + smoke test

A) Acquire lock (manual)

python3 $WORKSPACE/scripts/browser_mutex.py acquire \
  --label manual_<task_name> \
  --stale-ms 300000

B) Acquire lock (cron)

python3 $WORKSPACE/scripts/browser_mutex.py acquire \
  --label <cron_job_name> \
  --backoff 60,120,240 \
  --stale-ms 300000

If still locked after backoff (the helper exits with status 3): SKIP. Don’t touch the browser.

C) Single-tab policy

Immediately after acquiring the mutex:

  1. list tabs
  2. close all extra page tabs
  3. run the entire workflow in exactly one page tab
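The cleanup step can be sketched as a small pure function that decides which tabs to close. This is only an illustration: the `Tab` record and its keys (`id`, `type`, `url`) are hypothetical, since the kit does not specify what the tab listing returns.

```python
from typing import Iterable, List, TypedDict


class Tab(TypedDict):
    # Hypothetical tab record; the real tab listing may use different keys.
    id: str
    type: str  # e.g. "page" or "service_worker"
    url: str


def extra_page_tabs(tabs: Iterable[Tab]) -> List[str]:
    """Return the ids of page tabs to close, keeping the first page tab."""
    page_ids = [t["id"] for t in tabs if t["type"] == "page"]
    # Non-page targets (workers, extensions) are left alone.
    return page_ids[1:]
```

Close every id this returns, then run the workflow in the one page tab that remains.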

D) Mandatory smoke test (before touching X)

Use this exact sequence:

  1. Navigate to:

data:text/html,<html><body><textarea aria-label='t'></textarea></body></html>

  2. Snapshot (refs=aria)
  3. Click the textarea
  4. Type: ok
  5. Evaluate: textarea.value === "ok"

If the smoke test fails with a true timeout (error contains "timed out after"):

  • spend your recovery budget (gateway restart max 1, browser stop/start max 1)
  • re-run smoke test
  • if still failing: STOP (don’t touch X)

X posting invariants (fail-closed)

Replies must be replies

Prefer intent composer:

  • https://x.com/intent/tweet?in_reply_to=<tweetId>

Preflight gate (required):

  • fresh snapshot
  • visible “Replying to …” context
  • textbox contains intended text
  • Reply button enabled
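The preflight gate above can be expressed as one boolean check that passes only when all four conditions hold. This is a minimal sketch: the `Preflight` record and its field names are hypothetical, and how each value is read from the snapshot is left out. It treats "contains intended text" as an exact match after trimming whitespace, which is stricter than the wording above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preflight:
    # Hypothetical observations taken from a fresh snapshot.
    snapshot_is_fresh: bool
    replying_to_visible: bool
    textbox_text: str
    reply_button_enabled: bool


def preflight_ok(p: Preflight, intended_text: str) -> bool:
    """Fail closed: any missing or mismatched condition blocks the post."""
    return (
        p.snapshot_is_fresh
        and p.replying_to_visible
        and p.textbox_text.strip() == intended_text.strip()
        and p.reply_button_enabled
    )
```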

Postflight verification (required):

  • capture toast “View” URL if present
  • open the posted tweet
  • DOM verify it links to the parent /status/<parentTweetId>

If you can’t verify: do not update state.

URL capture is mandatory

Do not claim “posted” unless you have:

  • a captured tweet URL, and
  • (for replies) threading verified.
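A sketch of the fail-closed decision, assuming the hrefs of the posted tweet's page have already been collected into a list. The function names are hypothetical. The check is slightly stricter than a plain substring match: it rejects a longer tweet id that merely starts with the parent id.

```python
import re
from typing import Iterable, Optional


def links_to_parent(hrefs: Iterable[str], parent_tweet_id: str) -> bool:
    """True if any href points at /status/<parent_tweet_id> exactly."""
    # The lookahead rejects ids that only share a prefix with the parent id.
    pattern = re.compile(r"/status/" + re.escape(parent_tweet_id) + r"(?!\d)")
    return any(pattern.search(h) for h in hrefs)


def may_claim_posted(
    tweet_url: Optional[str],
    is_reply: bool,
    hrefs: Iterable[str] = (),
    parent_tweet_id: str = "",
) -> bool:
    """No captured URL, or an unverified reply thread, means 'not posted'."""
    if not tweet_url:
        return False
    if is_reply:
        return bool(parent_tweet_id) and links_to_parent(hrefs, parent_tweet_id)
    return True
```

Update state only when `may_claim_posted` returns True.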

Minimal cron payload template (copy/paste)

Drop this at the top of any cron that touches the browser:

MUTEX
- acquire: python3 $WORKSPACE/scripts/browser_mutex.py acquire --label <job> --backoff 60,120,240 --stale-ms 300000
- if locked after backoff: SKIP
- release always: python3 $WORKSPACE/scripts/browser_mutex.py release

SINGLE TAB
- after lock: list tabs; close extra page tabs

SMOKE TEST
- data: textarea click+type+evaluate == "ok"

HARDENING
- browser.act timeoutMs=60000
- true stall signature: "timed out after"
- recovery budget: gateway restart max 1; browser stop/start max 1
- STOP after budget exhausted

FAIL CLOSED
- require toast View URL capture or reliable fallback
- verify threading for replies
- only then update state
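For jobs driven from Python, the MUTEX lines of the template could be wrapped in a context manager so the release happens even when the workflow raises. This is a sketch, not part of the kit: `browser_lock` and the injectable `run` parameter are hypothetical names, and it assumes the helper exits 0 on success and non-zero (3) when still locked. It releases only when this job acquired the lock, so a skipped run cannot delete a lock held by another job.

```python
import contextlib
import subprocess
import sys
from typing import Callable, Iterator, List


def _helper_cmd(helper: str, *args: str) -> List[str]:
    return [sys.executable, helper, *args]


@contextlib.contextmanager
def browser_lock(
    helper: str,
    label: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Iterator[bool]:
    """Yield True if the lock is held; release on exit only if we acquired it."""
    result = run(_helper_cmd(
        helper, "acquire", "--label", label,
        "--backoff", "60,120,240", "--stale-ms", "300000",
    ))
    acquired = result.returncode == 0
    try:
        yield acquired
    finally:
        if acquired:
            # Runs on success and on exceptions alike.
            run(_helper_cmd(helper, "release"))
```

Typical use: `with browser_lock(helper_path, "x_mentions") as held:` and, if `held` is False, skip the run without touching the browser.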

Files (reference)

  • Mutex helper: scripts/browser_mutex.py
  • Single-tab + X rules: playbooks/browser_mutex_single_tab.md
  • Stall recovery: playbooks/stall_runbook.md

License

MIT (recommended). Replace this section if you want a different license.

#!/usr/bin/env python3
"""OpenClaw browser mutex helper.

Purpose
- Enforce a *single* mutex for ALL browser automation (cron + ad-hoc).
- Avoid cron/manual collisions.

Lock file (canonical):
  $WORKSPACE/memory/cron_browser_mutex.json

Usage
  Acquire (fail if held and not stale):
    python3 scripts/browser_mutex.py acquire --label manual_reply --stale-ms 300000
  Acquire with backoff (cron-style):
    python3 scripts/browser_mutex.py acquire --label x_mentions --backoff 60,120,240 --stale-ms 300000
  Release:
    python3 scripts/browser_mutex.py release

Note: the lock is read and then written in two steps, so it is not atomic.
"""
from __future__ import annotations

import argparse
import json
import os
import time
from typing import Optional

# Expand $WORKSPACE from the environment. If the variable is unset the path is
# left as written, so export WORKSPACE or replace this with an absolute path.
LOCK = os.path.expandvars("$WORKSPACE/memory/cron_browser_mutex.json")


def _now_ms() -> int:
    return int(time.time() * 1000)


def _read_lock() -> Optional[dict]:
    """Return the lock record, or None if no lock file exists."""
    try:
        with open(LOCK, "r") as f:
            data = json.load(f)
        if not isinstance(data, dict):
            raise ValueError("lock file is not a JSON object")
        return data
    except FileNotFoundError:
        return None
    except Exception:
        # Corrupt lock file -> treat as stale (startedAtMs 0 is always old).
        return {"label": "<corrupt>", "startedAtMs": 0}


def _write_lock(label: str) -> None:
    os.makedirs(os.path.dirname(LOCK), exist_ok=True)
    with open(LOCK, "w") as f:
        json.dump({"label": label, "startedAtMs": _now_ms()}, f)


def acquire(label: str, stale_ms: int, backoff: Optional[list[int]]) -> None:
    """Acquire the lock, waiting through the backoff list; exit 3 if still held."""
    waits = backoff or []
    attempts = max(1, len(waits) + 1)
    for i in range(attempts):
        st = _read_lock()
        if not st:
            _write_lock(label)
            print("LOCK_OK")
            return
        age = _now_ms() - int(st.get("startedAtMs", 0) or 0)
        if age >= stale_ms:
            # Take over stale lock.
            _write_lock(label)
            print("LOCK_OK_STALE_TAKEOVER", json.dumps(st), "age_ms", age)
            return
        if i < len(waits):
            wait_s = waits[i]
            print("LOCKED_BY", json.dumps(st), "age_ms", age, "waiting_s", wait_s)
            time.sleep(wait_s)
            continue
        # Out of retries: report the holder and signal "skip" to the caller.
        print("LOCKED_BY", json.dumps(st), "age_ms", age)
        raise SystemExit(3)


def release() -> None:
    try:
        os.remove(LOCK)
        print("LOCK_RELEASED")
    except FileNotFoundError:
        print("LOCK_NOT_HELD")


def main() -> None:
    ap = argparse.ArgumentParser()
    sub = ap.add_subparsers(dest="cmd", required=True)
    ap_a = sub.add_parser("acquire")
    ap_a.add_argument("--label", required=True)
    ap_a.add_argument("--stale-ms", type=int, default=300000)
    ap_a.add_argument(
        "--backoff",
        help="Comma-separated seconds list, e.g. 60,120,240 (cron-style).",
        default=None,
    )
    sub.add_parser("release")
    args = ap.parse_args()
    if args.cmd == "acquire":
        backoff = None
        if args.backoff:
            backoff = [int(x.strip()) for x in args.backoff.split(",") if x.strip()]
        acquire(args.label, args.stale_ms, backoff)
    elif args.cmd == "release":
        release()


if __name__ == "__main__":
    main()

Browser Mutex + Single-Tab Playbook (X automation)

Goal

Prevent collisions between cron jobs and ad-hoc manual runs that use the OpenClaw managed browser, and reduce wrong-tab / stale-ref failures.

Canonical Lock

  • Lock file: $WORKSPACE/memory/cron_browser_mutex.json
  • One writer at a time across all X/browser flows.
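The shared helper reads the lock file and then writes it in two separate steps, so two processes that start at the same instant could both believe they hold the lock. If that matters in your setup, one option is an atomic create. The sketch below is a possible variant, not what the helper does: `try_create_lock` is a hypothetical name, and stale takeover would still need separate handling (remove the old file, then retry).

```python
import json
import os
import time


def try_create_lock(path: str, label: str) -> bool:
    """Atomically create the lock file; return False if it already exists."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    try:
        # O_EXCL makes creation fail if the file exists, so two racing
        # processes cannot both succeed.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        json.dump({"label": label, "startedAtMs": int(time.time() * 1000)}, f)
    return True
```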

Acquire (manual/ad-hoc)

Use the shared helper (handles stale takeover + optional backoff):

python3 $WORKSPACE/scripts/browser_mutex.py acquire \
  --label manual_<task_name> \
  --stale-ms 300000

Acquire (cron)

Prefer using the helper with built-in backoff (do not spawn background sleep && acquire processes):

python3 $WORKSPACE/scripts/browser_mutex.py acquire \
  --label <cron_job_name> \
  --backoff 60,120,240 \
  --stale-ms 300000

If still locked after backoff: SKIP run (do not touch the browser).

If the lock is stale (age >= --stale-ms; 5 minutes at 300000): the helper takes over and prints LOCK_OK_STALE_TAKEOVER ….

Release (always)

python3 $WORKSPACE/scripts/browser_mutex.py release

Single-Tab Policy

Immediately after acquiring the mutex:

  1. List tabs.
  2. Close all extra page tabs.
  3. Run the whole workflow in one page tab:
    • compose
    • click Reply/Post
    • follow toast View link
    • DOM verify
    • navigate back

Do not leave extra View/timeline/search tabs open.

Mandatory “act” smoke test (before touching X)

Snapshots working isn’t enough; browser.act can wedge.

After lock acquisition + single-tab cleanup, run a deterministic smoke test:

  1. browser.navigate to a tiny data: page with a <textarea aria-label="t">.
  2. browser.snapshot and capture the textarea ref.
  3. browser.act(click) textarea.
  4. browser.act(type) text ok.
  5. browser.act(evaluate) to verify textarea.value === "ok".

If the smoke test fails with a true timeout ("timed out after"), release the mutex and SKIP. Don’t touch X.
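The five steps can be sketched against a stand-in browser object. Everything about the interface here is an assumption: the method names follow the playbook (`navigate`, `snapshot`, `act`), but the signatures, the keyword arguments (`ref`, `text`, `fn`) and the shape of the snapshot result are hypothetical and need adapting to the real control API.

```python
from typing import Any, Protocol

SMOKE_URL = "data:text/html,<html><body><textarea aria-label='t'></textarea></body></html>"


class Browser(Protocol):
    # Hypothetical interface; only the method names come from the playbook.
    def navigate(self, url: str) -> None: ...
    def snapshot(self) -> dict: ...  # assumed to map aria labels to refs
    def act(self, kind: str, **kwargs: Any) -> Any: ...


def smoke_test(browser: Browser) -> bool:
    """Run the five-step check; True only if the textarea ends up holding 'ok'."""
    browser.navigate(SMOKE_URL)
    ref = browser.snapshot()["t"]
    browser.act("click", ref=ref)
    browser.act("type", ref=ref, text="ok")
    value = browser.act(
        "evaluate", fn="() => document.querySelector('textarea').value"
    )
    return value == "ok"
```

Exceptions are deliberately left to propagate, so the caller can inspect the message for the "timed out after" signature.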

Thread-Safety Invariants (Replies)

  • Open composer via: https://x.com/intent/tweet?in_reply_to=<tweetId>
  • Must see "Replying to …" context.
  • After clicking Reply/Post:
    • require toast View link
    • open View page
    • DOM-verify parent exists: any a[href*="/status/<tweetId>"]
  • If cannot verify threading: do not log state; retry once after browser restart; else stop.

Failure Handling

  • Stale ref / unknown ref: take a fresh snapshot.
  • If actions apply to wrong page or the tab disappears: stop and restart browser once.
  • 3 consecutive failures on the same goal => stop.
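
The three-strikes rule can be tracked with a small counter keyed by goal. A minimal sketch, assuming a success resets the streak for that goal; `FailureTracker` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Dict

MAX_CONSECUTIVE_FAILURES = 3


@dataclass
class FailureTracker:
    """Counts consecutive failures per goal; a success resets that goal."""

    counts: Dict[str, int] = field(default_factory=dict)

    def record(self, goal: str, ok: bool) -> bool:
        """Record an outcome; return True if the goal must now be abandoned."""
        if ok:
            self.counts[goal] = 0
            return False
        self.counts[goal] = self.counts.get(goal, 0) + 1
        return self.counts[goal] >= MAX_CONSECUTIVE_FAILURES
```
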
MIT License
Copyright (c) 2026 Geoffrey Woo
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Stall Runbook (OpenClaw) — browser control + cron execution

Purpose

Turn “agent stuck” incidents into deterministic recovery (and stop thrash).

Definitions

  • TRUE control-service stall: error contains "timed out after" (e.g. 20000ms) for browser.act.
  • Non-stall errors (recoverable): page.evaluate, Unknown ref, Element not found, null deref, invalid selector.
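The two definitions reduce to a substring check on the error message. A minimal sketch; `ErrorKind` and `classify` are hypothetical names, and the match is case-sensitive on the exact signature quoted above.

```python
from enum import Enum


class ErrorKind(Enum):
    TRUE_STALL = "true_stall"
    NON_STALL = "non_stall"


STALL_SIGNATURE = "timed out after"


def classify(error_message: str) -> ErrorKind:
    """Only the timeout signature counts as a control-service stall."""
    if STALL_SIGNATURE in error_message:
        return ErrorKind.TRUE_STALL
    return ErrorKind.NON_STALL
```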

Global constraints

  • Use the global browser mutex: memory/cron_browser_mutex.json via scripts/browser_mutex.py.
  • Single-tab policy after acquiring mutex.
  • Mandatory deterministic browser.act smoke test before touching X.

Deterministic smoke test

Use this exact sequence:

  1. Navigate to: data:text/html,<html><body><textarea aria-label='t'></textarea></body></html>
  2. Snapshot (refs=aria)
  3. act(click) textarea
  4. act(type) ok
  5. act(evaluate) verify textarea.value === "ok"

If smoke test fails with TRUE timeout → treat as control-service stall.

Recovery ladder (per run budgets)

Tier 0 — don’t misdiagnose

If it’s a non-stall error:

  • Take a fresh snapshot and re-resolve refs.
  • Rewrite evaluate null-safe.
  • Retry once.

Tier 1 — light recovery

If TRUE stall:

  • Restart gateway (max 1 per run).
  • Restart the managed browser via stop/start (budget depends on the job: default 1, search job can do 2).
  • Re-run smoke test after each recovery action.

Tier 2 — hard recovery (manual/ops)

If TRUE stall persists after budgets:

  • STOP the run.
  • Record an incident entry (see below).
  • Back off (don’t hammer).
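
Tiers 1 and 2 can be sketched as a budgeted loop: spend each recovery action up to its budget, re-run the smoke test after every action, and report failure once the budgets are exhausted. The function and the action names in the usage note are hypothetical. The sketch assumes `smoke_test` returns False on a TRUE stall instead of raising.

```python
from typing import Callable, Dict, List, Tuple


def recover(
    smoke_test: Callable[[], bool],
    actions: List[Tuple[str, Callable[[], None]]],
    budgets: Dict[str, int],
) -> Tuple[bool, List[str]]:
    """Return (healthy, attempted); healthy=False means STOP and log an incident."""
    attempted: List[str] = []
    for name, action in actions:
        # An action without a budget entry is never run.
        for _ in range(budgets.get(name, 0)):
            action()
            attempted.append(name)
            if smoke_test():
                return True, attempted
    return False, attempted
```

For example, pass `[("gateway_restart", restart_gateway), ("browser_restart", restart_browser)]` with budgets `{"gateway_restart": 1, "browser_restart": 1}`. The returned `attempted` list can feed the incident entry.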

Incident logging (mandatory)

When a run STOPs due to stall: Create an incident entry in memory/incidents.jsonl with:

  • ts (ISO)
  • job name/id
  • symptom signature (exact error string)
  • last successful smoke test ts
  • what recovery was attempted
  • outcome

Then: create a small TODO in the approval queue if it needs human input.
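
Appending one JSON object per line keeps the incident file easy to grep and to parse. A minimal sketch of the logging step: the key names (`job`, `signature`, `last_smoke_ok_ts`, `recovery_attempted`, `outcome`) are hypothetical labels for the fields listed above, not a schema defined by the kit.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional


def log_incident(
    path: str,
    job: str,
    signature: str,
    last_smoke_ok_ts: Optional[str],
    recovery_attempted: List[str],
    outcome: str,
) -> dict:
    """Append one incident as a JSON line and return the entry written."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),  # ISO 8601, UTC
        "job": job,
        "signature": signature,  # exact error string
        "last_smoke_ok_ts": last_smoke_ok_ts,
        "recovery_attempted": recovery_attempted,
        "outcome": outcome,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```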

Post-incident shipping loop

For repeat incidents (>=2/day same signature):

  • Promote to a task: write a minimal repro + proposed fix.
  • Implement fix + commit.
  • Add regression check (e.g., enforce smoke test first, or add better selector guards).
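
Repeat incidents can be detected by counting entries per day and signature. A sketch under two assumptions: each incident entry is a dict with an ISO `ts` and a `signature` key (hypothetical names), and "same day" means the same date prefix of `ts`.

```python
from collections import Counter
from typing import Iterable, List


def repeat_signatures(entries: Iterable[dict], threshold: int = 2) -> List[str]:
    """Return signatures seen at least `threshold` times on one calendar date."""
    # ts[:10] is the YYYY-MM-DD prefix of an ISO timestamp.
    per_day = Counter((e["ts"][:10], e["signature"]) for e in entries)
    return sorted({sig for (_, sig), n in per_day.items() if n >= threshold})
```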