Bot Alert Spam Analysis

Two recurring bot alerts from "The Secretary" in the backend Telegram group are spamming notifications. Both share a common pattern: transient failures triggering unbounded alert loops.

1. Backup Health Alert: Integrity check failed

Source: ~/code/dotfiles/bin/.local/bin/backup-monitor (launchd service local.backup-monitor, runs every 30 min)

What's happening

The restic repository has stale locks from old backup processes (PIDs 759, 1940, 2845) that crashed without cleanup, dating back to Feb 2-3. Every time restic check runs, it fails because the repo is locked.
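
The stale locks can be confirmed and cleared directly against the repository. A quick check, assuming the repository location is supplied the usual way (via RESTIC_REPOSITORY or -r):

restic list locks    # shows the lock IDs left behind by the crashed processes
restic unlock        # removes locks whose owning process is no longer running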

Why it spams

There is a bug in the monitor script: when the integrity check passes, LAST_INTEGRITY_CHECK is updated in the state file, but when it fails, it is not:

if check_output=$(run_integrity_check); then
    LAST_INTEGRITY_CHECK=$current_epoch    # updated on success
else
    integrity_status="FAILED"
    failure_messages+=("Integrity check failed")
    # LAST_INTEGRITY_CHECK is NOT updated here
fi

The state file shows LAST_INTEGRITY_CHECK stuck at 1770038199 (Feb 2, 08:16). Every 30-minute run therefore computes hours_since_integrity >= 4, re-runs the check, fails again, and fires another alert: an infinite loop.
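
The gating logic presumably looks something like the sketch below; current_epoch, hours_since_integrity, and the 4-hour threshold come from the script and prose above, while the surrounding structure is an assumption for illustration:

current_epoch=$(date +%s)
hours_since_integrity=$(( (current_epoch - LAST_INTEGRITY_CHECK) / 3600 ))
# With LAST_INTEGRITY_CHECK frozen at 1770038199, this condition is always true,
# so every 30-minute run re-enters the failing check-and-alert path
if (( hours_since_integrity >= 4 )); then
    :  # re-run the check, fail, alert
fi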

Fixes

  1. Immediate: restic unlock to clear stale locks
  2. Script: Update LAST_INTEGRITY_CHECK even on failure so it respects the 4-hour interval regardless of outcome (see the sketch after this list)
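
A minimal sketch of the script fix, reusing the excerpt above; the only change is that the failed branch also records the attempt time:

if check_output=$(run_integrity_check); then
    LAST_INTEGRITY_CHECK=$current_epoch    # updated on success
else
    integrity_status="FAILED"
    failure_messages+=("Integrity check failed")
    LAST_INTEGRITY_CHECK=$current_epoch    # also record failed attempts so the
                                           # 4-hour interval applies either way
fi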

2. ClaudeCodeLog Scraper Error

Source: clis/internalctl/internalctl/run_scrape_claude_changelog_analysis.py (runs in ah-control:monitor-changelog tmux window)

What's happening

The scraper uses agent-browser to visit x.com/ClaudeCodeLog every 15-25 minutes checking for new Claude Code version announcements. The browser runs inside an Apple container with Wayland + Chromium. Various transient failures occur:

| Error | Cause |
| --- | --- |
| Browser not ready: timeout | Container still starting up |
| net::ERR_NETWORK_CHANGED | Machine sleep/wake cycle |
| Target page, context or browser has been closed | Chromium crashed mid-navigation |
| Failed to connect via CDP | Race condition: browser reported ready but CDP unreachable |

These are all expected transient issues, and the scraper recovers on its own on the next run: there was a successful run at 11:45 between two error clusters.

Why it spams

Every single exception sends a Telegram notification with zero deduplication:

except Exception as e:
    error_msg = str(e)
    send_error_notification(f"Exception: {error_msg}")

A machine going to sleep overnight generated 7 error notifications. A brief network blip generates 1-2. There's no distinction between "transient browser hiccup" and "something is actually broken."

Fixes

  1. Error suppression: Track consecutive failure count in state. Only notify after N consecutive failures (e.g., 3). Send a single "recovered" message when it starts working again (see the sketch after this list).
  2. Retry logic: Retry start_browser() and navigation once before declaring failure. Most transient issues resolve on a second attempt.
  3. Error classification: Transient infrastructure errors (ERR_NETWORK_CHANGED, timeout, browser closed) could be logged locally without notifying Telegram at all.
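
A minimal sketch combining fixes 1 and 3, assuming a small JSON state file; STATE_FILE, FAILURE_THRESHOLD, and TRANSIENT_MARKERS are illustrative values, and send_error_notification is stubbed here in place of the script's existing Telegram helper:

import json
from pathlib import Path

# Illustrative values; the real script would choose its own path and threshold
STATE_FILE = Path("~/.local/state/changelog_scraper_state.json").expanduser()
FAILURE_THRESHOLD = 3
TRANSIENT_MARKERS = (
    "ERR_NETWORK_CHANGED",
    "Browser not ready",
    "has been closed",
    "Failed to connect via CDP",
)

def send_error_notification(message: str) -> None:
    # stand-in for the script's existing Telegram helper
    print(message)

def load_state() -> dict:
    try:
        return json.loads(STATE_FILE.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {"consecutive_failures": 0}

def save_state(state: dict) -> None:
    STATE_FILE.write_text(json.dumps(state))

def record_failure(error_msg: str) -> None:
    state = load_state()
    state["consecutive_failures"] = state.get("consecutive_failures", 0) + 1
    save_state(state)
    is_transient = any(marker in error_msg for marker in TRANSIENT_MARKERS)
    if is_transient and state["consecutive_failures"] < FAILURE_THRESHOLD:
        return  # known-transient error below the threshold: stay quiet
    send_error_notification(
        f"Exception ({state['consecutive_failures']} consecutive): {error_msg}"
    )

def record_success() -> None:
    state = load_state()
    if state.get("consecutive_failures", 0) >= FAILURE_THRESHOLD:
        # single "recovered" message after a noisy stretch
        send_error_notification(
            f"Recovered after {state['consecutive_failures']} consecutive failures"
        )
    state["consecutive_failures"] = 0
    save_state(state)

With something like this in place, the existing handler's send_error_notification(f"Exception: {error_msg}") would become record_failure(error_msg), and a successful scrape would end with record_success().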

Common theme

Both issues boil down to the same root cause: alerts that fire on every failure without any suppression, deduplication, or cooldown. The backup monitor needs its state-update bug fixed; the scraper needs a consecutive-failure threshold before alerting.
