This document outlines a three-tier observability system for PAI (Personal AI) agents, emphasising local-first event capture with optional cloud integration.
The system captures events to JSONL files locally, making them queryable immediately via Unix tools (tail, grep, jq) without requiring infrastructure. Optional components include a collector daemon and observability stack (VictoriaMetrics + Grafana) for historical analysis and alerting.
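When a script is more convenient than a `jq` pipeline, the same kind of query can be sketched in TypeScript. The example below is a minimal illustration rather than part of PAI: `countByEventType` is a hypothetical helper, and it assumes each line is a JSON object carrying an `event_type` field.

```typescript
// Hypothetical helper: tally the events in one day's JSONL text by event_type.
export function countByEventType(jsonl: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines, including the trailing one
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // skip a malformed or partially written line
    }
    if (typeof parsed !== "object" || parsed === null) continue; // not an event object
    const eventType = (parsed as { event_type?: unknown }).event_type;
    // Events without a string event_type are grouped under "unknown".
    const key = typeof eventType === "string" ? eventType : "unknown";
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}
```

Passing it the contents of a daily events file (read with `readFileSync(path, "utf8")`) returns an object that maps each event type to its count. Malformed lines are skipped rather than treated as errors, because a reader may see a line that is still being appended.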
File-based capture: Events append to daily JSONL files rather than using HTTP, prioritising simplicity and offline resilience.
Distributed push model: The collector pushes events outbound to the observability stack; nothing reaches inward into the PAI environment, maintaining security.
Progressive enhancement: Users start with Stage 0 (CLI-only), advance to Stage 1 (local containerised stack via OrbStack), then to Stage 2+ (central or cloud backends) as needs grow.
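The push model can be sketched as a small batching step. This is an illustrative sketch, not the collector's actual code: `nextBatch` and its offset bookkeeping are hypothetical, and the outbound request itself (for example a `curl` POST to whatever ingestion URL the stack exposes) is left to the caller. The idea is that the collector remembers how far it has read, forwards only complete lines, and saves the new position only after a successful push, so a failed push is retried on the next run.

```typescript
// Hypothetical result of one collector pass over a JSONL file.
export interface Batch {
  lines: string[]; // complete, non-empty JSONL lines not yet forwarded
  nextOffset: number; // position to persist once the push succeeds
}

// `contents` is the file read as a string; `offset` is a character index into it
// (a collector working on raw bytes would track byte offsets instead).
export function nextBatch(contents: string, offset: number): Batch {
  const pending = contents.slice(offset);
  const lastNewline = pending.lastIndexOf("\n");
  // No complete line yet: the writer may be mid-append, so forward nothing.
  if (lastNewline === -1) return { lines: [], nextOffset: offset };
  const lines = pending
    .slice(0, lastNewline)
    .split("\n")
    .filter((line) => line.trim() !== "");
  return { lines, nextOffset: offset + lastNewline + 1 };
}
```

Because the function only reads local data and returns a batch, every network call stays in the collector's outbound push; nothing here listens for or accepts connections.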
Recommended approach: Use a PreToolUse hook on the Skill tool rather than pattern matching at UserPromptSubmit. This provides definitive skill invocation logging.
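```typescript
// hooks/SkillInvocation.hook.ts
// PreToolUse matcher: "Skill"
import { appendFileSync, mkdirSync } from "fs";
import { join } from "path";

// Assumes the hook runner exposes the tool input and session ID as environment variables.
const toolInput = JSON.parse(process.env.TOOL_INPUT || "{}");
const sessionId = process.env.SESSION_ID || "unknown";
const paiDir = process.env.PAI_DIR;

// Nothing to log unless a skill name is present and the PAI directory is known.
const skillName = toolInput.skill;
if (!skillName || !paiDir) process.exit(0);

const event = {
  event_type: "skill.invoked",
  ts: Date.now(), // Unix epoch milliseconds
  session_id: sessionId,
  skill: skillName,
  args: toolInput.args || null,
};

// One file per UTC day, named YYYY-MM-DD.jsonl
const today = new Date().toISOString().split("T")[0];
const eventsDir = join(paiDir, "MEMORY", "Events");
mkdirSync(eventsDir, { recursive: true }); // no-op if the directory already exists
appendFileSync(join(eventsDir, `${today}.jsonl`), JSON.stringify(event) + "\n");
```

Why PreToolUse over UserPromptSubmit pattern matching: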
| Aspect | UserPromptSubmit (Pattern) | PreToolUse (Skill) |
|---|---|---|
| Accuracy | Speculative (pattern matched) | Definitive (skill IS firing) |
| Skill name | Inferred from pattern | Exact from TOOL_INPUT |
| False positives | Yes (pattern matches, skill doesn't fire) | No |
| False negatives | Yes (synonym used, pattern misses) | No |
This approach logs what actually happens rather than what pattern matching predicts will happen.
OrbStack vs Docker Desktop: OrbStack consumes ~300MB RAM versus ~2GB for Docker Desktop, making it the recommended container runtime.
VictoriaMetrics stack: Selected over Grafana LGTM for superior compression (15x better than Loki) and lower resource footprint (~350-450MB versus ~800MB-1.2GB).
Collector approach: Begin with native launchd + curl (zero dependencies), upgrade to Vector only if visibility into the collector itself becomes necessary.
Phase 0 establishes local JSONL capture and CLI querying. Phase 1 adds background collection with self-monitoring via watchdog. Phase 2 deploys the containerised observability stack. Phase 3 implements Grafana alerting for autonomous agent scenarios.
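The Phase 1 watchdog can be pictured as a staleness check on the collector. The sketch below is an assumption about how such a check might look, not a description of the actual watchdog: `isStalled`, `oldestUnsentTs`, and `maxLagMs` are hypothetical names, and timestamps are taken to be Unix epoch milliseconds like the `ts` field in captured events.

```typescript
// Hypothetical watchdog check. `oldestUnsentTs` is the `ts` of the oldest event
// that has been captured locally but not yet forwarded, or null if none is waiting.
export function isStalled(
  oldestUnsentTs: number | null,
  now: number,
  maxLagMs: number,
): boolean {
  if (oldestUnsentTs === null) return false; // everything captured has been forwarded
  // Stalled when the oldest waiting event has been unsent for longer than the allowed lag.
  return now - oldestUnsentTs > maxLagMs;
}
```

Keying the check on the oldest unsent event, rather than the newest, means a steady stream of fresh events cannot hide a collector that has stopped forwarding.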
Nothing is transmitted inbound to the PAI environment; credentials and sensitive fields are scrubbed before capture; and all data remains local until it is explicitly forwarded outbound.
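Scrubbing before capture might look like the following. This is a minimal sketch under stated assumptions: `scrub`, the `SENSITIVE_KEY` pattern, and the `[REDACTED]` marker are hypothetical, and the list of key names is a guess that would need adjusting to the fields real events carry.

```typescript
// Assumed pattern for sensitive key names; adjust it to match real event fields.
const SENSITIVE_KEY = /(password|passwd|secret|token|api[_-]?key|authorization|credential)/i;
const REDACTED = "[REDACTED]";

// Returns a deep copy of `value` with the values of sensitive-looking keys replaced.
// The input is not mutated.
export function scrub(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => scrub(item));
  if (typeof value === "object" && value !== null) {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEY.test(key) ? REDACTED : scrub(inner);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

A hook would call `scrub(event)` just before `JSON.stringify`, so redaction happens before anything touches disk. The check is key-based only: a secret embedded inside a free-text string value would pass through, so value-level patterns would be a separate addition.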
- pai-skill-enforcer#1 — Architectural discussion on deterministic matching for enforcement vs discovery
- Claude Code Skills Best Practices — Reference for skill development patterns