AGI SDK Binary Architecture - Claude Code Pattern
This document describes the AGI driver binary and SDK architecture. The driver is a self-contained agent that captures screenshots, reasons with Claude, and executes actions autonomously. SDKs are thin event wrappers.
┌─────────────────────────────────────────────────────────────────────────┐
│                             agi-api-driver                              │
│                    (locally: ~/Code/agi-api-driver)                     │
│                                                                         │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                SELF-CONTAINED AGENT DRIVER BINARY                 │  │
│  │                                                                   │  │
│  │  ┌─────────────┐  ┌──────────────┐  ┌────────────────────────┐    │  │
│  │  │ Executor    │  │ Agent/LLM    │  │ Environment            │    │  │
│  │  │ • State     │  │ • Claude     │  │ • Screenshot capture   │    │  │
│  │  │   machine   │  │   API        │  │ • Action execution     │    │  │
│  │  │ • Event     │  │ • Tools      │  │ • Screen size detect   │    │  │
│  │  │   emission  │  │ • Prompts    │  │ • DPI/scale factor     │    │  │
│  │  └─────────────┘  └──────────────┘  └────────────────────────┘    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                    │ CI compiles & publishes            │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │  GitHub Releases: agi-driver-v1.0.0-{platform}                    │  │
│  │  darwin-arm64 | darwin-x64 | linux-x64 | windows-x64              │  │
│  └───────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
             ┌───────────────────────┼───────────────────────┐
             │                       │                       │
    ┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
    │    agi-python    │    │     agi-node     │    │    agi-csharp    │
    │                  │    │                  │    │                  │
    │   THIN WRAPPER   │    │   THIN WRAPPER   │    │   THIN WRAPPER   │
    │ • Spawn binary   │    │ • Spawn binary   │    │ • Spawn binary   │
    │ • Event hooks    │    │ • Event hooks    │    │ • Event hooks    │
    │ • Send commands  │    │ • Send commands  │    │ • Send commands  │
    │ (no platform     │    │ (no platform     │    │ (no platform     │
    │  code needed)    │    │  code needed)    │    │  code needed)    │
    └──────────────────┘    └──────────────────┘    └──────────────────┘
In local mode, the driver runs autonomously on the local machine:
SDK sends: {"command":"start","goal":"Open calculator","mode":"local"}
Driver:    1. Captures screenshot via Pillow/scrot
           2. Calls Claude API with screenshot + goal
           3. Emits thinking/action events (informational)
           4. Executes actions locally (JXA/PowerShell/xdotool)
           5. Waits 0.5s for screen to settle
           6. Captures next screenshot
           7. Repeats until finished/error
SDK:       Just listens to events. No platform code needed.
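
As a rough illustration of the SDK side of this flow, the sketch below writes the start command and then only reads events until the driver reports `finished` or `error`. The function name `run_local_session` and its stream parameters are hypothetical; in a real wrapper the two streams would presumably be the stdin/stdout pipes of the spawned binary.

```python
import io
import json
from typing import Iterable, TextIO


def run_local_session(to_driver: TextIO, from_driver: Iterable[str], goal: str) -> dict:
    """Send a local-mode start command, then listen until the session ends.

    Hypothetical helper: `to_driver` and `from_driver` stand in for the
    driver process's stdin and stdout.
    """
    start = {"command": "start", "goal": goal, "mode": "local"}
    to_driver.write(json.dumps(start) + "\n")
    to_driver.flush()

    for line in from_driver:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        event = json.loads(line)
        kind = event.get("event")
        if kind == "thinking":
            print(f"Thinking: {event['text']}")
        elif kind == "action":
            # Informational only: the driver has already executed the action.
            print(f"Action: {event['action'].get('type')}")
        elif kind in ("finished", "error"):
            return event
    raise RuntimeError("driver stream ended before a finished/error event")


if __name__ == "__main__":
    # Canned driver output used in place of a real process.
    fake_stdout = [
        '{"event":"ready","version":"0.1.0","protocol":"jsonl","step":0}\n',
        '{"event":"thinking","text":"I see the desktop","step":1}\n',
        '{"event":"finished","reason":"completed","summary":"done","success":true,"step":2}\n',
    ]
    print(run_local_session(io.StringIO(), fake_stdout, "Open calculator"))
```

Note that the SDK side contains no screenshot or input code at all, which is the point of local mode.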
In legacy mode, the SDK manages screenshots and action execution (existing behavior):
SDK sends: {"command":"start","goal":"...","screenshot":"base64...","screen_width":1920,"screen_height":1080}
Driver:    1. Receives screenshot from SDK
           2. Calls Claude API
           3. Emits action events
SDK:       4. Executes actions
           5. Captures screenshot
           6. Sends screenshot command
           7. Repeat
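
For contrast, a legacy-mode SDK loop might look roughly like the sketch below. The names `run_legacy_session`, `capture`, and `execute` are hypothetical; the two callbacks stand in for the per-SDK platform code. It also assumes one fresh screenshot is sent after every action event, which the flow above does not spell out.

```python
import io
import json
from typing import Callable, Iterable, TextIO, Tuple

Screenshot = Tuple[str, int, int]  # (base64 image, width, height)


def send(to_driver: TextIO, command: dict) -> None:
    """Write one command as a single JSON line."""
    to_driver.write(json.dumps(command) + "\n")
    to_driver.flush()


def run_legacy_session(
    to_driver: TextIO,
    from_driver: Iterable[str],
    goal: str,
    capture: Callable[[], Screenshot],
    execute: Callable[[dict], None],
) -> dict:
    """Drive a legacy session: the SDK captures screenshots and executes actions."""
    data, width, height = capture()
    send(to_driver, {"command": "start", "goal": goal, "screenshot": data,
                     "screen_width": width, "screen_height": height})
    for line in from_driver:
        if not line.strip():
            continue
        event = json.loads(line)
        kind = event.get("event")
        if kind == "action":
            execute(event["action"])            # step 4: SDK executes
            data, width, height = capture()     # step 5: SDK captures
            send(to_driver, {"command": "screenshot", "data": data,  # step 6
                             "screen_width": width, "screen_height": height})
        elif kind in ("finished", "error"):
            return event
    raise RuntimeError("driver stream ended before a finished/error event")


if __name__ == "__main__":
    fake_stdout = [
        '{"event":"action","action":{"type":"click","x":150,"y":200},"step":1}\n',
        '{"event":"finished","reason":"completed","summary":"done","success":true,"step":2}\n',
    ]
    stdin = io.StringIO()
    run_legacy_session(stdin, fake_stdout, "Click login",
                       capture=lambda: ("base64...", 1920, 1080),
                       execute=lambda action: print("executing", action))
    print(stdin.getvalue())
```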
| Aspect | Before | After |
|---|---|---|
| Screenshot capture | SDK responsibility | Driver captures locally |
| Action execution | SDK responsibility | Driver executes locally |
| Screen detection | SDK responsibility | Driver detects automatically |
| SDK executor code | Required (600+ lines per SDK) | Deprecated (kept for backward compat) |
| SDK role | I/O adapter + executor | Pure event wrapper |
| Platform code | Duplicated in 3 SDKs | Single implementation in driver |
The driver includes a platform-aware environment module:
agi_driver/environment/
├── __init__.py   # Factory: create_environment("local")
├── base.py       # Abstract: BaseEnvironment
└── local.py      # LocalEnvironment - controls local machine
BaseEnvironment interface:
from abc import ABC

class BaseEnvironment(ABC):
    async def initialize(self) -> None: ...
    async def capture_screenshot(self) -> tuple[str, int, int]: ...  # (base64, width, height)
    async def execute_action(self, action: dict) -> bool: ...
    async def get_screen_size(self) -> tuple[int, int]: ...
    async def cleanup(self) -> None: ...

LocalEnvironment handles:
- Screenshot: PIL.ImageGrab.grab() (macOS/Windows), scrot (Linux)
- Clicks: JXA/CGEvent (macOS), PowerShell/user32.dll (Windows), xdotool (Linux)
- Typing: JXA with JSON escaping (macOS), Base64+SendKeys (Windows), xdotool (Linux)
- Keys: AppleScript key codes (macOS), SendKeys format (Windows), xdotool (Linux)
- Scroll/Drag: Platform-specific implementations
- DPI/Scale: NSScreen (macOS), Registry (Windows), GDK_SCALE (Linux)
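
To show how another environment could plug into this interface, here is a minimal sketch of an in-memory implementation that records actions instead of touching the screen. `RecordingEnvironment` and the `"recording"` factory key are hypothetical and not part of the driver, and treating the `bool` return of `execute_action` as a success flag is an assumption.

```python
import asyncio
import base64
from abc import ABC, abstractmethod


class BaseEnvironment(ABC):
    """Interface as listed above, restated so the sketch is self-contained."""

    @abstractmethod
    async def initialize(self) -> None: ...

    @abstractmethod
    async def capture_screenshot(self) -> tuple[str, int, int]: ...

    @abstractmethod
    async def execute_action(self, action: dict) -> bool: ...

    @abstractmethod
    async def get_screen_size(self) -> tuple[int, int]: ...

    @abstractmethod
    async def cleanup(self) -> None: ...


class RecordingEnvironment(BaseEnvironment):
    """Hypothetical test double: no real screen, just bookkeeping."""

    def __init__(self, width: int = 1920, height: int = 1080) -> None:
        self.width, self.height = width, height
        self.actions: list[dict] = []
        self.ready = False

    async def initialize(self) -> None:
        self.ready = True

    async def capture_screenshot(self) -> tuple[str, int, int]:
        # Placeholder bytes, not a real image.
        data = base64.b64encode(b"placeholder").decode("ascii")
        return data, self.width, self.height

    async def execute_action(self, action: dict) -> bool:
        if "type" not in action:
            return False  # assumption: False signals a rejected action
        self.actions.append(action)
        return True

    async def get_screen_size(self) -> tuple[int, int]:
        return self.width, self.height

    async def cleanup(self) -> None:
        self.ready = False


def create_environment(kind: str) -> BaseEnvironment:
    """Factory in the spirit of create_environment("local")."""
    if kind == "recording":
        return RecordingEnvironment()
    raise ValueError(f"unknown environment kind: {kind!r}")


async def _demo() -> None:
    env = create_environment("recording")
    await env.initialize()
    print(await env.execute_action({"type": "click", "x": 150, "y": 200}))
    print(await env.get_screen_size())
    await env.cleanup()


if __name__ == "__main__":
    asyncio.run(_demo())
```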
{"event":"ready","version":"0.1.0","protocol":"jsonl","step":0}
{"event":"state_change","state":"running","step":0}
{"event":"screenshot_captured","width":3024,"height":1964,"step":0}
{"event":"thinking","text":"I see the desktop with a dock at the bottom...","step":1}
{"event":"action","action":{"type":"click","x":150,"y":200},"step":1}
{"event":"screenshot_captured","width":3024,"height":1964,"step":1}
{"event":"confirm","action":{},"reason":"Delete this file?","step":2}
{"event":"ask_question","question":"What email should I use?","question_id":"q1","step":3}
{"event":"finished","reason":"completed","summary":"Opened calculator and computed 2+2=4","success":true,"step":10}
{"event":"error","message":"Model inference failed","code":"step_error","recoverable":true,"step":5}

New event: screenshot_captured - Emitted in local mode when the driver captures a screenshot. Lightweight notification (no image data) so SDKs know a step boundary occurred.
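
One way an SDK could route these JSON Lines events to user callbacks is sketched below. The `EventDispatcher` class is hypothetical and only illustrates the "pure event wrapper" idea; the real SDK hook APIs may differ.

```python
import json
from collections import defaultdict
from typing import Callable, Optional


class EventDispatcher:
    """Hypothetical helper: parse one JSONL line and call matching handlers."""

    def __init__(self) -> None:
        self._handlers: dict = defaultdict(list)

    def on(self, event_name: str, handler: Callable[[dict], None]) -> None:
        """Register a handler for one event type, e.g. "screenshot_captured"."""
        self._handlers[event_name].append(handler)

    def feed(self, line: str) -> Optional[dict]:
        """Parse a line from the driver; return the event, or None if blank."""
        line = line.strip()
        if not line:
            return None
        event = json.loads(line)
        for handler in self._handlers.get(event.get("event"), []):
            handler(event)
        return event


if __name__ == "__main__":
    dispatcher = EventDispatcher()
    # screenshot_captured carries no image data, so it is cheap to log.
    dispatcher.on("screenshot_captured",
                  lambda e: print(f"step {e['step']}: {e['width']}x{e['height']}"))
    dispatcher.feed('{"event":"screenshot_captured","width":3024,"height":1964,"step":0}')
```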
{"command":"start","session_id":"sess_abc","goal":"Open calculator","mode":"local"}
{"command":"start","session_id":"sess_def","goal":"Click login","screenshot":"base64...","screen_width":1920,"screen_height":1080}
{"command":"screenshot","data":"base64...","screen_width":1920,"screen_height":1080}
{"command":"pause"}
{"command":"resume"}
{"command":"stop","reason":"User cancelled"}
{"command":"confirm","approved":true,"message":""}
{"command":"answer","text":"user@example.com","question_id":"q1"}

StartCommand changes:
- mode field added: "local" for autonomous, "" for legacy
- screenshot, screen_width, screen_height are ignored in local mode
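
The two shapes of the start command can be captured in a small builder, sketched here under a few assumptions: `build_start_command` is a hypothetical helper, the legacy form omits `mode` entirely (as in the example above), and the screenshot fields are dropped in local mode since the driver ignores them anyway.

```python
import json
from typing import Optional


def build_start_command(
    goal: str,
    session_id: str,
    *,
    mode: str = "",
    screenshot: Optional[str] = None,
    screen_width: Optional[int] = None,
    screen_height: Optional[int] = None,
) -> str:
    """Return one JSON line for a start command in local or legacy mode."""
    command = {"command": "start", "session_id": session_id, "goal": goal}
    if mode == "local":
        # Screenshot fields would be ignored by the driver, so leave them out.
        command["mode"] = "local"
    elif mode == "":
        if screenshot is None or screen_width is None or screen_height is None:
            raise ValueError("legacy mode needs screenshot, screen_width and screen_height")
        command.update(screenshot=screenshot,
                       screen_width=screen_width,
                       screen_height=screen_height)
    else:
        raise ValueError(f"unsupported mode: {mode!r}")
    # Compact separators match the wire format shown in the examples.
    return json.dumps(command, separators=(",", ":"))


if __name__ == "__main__":
    print(build_start_command("Open calculator", "sess_abc", mode="local"))
    print(build_start_command("Click login", "sess_def", screenshot="base64...",
                              screen_width=1920, screen_height=1080))
```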
                            ┌─────────────────────────────────┐
                            │                                 │
            start           ▼                                 │
┌───────┐ ─────────> ┌─────────────┐                          │
│ IDLE  │            │   RUNNING   │<─────────────────────────┤
└───────┘            └──────┬──────┘                          │
                            │                                 │
       ┌────────────────────┼────────────────────┐            │
       │                    │                    │            │
       ▼                    ▼                    ▼            │
┌──────────────┐  ┌───────────────────┐  ┌────────────────┐   │
│    PAUSED    │  │  WAITING_CONFIRM  │  │ WAITING_ANSWER │   │
│              │  │                   │  │                │   │
│   resume()   │  │   confirm(bool)   │  │  answer(str)   │   │
└──────┬───────┘  └─────────┬─────────┘  └───────┬────────┘   │
       │                    │                    │            │
       └────────────────────┴────────────────────┴────────────┘

ANY STATE ──── stop() ────> STOPPED
ANY STATE ──── error ─────> ERROR
RUNNING ────── finish ────> FINISHED
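
The diagram can be read as a transition table. The sketch below is one possible encoding, not the contents of `state_machine.py`: the trigger names `pause`, `request_confirm`, and `request_answer` are hypothetical labels for the unlabeled arrows out of RUNNING, and treating STOPPED, ERROR, and FINISHED as terminal is an assumption.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RUNNING = "running"
    PAUSED = "paused"
    WAITING_CONFIRM = "waiting_confirm"
    WAITING_ANSWER = "waiting_answer"
    STOPPED = "stopped"
    ERROR = "error"
    FINISHED = "finished"


# Assumption: once a session ends, no further transitions are allowed.
TERMINAL = {State.STOPPED, State.ERROR, State.FINISHED}

# Triggers that apply from any non-terminal state.
GLOBAL_TRIGGERS = {"stop": State.STOPPED, "error": State.ERROR}

TRANSITIONS = {
    (State.IDLE, "start"): State.RUNNING,
    (State.RUNNING, "pause"): State.PAUSED,                      # hypothetical name
    (State.RUNNING, "request_confirm"): State.WAITING_CONFIRM,   # hypothetical name
    (State.RUNNING, "request_answer"): State.WAITING_ANSWER,     # hypothetical name
    (State.RUNNING, "finish"): State.FINISHED,
    (State.PAUSED, "resume"): State.RUNNING,
    (State.WAITING_CONFIRM, "confirm"): State.RUNNING,
    (State.WAITING_ANSWER, "answer"): State.RUNNING,
}


def next_state(state: State, trigger: str) -> State:
    """Return the state reached from `state` via `trigger`, or raise ValueError."""
    if state in TERMINAL:
        raise ValueError(f"{state.name} is terminal")
    if trigger in GLOBAL_TRIGGERS:
        return GLOBAL_TRIGGERS[trigger]
    try:
        return TRANSITIONS[(state, trigger)]
    except KeyError:
        raise ValueError(f"{trigger!r} is not valid in {state.name}") from None


if __name__ == "__main__":
    state = State.IDLE
    for trigger in ("start", "request_confirm", "confirm", "finish"):
        state = next_state(state, trigger)
        print(trigger, "->", state.name)
```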
agi-api-driver/
├── src/agi_driver/
│   ├── __init__.py            # Package exports, version
│   ├── __main__.py            # CLI entry point
│   ├── executor.py            # Main execution loop (local + legacy modes)
│   ├── state_machine.py       # State enum and transitions
│   ├── agent/
│   │   ├── base.py            # BaseDriverAgent, AgentAction, StepResult
│   │   ├── desktop_agent.py   # DesktopAgent (Claude API integration)
│   │   ├── prompt.py          # System prompts
│   │   └── tools.py           # Desktop automation tool definitions
│   ├── environment/           # NEW: Self-contained environment
│   │   ├── __init__.py        # Factory: create_environment()
│   │   ├── base.py            # Abstract BaseEnvironment
│   │   └── local.py           # LocalEnvironment (screenshot + actions)
│   ├── llm/
│   │   └── anthropic.py       # Claude API client with retry
│   └── protocol/
│       ├── commands.py        # 7 command types (start now has mode)
│       ├── events.py          # 9 event types (+ screenshot_captured)
│       └── jsonl.py           # JSON Lines I/O
├── .github/workflows/
│   └── build-agi-driver.yml   # Cross-platform Nuitka build
└── pyproject.toml
import asyncio
from agi import AgentDriver, DriverOptions

async def main():
    driver = AgentDriver(DriverOptions(mode="local"))
    driver.on_thinking(lambda t: print(f"Thinking: {t}"))
    driver.on_action(lambda a: print(f"Action: {a.type}"))  # Informational only
    result = await driver.start(goal="Open calculator and compute 2+2")
    print(f"Done: {result.summary}")

asyncio.run(main())

import { AgentDriver } from '@agi/sdk';
const driver = new AgentDriver({ mode: 'local' });
driver.on('thinking', (text) => console.log('Thinking:', text));
driver.on('action', (action) => console.log('Action:', action.type));
const result = await driver.start('Open calculator and compute 2+2');
console.log('Done:', result.summary);

using Agi.Driver;
var driver = new AgentDriver(new DriverOptions { Mode = "local" });
driver.OnThinking += async (text) => Console.WriteLine($"Thinking: {text}");
driver.OnAction += async (action) => Console.WriteLine($"Action: {action.Type}");
var result = await driver.StartAsync(goal: "Open calculator and compute 2+2");
Console.WriteLine($"Done: {result.Summary}");

cd src
python -m nuitka \
--standalone --onefile \
--output-filename=agi-driver \
--include-package=agi_driver \
--include-package=agi_driver.environment \
--include-package=anthropic \
--include-package=PIL \
--include-package=pydantic \
--lto=yes \
--python-flag=no_site \
-m agi_driver

| OS | Target | Binary |
|---|---|---|
| macOS 14 | darwin-arm64 | agi-driver-darwin-arm64 |
| macOS 13 | darwin-x64 | agi-driver-darwin-x64 |
| Ubuntu 22.04 | linux-x64 | agi-driver-linux-x64 |
| Windows latest | windows-x64 | agi-driver-windows-x64.exe |
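
An SDK wrapper has to pick the right binary for the host. The sketch below maps Python's `platform.system()` and `platform.machine()` values onto the Target and Binary columns above; the helper name `binary_name` and the lookup tables are hypothetical, and the sketch does not try to cover the versioned release asset naming.

```python
import platform
from typing import Optional

# Values returned by platform.system() / platform.machine() on common hosts.
_OS = {"Darwin": "darwin", "Linux": "linux", "Windows": "windows"}
_ARCH = {"arm64": "arm64", "aarch64": "arm64", "x86_64": "x64", "AMD64": "x64"}

# Targets listed in the build table.
SUPPORTED = {"darwin-arm64", "darwin-x64", "linux-x64", "windows-x64"}


def binary_name(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Return the driver binary file name for a host (defaults to this host)."""
    system = system or platform.system()
    machine = machine or platform.machine()
    try:
        target = f"{_OS[system]}-{_ARCH[machine]}"
    except KeyError:
        raise RuntimeError(f"unsupported platform: {system}/{machine}") from None
    if target not in SUPPORTED:
        raise RuntimeError(f"no driver build for {target}")
    suffix = ".exe" if system == "Windows" else ""
    return f"agi-driver-{target}{suffix}"


if __name__ == "__main__":
    print(binary_name("Darwin", "arm64"))
    print(binary_name("Windows", "AMD64"))
```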

Dependencies:
- anthropic - Claude API client
- Pillow - Screenshot capture
- pydantic - Data validation
┌────────────────────────────────────────────────────────────────┐
│                        AUTONOMOUS LOOP                         │
│                                                                │
│  1. Initialize LocalEnvironment                                │
│     - Detect screen size (system_profiler / powershell / xdpy) │
│     - Cache DPI scale factor                                   │
│                                                                │
│  2. Capture initial screenshot (PIL.ImageGrab / scrot)         │
│     └── Emit screenshot_captured event                         │
│                                                                │
│  3. Call agent.step(screenshot, goal)                          │
│     ├── Prepare image (resize to 1366x768 canvas, JPEG 85)     │
│     ├── Build messages (goal + history + screenshot)           │
│     ├── Call Claude API with desktop tools                     │
│     └── Process response (thinking, tool uses)                 │
│                                                                │
│  4. Emit thinking event                                        │
│                                                                │
│  5. Check control flow:                                        │
│     ├── finish → Emit finished, exit loop                      │
│     ├── confirm → Emit confirm, wait for stdin response        │
│     ├── ask_question → Emit ask_question, wait for stdin       │
│     └── actions → Continue to step 6                           │
│                                                                │
│  6. Execute actions on environment                             │
│     ├── Emit action event (informational)                      │
│     └── Call environment.execute_action()                      │
│                                                                │
│  7. Wait 0.5s settle delay                                     │
│                                                                │
│  8. Capture next screenshot                                    │
│     └── Emit screenshot_captured event                         │
│                                                                │
│  9. Go to step 3                                               │
│                                                                │
│  Background: stdin reader queues pause/stop/confirm/answer     │
│  commands for processing between steps                         │
└────────────────────────────────────────────────────────────────┘
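
The loop above can be sketched in a few dozen lines of asyncio. This is a simplified illustration, not the code in `executor.py`: `ScriptedAgent`, `StubEnvironment`, the dict shape of a step result, and the `max_steps` guard are all hypothetical, and the confirm/ask_question branches and the background stdin reader are left out.

```python
import asyncio


class ScriptedAgent:
    """Hypothetical stand-in for the Claude-backed agent: replays canned results."""

    def __init__(self, results):
        self._results = list(results)

    async def step(self, screenshot, goal):
        return self._results.pop(0)


class StubEnvironment:
    """Hypothetical environment that records actions instead of executing them."""

    def __init__(self):
        self.executed = []

    async def initialize(self):
        pass

    async def capture_screenshot(self):
        return "base64...", 3024, 1964

    async def execute_action(self, action):
        self.executed.append(action)
        return True

    async def cleanup(self):
        pass


async def run_loop(agent, env, goal, emit, settle_delay=0.5, max_steps=50):
    """Run steps 1-9; return True if the agent finished, False on step limit."""
    await env.initialize()                                         # step 1
    try:
        step = 0
        screenshot, width, height = await env.capture_screenshot()  # step 2
        emit({"event": "screenshot_captured", "width": width, "height": height, "step": step})
        while step < max_steps:
            step += 1
            result = await agent.step(screenshot, goal)             # step 3
            emit({"event": "thinking", "text": result.get("thinking", ""), "step": step})  # step 4
            if "finish" in result:                                  # step 5
                emit({"event": "finished", "reason": "completed",
                      "summary": result["finish"], "success": True, "step": step})
                return True
            for action in result.get("actions", []):                # step 6
                emit({"event": "action", "action": action, "step": step})
                await env.execute_action(action)
            await asyncio.sleep(settle_delay)                       # step 7
            screenshot, width, height = await env.capture_screenshot()  # step 8
            emit({"event": "screenshot_captured", "width": width, "height": height, "step": step})
        return False
    finally:
        await env.cleanup()


if __name__ == "__main__":
    agent = ScriptedAgent([
        {"thinking": "I see the desktop", "actions": [{"type": "click", "x": 150, "y": 200}]},
        {"thinking": "Calculator is open", "finish": "Opened calculator"},
    ])
    asyncio.run(run_loop(agent, StubEnvironment(), "Open calculator", print, settle_delay=0))
```

Passing `emit` as a callback keeps the loop independent of how events reach the SDK; in the driver that would presumably be a JSON Lines writer on stdout.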
The following SDK-side executor modules have been removed. The driver binary handles all screenshot capture and action execution in local mode:
| SDK | Removed Module |
|---|---|
| Python | agi.executor (execute_action, execute_actions, get_scale_factor, get_screen_size) |
| Node.js | src/executor.ts (executeAction, executeActions, getScaleFactor, getScreenSize) |
| C# | Agi.Executor (ExecuteAction, ExecuteActions, GetScaleFactor, GetScreenSize) |
All platform-specific code now lives exclusively in the driver binary's environment/local.py.