@JacobFV
Last active February 5, 2026 20:16
AGI SDK Binary Architecture - Claude Code Pattern

AGI SDK Binary Architecture

This document describes the AGI driver binary and SDK architecture. The driver is a self-contained agent that captures screenshots, reasons with Claude, and executes actions autonomously. SDKs are thin event wrappers.

Overview

┌────────────────────────────────────────────────────────────────────────┐
│                             agi-api-driver                             │
│                    (locally: ~/Code/agi-api-driver)                    │
│                                                                        │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │  SELF-CONTAINED AGENT DRIVER BINARY                              │  │
│  │                                                                  │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌────────────────────────┐  │  │
│  │  │  Executor    │  │  Agent/LLM   │  │  Environment           │  │  │
│  │  │  • State     │  │  • Claude    │  │  • Screenshot capture  │  │  │
│  │  │    machine   │  │    API       │  │  • Action execution    │  │  │
│  │  │  • Event     │  │  • Tools     │  │  • Screen size detect  │  │  │
│  │  │    emission  │  │  • Prompts   │  │  • DPI/scale factor    │  │  │
│  │  └──────────────┘  └──────────────┘  └────────────────────────┘  │  │
│  └──────────────────────────────────────────────────────────────────┘  │
│                            ↓ CI compiles & publishes                   │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │  GitHub Releases: agi-driver-v1.0.0-{platform}                   │  │
│  │  darwin-arm64 | darwin-x64 | linux-x64 | windows-x64             │  │
│  └──────────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────────┘
                                    │
            ┌───────────────────────┼───────────────────────┐
            ↓                       ↓                       ↓
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│   agi-python     │  │    agi-node      │  │   agi-csharp     │
│                  │  │                  │  │                  │
│  THIN WRAPPER    │  │  THIN WRAPPER    │  │  THIN WRAPPER    │
│  • Spawn binary  │  │  • Spawn binary  │  │  • Spawn binary  │
│  • Event hooks   │  │  • Event hooks   │  │  • Event hooks   │
│  • Send commands │  │  • Send commands │  │  • Send commands │
│  (no platform    │  │  (no platform    │  │  (no platform    │
│   code needed)   │  │   code needed)   │  │   code needed)   │
└──────────────────┘  └──────────────────┘  └──────────────────┘

Two Operating Modes

Local Mode (Self-Contained) - mode: "local"

The driver runs autonomously on the local machine:

SDK sends:  {"command":"start","goal":"Open calculator","mode":"local"}
Driver:     1. Captures screenshot via Pillow/scrot
            2. Calls Claude API with screenshot + goal
            3. Emits thinking/action events (informational)
            4. Executes actions locally (JXA/PowerShell/xdotool)
            5. Waits 0.5s for screen to settle
            6. Captures next screenshot
            7. Repeat until finished/error
SDK:        Just listens to events. No platform code needed.
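
To illustrate how thin the SDK side can be in local mode, here is a minimal sketch of a wrapper that spawns a process, writes one start command to its stdin, and reads JSON Lines events from its stdout. It is not the actual SDK code. `run_local` and `FAKE_DRIVER` are hypothetical names, and a small Python one-liner stands in for the real binary so the example runs without the driver installed. The assumption that a start command can be written before the ready event is read is also ours.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the driver binary: emits "ready", reads one
# command from stdin, then emits "finished". The real driver does far more.
FAKE_DRIVER = r'''
import json, sys
print(json.dumps({"event": "ready", "step": 0}), flush=True)
command = json.loads(sys.stdin.readline())
summary = "goal was: " + command["goal"]
print(json.dumps({"event": "finished", "summary": summary, "step": 1}), flush=True)
'''


def run_local(argv, goal):
    """Spawn the driver, send a local-mode start command, collect its events."""
    events = []
    with subprocess.Popen(
        argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    ) as proc:
        start = {"command": "start", "goal": goal, "mode": "local"}
        proc.stdin.write(json.dumps(start) + "\n")
        proc.stdin.flush()
        for line in proc.stdout:  # one JSON object per line
            event = json.loads(line)
            events.append(event)
            if event["event"] in ("finished", "error"):
                break
    return events


events = run_local([sys.executable, "-c", FAKE_DRIVER], "Open calculator")
print([e["event"] for e in events])
```

With the real binary, `argv` would point at the downloaded executable instead of the stand-in. The wrapper itself contains no screenshot or input code.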

Legacy Mode (SDK-Driven) - mode: ""

The SDK manages screenshots and action execution (existing behavior):

SDK sends:  {"command":"start","goal":"...","screenshot":"base64...","screen_width":1920,"screen_height":1080}
Driver:     1. Receives screenshot from SDK
            2. Calls Claude API
            3. Emits action events
SDK:        4. Executes actions
            5. Captures screenshot
            6. Sends screenshot command
            7. Repeat

Architecture: Self-Contained Driver Binary

Key Changes from Previous Design

| Aspect | Before | After |
|---|---|---|
| Screenshot capture | SDK responsibility | Driver captures locally |
| Action execution | SDK responsibility | Driver executes locally |
| Screen detection | SDK responsibility | Driver detects automatically |
| SDK executor code | Required (600+ lines per SDK) | Deprecated (kept for backward compat) |
| SDK role | I/O adapter + executor | Pure event wrapper |
| Platform code | Duplicated in 3 SDKs | Single implementation in driver |

Environment Module

The driver includes a platform-aware environment module:

agi_driver/environment/
├── __init__.py     # Factory: create_environment("local")
├── base.py         # Abstract: BaseEnvironment
└── local.py        # LocalEnvironment - controls local machine

BaseEnvironment interface:

from abc import ABC

# Interface summary: signatures only, bodies elided.
class BaseEnvironment(ABC):
    async def initialize(self) -> None: ...
    async def capture_screenshot(self) -> tuple[str, int, int]: ...  # (base64, width, height)
    async def execute_action(self, action: dict) -> bool: ...
    async def get_screen_size(self) -> tuple[int, int]: ...
    async def cleanup(self) -> None: ...

LocalEnvironment handles:

  • Screenshot: PIL.ImageGrab.grab() (macOS/Windows), scrot (Linux)
  • Clicks: JXA/CGEvent (macOS), PowerShell/user32.dll (Windows), xdotool (Linux)
  • Typing: JXA with JSON escaping (macOS), Base64+SendKeys (Windows), xdotool (Linux)
  • Keys: AppleScript key codes (macOS), SendKeys format (Windows), xdotool (Linux)
  • Scroll/Drag: Platform-specific implementations
  • DPI/Scale: NSScreen (macOS), Registry (Windows), GDK_SCALE (Linux)
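
To make the interface concrete, the sketch below implements it with an in-memory stub instead of real platform calls. `FakeEnvironment` is a hypothetical class, not part of the driver. The `@abstractmethod` markers are also an assumption about how `base.py` enforces the contract.

```python
import asyncio
import base64
from abc import ABC, abstractmethod


class BaseEnvironment(ABC):
    """Mirror of the interface listed above, with methods marked abstract."""

    @abstractmethod
    async def initialize(self) -> None: ...

    @abstractmethod
    async def capture_screenshot(self) -> tuple[str, int, int]: ...

    @abstractmethod
    async def execute_action(self, action: dict) -> bool: ...

    @abstractmethod
    async def get_screen_size(self) -> tuple[int, int]: ...

    @abstractmethod
    async def cleanup(self) -> None: ...


class FakeEnvironment(BaseEnvironment):
    """Hypothetical stub: no real screen, actions are only recorded."""

    def __init__(self, width: int = 1920, height: int = 1080) -> None:
        self.width, self.height = width, height
        self.actions: list[dict] = []

    async def initialize(self) -> None:
        self.actions.clear()

    async def capture_screenshot(self) -> tuple[str, int, int]:
        fake_pixels = b"not-a-real-image"  # placeholder bytes, not a PNG/JPEG
        encoded = base64.b64encode(fake_pixels).decode("ascii")
        return encoded, self.width, self.height

    async def execute_action(self, action: dict) -> bool:
        self.actions.append(action)
        return True  # the stub pretends every action succeeds

    async def get_screen_size(self) -> tuple[int, int]:
        return self.width, self.height

    async def cleanup(self) -> None:
        pass


async def demo() -> tuple[int, int, list[dict]]:
    env = FakeEnvironment()
    await env.initialize()
    _, width, height = await env.capture_screenshot()
    await env.execute_action({"type": "click", "x": 150, "y": 200})
    await env.cleanup()
    return width, height, env.actions


result = asyncio.run(demo())
```

A stub like this is mainly useful for exercising the execution loop in tests without moving the real mouse or keyboard.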

Event-Driven Protocol

Binary -> SDK (stdout events)

{"event":"ready","version":"0.1.0","protocol":"jsonl","step":0}
{"event":"state_change","state":"running","step":0}
{"event":"screenshot_captured","width":3024,"height":1964,"step":0}
{"event":"thinking","text":"I see the desktop with a dock at the bottom...","step":1}
{"event":"action","action":{"type":"click","x":150,"y":200},"step":1}
{"event":"screenshot_captured","width":3024,"height":1964,"step":1}
{"event":"confirm","action":{},"reason":"Delete this file?","step":2}
{"event":"ask_question","question":"What email should I use?","question_id":"q1","step":3}
{"event":"finished","reason":"completed","summary":"Opened calculator and computed 2+2=4","success":true,"step":10}
{"event":"error","message":"Model inference failed","code":"step_error","recoverable":true,"step":5}

New event: screenshot_captured. Emitted in local mode each time the driver captures a screenshot. It is a lightweight notification that carries the image dimensions but no image data, so SDKs can tell that a step boundary occurred.
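
On the SDK side, consuming this stream amounts to parsing one JSON object per line and routing it by its `event` field. The following is a minimal sketch of that idea. `dispatch_events` is a hypothetical helper, and silently skipping unknown event types is an assumed policy, not documented driver behavior.

```python
import json
from typing import Callable

Handler = Callable[[dict], None]


def dispatch_events(lines, handlers: dict[str, Handler]) -> int:
    """Parse JSONL lines and call the handler registered for each event type.

    Returns the number of events handled. Event types without a handler are
    skipped, so new driver events do not break an older consumer.
    """
    handled = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        handler = handlers.get(event.get("event"))
        if handler is not None:
            handler(event)
            handled += 1
    return handled


stream = [
    '{"event":"ready","version":"0.1.0","protocol":"jsonl","step":0}',
    '{"event":"screenshot_captured","width":3024,"height":1964,"step":0}',
    '{"event":"thinking","text":"I see the desktop...","step":1}',
    '{"event":"action","action":{"type":"click","x":150,"y":200},"step":1}',
]

log = []
count = dispatch_events(stream, {
    "thinking": lambda e: log.append(("thinking", e["text"])),
    "action": lambda e: log.append(("action", e["action"]["type"])),
    "screenshot_captured": lambda e: log.append(("step_boundary", e["step"])),
})
```

Here the `ready` line is parsed but not handled because no handler is registered for it.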

SDK -> Binary (stdin commands)

{"command":"start","session_id":"sess_abc","goal":"Open calculator","mode":"local"}
{"command":"start","session_id":"sess_def","goal":"Click login","screenshot":"base64...","screen_width":1920,"screen_height":1080}
{"command":"screenshot","data":"base64...","screen_width":1920,"screen_height":1080}
{"command":"pause"}
{"command":"resume"}
{"command":"stop","reason":"User cancelled"}
{"command":"confirm","approved":true,"message":""}
{"command":"answer","text":"user@example.com","question_id":"q1"}

StartCommand changes:

  • mode field added: "local" for autonomous, "" for legacy
  • screenshot, screen_width, screen_height are ignored in local mode
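
The difference between the two start commands can be sketched as a small builder. `build_start_command` is a hypothetical helper. Rejecting a legacy start that lacks screenshot data is an assumption made for the example, as is omitting the `mode` key in legacy mode (the sample command above does the same).

```python
import json
from typing import Optional


def build_start_command(
    goal: str,
    session_id: str,
    mode: str = "",
    screenshot: Optional[str] = None,
    screen_width: Optional[int] = None,
    screen_height: Optional[int] = None,
) -> str:
    """Serialize a start command as a single JSON line."""
    command = {"command": "start", "session_id": session_id, "goal": goal}
    if mode == "local":
        # The driver captures its own screenshots, so image fields are left out.
        command["mode"] = "local"
    else:
        if screenshot is None or screen_width is None or screen_height is None:
            raise ValueError(
                "legacy mode needs screenshot, screen_width and screen_height"
            )
        command.update(
            screenshot=screenshot,
            screen_width=screen_width,
            screen_height=screen_height,
        )
    return json.dumps(command, separators=(",", ":"))


print(build_start_command("Open calculator", "sess_abc", mode="local"))
```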

State Machine

                                    ┌─────────────────────┐
                                    │                     │
                    start           ▼                     │
    ┌───────┐ ─────────────> ┌─────────────┐              │
    │ IDLE  │                │   RUNNING   │<─────────────┤
    └───────┘                └─────────────┘              │
                                    │                     │
            ┌───────────────────────┼───────────────────┐ │
            │                       │                   │ │
            ▼                       ▼                   ▼ │
    ┌──────────────┐    ┌───────────────────┐   ┌────────────────┐
    │    PAUSED    │    │ WAITING_CONFIRM   │   │ WAITING_ANSWER │
    │              │    │                   │   │                │
    │  resume()    │    │   confirm(bool)   │   │   answer(str)  │
    └──────┬───────┘    └─────────┬─────────┘   └───────┬────────┘
           │                      │                     │
           └──────────────────────┴─────────────────────┘

    ANY STATE ──── stop() ────> STOPPED
    ANY STATE ──── error ─────> ERROR
    RUNNING ────── finish ───> FINISHED
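
One way to encode the diagram is a lookup table keyed by (state, trigger). The following is a sketch under assumptions, not the contents of `state_machine.py`. The trigger names `pause`, `need_confirm` and `need_answer` are invented here because the diagram leaves those edges unlabeled. "ANY STATE" is taken literally, so `stop` and `error` are accepted even from terminal states.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RUNNING = "running"
    PAUSED = "paused"
    WAITING_CONFIRM = "waiting_confirm"
    WAITING_ANSWER = "waiting_answer"
    STOPPED = "stopped"
    ERROR = "error"
    FINISHED = "finished"


# (current state, trigger) -> next state.
# "pause", "need_confirm" and "need_answer" are assumed trigger names.
TRANSITIONS = {
    (State.IDLE, "start"): State.RUNNING,
    (State.RUNNING, "pause"): State.PAUSED,
    (State.RUNNING, "need_confirm"): State.WAITING_CONFIRM,
    (State.RUNNING, "need_answer"): State.WAITING_ANSWER,
    (State.RUNNING, "finish"): State.FINISHED,
    (State.PAUSED, "resume"): State.RUNNING,
    (State.WAITING_CONFIRM, "confirm"): State.RUNNING,
    (State.WAITING_ANSWER, "answer"): State.RUNNING,
}

# Triggers the diagram allows from any state.
GLOBAL_TRIGGERS = {"stop": State.STOPPED, "error": State.ERROR}


def next_state(current: State, trigger: str) -> State:
    """Return the state reached from `current` on `trigger`, or raise ValueError."""
    if trigger in GLOBAL_TRIGGERS:
        return GLOBAL_TRIGGERS[trigger]
    try:
        return TRANSITIONS[(current, trigger)]
    except KeyError:
        raise ValueError(
            f"{trigger!r} is not valid in state {current.name}"
        ) from None


state = State.IDLE
for trigger in ["start", "need_confirm", "confirm", "finish"]:
    state = next_state(state, trigger)
print(state.name)
```

Keeping the transitions in data rather than in branching code makes it easy to compare the table against the diagram edge by edge.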

Repository Structure

agi-api-driver/

agi-api-driver/
├── src/agi_driver/
│   ├── __init__.py              # Package exports, version
│   ├── __main__.py              # CLI entry point
│   ├── executor.py              # Main execution loop (local + legacy modes)
│   ├── state_machine.py         # State enum and transitions
│   ├── agent/
│   │   ├── base.py              # BaseDriverAgent, AgentAction, StepResult
│   │   ├── desktop_agent.py     # DesktopAgent (Claude API integration)
│   │   ├── prompt.py            # System prompts
│   │   └── tools.py             # Desktop automation tool definitions
│   ├── environment/             # NEW: Self-contained environment
│   │   ├── __init__.py          # Factory: create_environment()
│   │   ├── base.py              # Abstract BaseEnvironment
│   │   └── local.py             # LocalEnvironment (screenshot + actions)
│   ├── llm/
│   │   └── anthropic.py         # Claude API client with retry
│   └── protocol/
│       ├── commands.py          # 7 command types (start now has mode)
│       ├── events.py            # 9 event types (+ screenshot_captured)
│       └── jsonl.py             # JSON Lines I/O
├── .github/workflows/
│   └── build-agi-driver.yml     # Cross-platform Nuitka build
└── pyproject.toml

SDK Integration

Python SDK - Local Mode (New, Simplified)

import asyncio

from agi import AgentDriver, DriverOptions


async def main() -> None:
    driver = AgentDriver(DriverOptions(mode="local"))

    driver.on_thinking(lambda t: print(f"Thinking: {t}"))
    driver.on_action(lambda a: print(f"Action: {a.type}"))  # Informational only

    # start() is awaited, so it has to run inside a coroutine
    result = await driver.start(goal="Open calculator and compute 2+2")
    print(f"Done: {result.summary}")


asyncio.run(main())

Node.js SDK - Local Mode

import { AgentDriver } from '@agi/sdk';

const driver = new AgentDriver({ mode: 'local' });

driver.on('thinking', (text) => console.log('Thinking:', text));
driver.on('action', (action) => console.log('Action:', action.type));

const result = await driver.start('Open calculator and compute 2+2');
console.log('Done:', result.summary);

C# SDK - Local Mode

using Agi.Driver;

var driver = new AgentDriver(new DriverOptions { Mode = "local" });

driver.OnThinking += async (text) => Console.WriteLine($"Thinking: {text}");
driver.OnAction += async (action) => Console.WriteLine($"Action: {action.Type}");

var result = await driver.StartAsync(goal: "Open calculator and compute 2+2");
Console.WriteLine($"Done: {result.Summary}");

Build & Distribution

Nuitka Compilation

cd src
python -m nuitka \
  --standalone --onefile \
  --output-filename=agi-driver \
  --include-package=agi_driver \
  --include-package=agi_driver.environment \
  --include-package=anthropic \
  --include-package=PIL \
  --include-package=pydantic \
  --lto=yes \
  --python-flag=no_site \
  -m agi_driver

Platform Matrix

| OS | Target | Binary |
|---|---|---|
| macOS 14 | darwin-arm64 | agi-driver-darwin-arm64 |
| macOS 13 | darwin-x64 | agi-driver-darwin-x64 |
| Ubuntu 22.04 | linux-x64 | agi-driver-linux-x64 |
| Windows latest | windows-x64 | agi-driver-windows-x64.exe |
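
An SDK needs some way to pick the right binary from this matrix at runtime. The sketch below shows one possible mapping from Python's `platform` module to the binary names in the table. `binary_name` and `TARGETS` are hypothetical, and the document does not describe how the SDKs actually resolve the binary.

```python
import platform

# (platform.system(), normalized machine) -> release target from the matrix.
TARGETS = {
    ("Darwin", "arm64"): "darwin-arm64",
    ("Darwin", "x86_64"): "darwin-x64",
    ("Linux", "x86_64"): "linux-x64",
    ("Windows", "x86_64"): "windows-x64",
}


def binary_name(system: str = "", machine: str = "") -> str:
    """Return the driver binary name for a platform (defaults to the host)."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    if machine == "amd64":  # Windows reports AMD64 for x86-64
        machine = "x86_64"
    try:
        target = TARGETS[(system, machine)]
    except KeyError:
        raise RuntimeError(
            f"no prebuilt driver for {system}/{machine}"
        ) from None
    suffix = ".exe" if system == "Windows" else ""
    return f"agi-driver-{target}{suffix}"


print(binary_name("Darwin", "arm64"))
```

Platforms outside the matrix, such as Linux on ARM, raise an error here instead of falling back to a binary that cannot run.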

Dependencies Bundled in Binary

  • anthropic - Claude API client
  • Pillow - Screenshot capture
  • pydantic - Data validation

Autonomous Loop (Local Mode) Detail

┌────────────────────────────────────────────────────────────────┐
│                        AUTONOMOUS LOOP                         │
│                                                                │
│  1. Initialize LocalEnvironment                                │
│     - Detect screen size (system_profiler / powershell / xdpy) │
│     - Cache DPI scale factor                                   │
│                                                                │
│  2. Capture initial screenshot (PIL.ImageGrab / scrot)         │
│     └── Emit screenshot_captured event                         │
│                                                                │
│  3. Call agent.step(screenshot, goal)                          │
│     ├── Prepare image (resize to 1366x768 canvas, JPEG 85)     │
│     ├── Build messages (goal + history + screenshot)           │
│     ├── Call Claude API with desktop tools                     │
│     └── Process response (thinking, tool uses)                 │
│                                                                │
│  4. Emit thinking event                                        │
│                                                                │
│  5. Check control flow:                                        │
│     ├── finish → Emit finished, exit loop                      │
│     ├── confirm → Emit confirm, wait for stdin response        │
│     ├── ask_question → Emit ask_question, wait for stdin       │
│     └── actions → Continue to step 6                           │
│                                                                │
│  6. Execute actions on environment                             │
│     ├── Emit action event (informational)                      │
│     └── Call environment.execute_action()                      │
│                                                                │
│  7. Wait 0.5s settle delay                                     │
│                                                                │
│  8. Capture next screenshot                                    │
│     └── Emit screenshot_captured event                         │
│                                                                │
│  9. Go to step 3                                               │
│                                                                │
│  Background: stdin reader queues pause/stop/confirm/answer     │
│  commands for processing between steps                         │
└────────────────────────────────────────────────────────────────┘
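
The core of the loop (capture, think, act, settle, repeat) can be sketched in a few dozen lines. This is not the code in `executor.py`. `ScriptedAgent` and `RecordingEnvironment` are hypothetical stand-ins for the Claude-backed agent and `LocalEnvironment`. The shape of the step result, the `max_steps` guard, and its error payload are all assumptions. The confirm and ask_question branches and the background stdin reader are left out to keep the example short.

```python
import asyncio


class ScriptedAgent:
    """Hypothetical stand-in for the Claude-backed agent: replays canned steps."""

    def __init__(self, steps):
        self._steps = iter(steps)

    async def step(self, screenshot, goal):
        return next(self._steps)


class RecordingEnvironment:
    """Hypothetical environment that records actions instead of touching the OS."""

    def __init__(self):
        self.executed = []

    async def capture_screenshot(self):
        return "base64-placeholder", 1920, 1080

    async def execute_action(self, action):
        self.executed.append(action)
        return True


async def run_local_loop(agent, env, goal, emit, settle_delay=0.5, max_steps=50):
    """Run the capture/think/act cycle until the agent reports it is finished."""
    step = 0
    screenshot, width, height = await env.capture_screenshot()
    emit({"event": "screenshot_captured", "width": width, "height": height, "step": step})
    while step < max_steps:
        step += 1
        result = await agent.step(screenshot, goal)
        emit({"event": "thinking", "text": result["thinking"], "step": step})
        if result.get("finished"):
            emit({"event": "finished", "reason": "completed",
                  "summary": result["summary"], "success": True, "step": step})
            return True
        for action in result["actions"]:
            emit({"event": "action", "action": action, "step": step})  # informational
            await env.execute_action(action)
        await asyncio.sleep(settle_delay)  # give the screen time to settle
        screenshot, width, height = await env.capture_screenshot()
        emit({"event": "screenshot_captured", "width": width, "height": height, "step": step})
    # Assumed safeguard: the document does not describe a step limit.
    emit({"event": "error", "message": "step limit reached", "code": "step_error",
          "recoverable": False, "step": step})
    return False


events = []
env = RecordingEnvironment()
agent = ScriptedAgent([
    {"thinking": "Calculator is closed", "actions": [{"type": "click", "x": 150, "y": 200}]},
    {"thinking": "Calculator is open", "finished": True, "summary": "Opened calculator"},
])
ok = asyncio.run(run_local_loop(agent, env, "Open calculator", events.append, settle_delay=0))
```

Passing `emit` as a callback keeps the loop independent of stdout, which makes it straightforward to test with a plain list as shown.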

Removed: SDK-Side Executor Modules

The following SDK-side executor modules have been removed. The driver binary handles all screenshot capture and action execution in local mode:

| SDK | Removed Module |
|---|---|
| Python | agi.executor (execute_action, execute_actions, get_scale_factor, get_screen_size) |
| Node.js | src/executor.ts (executeAction, executeActions, getScaleFactor, getScreenSize) |
| C# | Agi.Executor (ExecuteAction, ExecuteActions, GetScaleFactor, GetScreenSize) |

All platform-specific code now lives exclusively in the driver binary's environment/local.py.
