Package: @tangle/agent-driver v0.1.0
Auditor: Ferdinand (AI)
Date: 2026-02-04
Scope: Architecture, API design, observability, production readiness
This is a clean, minimal LLM-driven browser agent with good bones. The core observe→decide→execute loop is well-implemented, and the separation of concerns (Driver, Brain, Runner) shows architectural maturity. However, it's clearly at the "MVP/prototype" stage: fine for tests, but missing critical features for production use.
Verdict: Solid foundation. Needs ~2 sprints of hardening for production.
Is it clear what counts as a "turn" in `maxTurns`? Mostly, but it could be clearer.
What the code does:

```typescript
for (let i = 1; i <= maxTurns; i++) {
  // 1. Observe
  // 2. Decide
  // 3. Execute
}
```

Each loop iteration is ONE complete cycle. Any action counts: click, type, scroll, wait, etc.
The ambiguity:
- The JSDoc says `/** Max turns before giving up */`, which is vague
- The `Turn` type says `/** One observe → decide → execute cycle */`, which is better!
- Someone might assume "turns" means "user interactions" or "typing turns"
Recommendation:

```typescript
export interface Scenario {
  /**
   * Maximum observe→decide→execute cycles before aborting.
   * Each cycle is one LLM call + one action (click, type, scroll, etc.)
   * @default 20
   */
  maxTurns?: number;
}
```

Rating: 7/10 – Semantics are correct, documentation could be crisper.
Is the `Scenario` API flexible enough for arbitrary goals? Yes, it's very flexible.
```typescript
export interface Scenario {
  goal: string;      // ✅ Any natural language goal
  startUrl?: string; // ✅ Optional starting point
  maxTurns?: number; // ✅ Configurable limit
}
```

Strengths:
- `goal` is free-form natural language
- No rigid structure imposed
- Works for: "Login as admin", "Add item to cart", "Find the pricing page"
Limitations:
- No support for multi-step scenarios (first do X, then Y)
- No way to pass context/hints (e.g., "the password is in env var")
- No assertion/validation hooks ("verify checkout total is $99")
Recommendation: add optional `context` and `assertions` fields:
```typescript
export interface Scenario {
  goal: string;
  startUrl?: string;
  maxTurns?: number;
  /** Additional context for the LLM (credentials, hints, etc.) */
  context?: string;
  /** Expected success criteria for validation */
  assertions?: string[];
}
```

Rating: 8/10 – Great for simple goals, needs extension for complex scenarios.
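For illustration, a scenario using the proposed fields might look like the sketch below. Note that `context` and `assertions` do not exist in v0.1.0, and the URL and env var names are made up.

```typescript
// Hypothetical usage of the proposed fields (not part of v0.1.0)
const scenario: Scenario = {
  goal: 'Log in and add the first product to the cart',
  startUrl: 'https://shop.example.com', // placeholder URL
  maxTurns: 15,
  context: 'Credentials are in env vars TEST_USER and TEST_PASS.',
  assertions: ['The cart badge shows 1 item', 'No error banner is visible'],
};
```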
| Data | Where | Notes |
|---|---|---|
| Turn number | `Turn.turn` | Good |
| Page state | `Turn.state` | URL, title, snapshot |
| Action taken | `Turn.action` | Full action object |
| Raw LLM response | `Turn.rawLLMResponse` | ✅ Excellent for debugging |
| Duration | `Turn.durationMs` | Per-turn timing |
| Errors | `Turn.error` | When caught |
| Total time | `AgentResult.totalMs` | Aggregate |
| Missing | Impact | Priority |
|---|---|---|
| Conversation history | LLM has no memory of previous turns! | 🔴 Critical |
| Screenshots | Can't debug visual issues | 🔴 Critical |
| Reasoning/CoT | No visibility into "why" | 🟡 High |
| Token usage | Can't track costs | 🟡 High |
| Action success/failure | Did click actually work? | 🟡 High |
| Retry mechanism | One failure = total abort | 🟡 High |
| Structured logging | Only `console.log` with debug flag | 🟢 Medium |
| Trace IDs | Can't correlate across services | 🟢 Medium |
The most serious gap is conversation history. The Brain sends only the current state on every call:

```typescript
// brain/index.ts
const response = await this.client.chat.completions.create({
  messages: [
    { role: 'system', content: SYSTEM_PROMPT },
    { role: 'user', content: prompt }, // ❌ Only current state!
  ],
});
```

The LLM has amnesia! Each turn is completely independent. This causes:
- Agent clicks same button repeatedly
- Agent retries failed actions identically
- Agent can't learn from previous attempts
- Multi-step reasoning is impossible
Fix:

```typescript
class Brain {
  private history: ChatCompletionMessageParam[] = [];

  async decide(goal: string, state: PageState): Promise<{ action: Action; raw: string }> {
    const userMessage: ChatCompletionMessageParam = {
      role: 'user',
      content: buildPrompt(goal, state),
    };
    const response = await this.client.chat.completions.create({
      messages: [
        { role: 'system', content: SYSTEM_PROMPT },
        ...this.history,
        userMessage,
      ],
    });

    // Store for next turn
    this.history.push(userMessage);
    this.history.push({ role: 'assistant', content: response.choices[0].message.content });

    // Trim if too long
    if (this.history.length > 20) this.history = this.history.slice(-10);

    // ...parse the response and return as before
  }
}
```

The other critical gap is screenshots: `PageState` only has a text snapshot. For debugging:
- You can't see what the agent "saw"
- You can't verify element visibility
- You can't debug selector issues
Recommendation:

```typescript
export interface PageState {
  url: string;
  title: string;
  snapshot: string;
  screenshot?: Buffer; // Optional, configurable
}
```
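As a usage sketch (assuming the optional `screenshot` field above lands and is carried through to `Turn.state`), a debug helper could dump one image per turn:

```typescript
// Sketch only: Turn.state.screenshot is the proposed field, not shipped in v0.1.0
import { mkdir, writeFile } from 'node:fs/promises';

async function dumpScreenshots(turns: Turn[], dir = './agent-debug'): Promise<void> {
  await mkdir(dir, { recursive: true });
  for (const t of turns) {
    if (t.state.screenshot) {
      // One JPEG per observe→decide→execute cycle, named by turn number
      await writeFile(`${dir}/turn-${t.turn}.jpg`, t.state.screenshot);
    }
  }
}
```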
The runner's current error handling is a single catch that gives up immediately:

```typescript
} catch (err) {
  // Immediate abort, no retry
  return { success: false, reason: error, ... };
}
```

One transient failure (network glitch, slow load) = complete failure.
Recommendation:

```typescript
interface AgentConfig {
  retries?: number;           // Default: 3
  retryDelayMs?: number;      // Default: 1000
  retryableErrors?: string[]; // Patterns to retry
}
```

Rating: 4/10 – Basic turn logging exists, but critical production features are missing.
The `Driver` abstraction:

```typescript
export interface Driver {
  observe(): Promise<PageState>;
  execute(action: Action): Promise<void>;
}
```

Assessment: Clean and minimal.
- ✅ Perfect abstraction level
- ✅ Easy to implement new drivers (Puppeteer, WebDriver, etc.)
- ✅ Testable (easy to mock)
- ❌ `execute` returns `void`, so there is no feedback on success/failure
- ❌ No lifecycle hooks (setup, teardown)
Recommendation:

```typescript
export interface Driver {
  observe(): Promise<PageState>;
  execute(action: Action): Promise<ActionResult>; // Did it work?
  screenshot?(): Promise<Buffer>;
  close?(): Promise<void>;
}

interface ActionResult {
  success: boolean;
  error?: string;
  changedElements?: string[]; // What changed after action
}
```
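As a sketch of how the runner could consume this (the warn-and-continue behavior is illustrative, not part of v0.1.0):

```typescript
// Hypothetical runner step built on the proposed Driver/ActionResult shapes
async function runStep(driver: Driver, action: Action): Promise<ActionResult> {
  const result = await driver.execute(action);
  if (!result.success) {
    // Surface the failure to the caller (and, on the next turn, to the LLM)
    // instead of aborting the whole run on the first failed click.
    console.warn(`Action failed: ${result.error ?? 'unknown error'}`);
  }
  return result;
}
```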
Currently, the OpenAI SDK is hardcoded:

```typescript
import OpenAI from 'openai';
// ...
this.client = new OpenAI({ ... });
```

Can you use Anthropic? Technically yes, via `baseUrl` pointing to a compatible endpoint. But:
- No native Anthropic SDK support
- No Claude-specific features (extended thinking, tool use)
- OpenAI response format is assumed
Recommendation: abstract the LLM layer:
```typescript
interface LLMProvider {
  complete(messages: Message[]): Promise<string>;
}

class OpenAIProvider implements LLMProvider { ... }
class AnthropicProvider implements LLMProvider { ... }

class Brain {
  constructor(private provider: LLMProvider) {}
}
```
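A minimal sketch of the Anthropic side, assuming the `@anthropic-ai/sdk` package and a `Message` type with `role`/`content` fields; the model name is illustrative:

```typescript
import Anthropic from '@anthropic-ai/sdk';

class AnthropicProvider implements LLMProvider {
  private client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

  async complete(messages: Message[]): Promise<string> {
    // Anthropic takes the system prompt as a separate parameter
    const system = messages.find(m => m.role === 'system')?.content ?? '';
    const response = await this.client.messages.create({
      model: 'claude-sonnet-4-5', // illustrative model name
      max_tokens: 200,
      system,
      messages: messages
        .filter(m => m.role !== 'system')
        .map(m => ({ role: m.role as 'user' | 'assistant', content: m.content })),
    });
    const first = response.content[0];
    return first?.type === 'text' ? first.text : '';
  }
}
```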
Production readiness of the current implementation:

| Aspect | Status | Notes |
|---|---|---|
| Error handling | Minimal | Single try/catch, no recovery |
| Graceful shutdown | ❌ Missing | No way to cancel mid-run |
| Resource cleanup | ❌ Missing | Page/browser left open |
| Rate limiting | ❌ Missing | Can hammer the LLM API |
| Circuit breaker | ❌ Missing | No backoff on repeated failures |
| Idempotency | ❌ Missing | Re-running may double-execute |
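To make the graceful-shutdown and cleanup rows concrete, a cancellable run loop might look like this; `AbortSignal` support and the optional `close()` lifecycle method are recommendations from above, not shipped behavior:

```typescript
// Sketch: cooperative cancellation plus guaranteed driver cleanup
async function run(scenario: Scenario, driver: Driver, signal?: AbortSignal) {
  try {
    for (let turn = 1; turn <= (scenario.maxTurns ?? 20); turn++) {
      if (signal?.aborted) {
        return { success: false, reason: 'aborted' };
      }
      // observe → decide → execute for this turn ...
    }
    return { success: false, reason: 'maxTurns exceeded' };
  } finally {
    await driver.close?.(); // release the page/browser even on abort or error
  }
}
```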
Rating: 6/10 – Good abstraction, needs production hardening.
- Conversation history – LLM needs context from previous turns
- Screenshots – Debug visual state
- Retry mechanism – Handle transient failures
- Abort signal/cancellation – Stop long-running agents
- Action result feedback – Know if actions succeeded
- Structured logging – JSON logs with trace IDs
- Token/cost tracking – Budget awareness
- Multi-LLM support – Anthropic, Gemini, local models
- Hooks/middleware – onBeforeAction, onAfterAction, onError (see the sketch after this list)
- State assertions – Verify expected outcomes
- Visual element references – "Click the blue button", not just selectors
- Parallel action support – Fill multiple fields at once
- Record/replay – Capture runs for playback
- Human-in-the-loop – Pause and ask for help
- Metrics export – Prometheus/OpenTelemetry integration
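A minimal sketch of what that hook surface could look like (names and signatures are illustrative; nothing like this exists in v0.1.0):

```typescript
// Hypothetical middleware surface for the runner
interface AgentHooks {
  onBeforeAction?(turn: number, action: Action): void | Promise<void>;
  onAfterAction?(turn: number, action: Action, result: ActionResult): void | Promise<void>;
  onError?(turn: number, error: Error): void | Promise<void>;
}

// The runner would invoke them around each execute() call, e.g.:
//   await hooks.onBeforeAction?.(turn, action);
//   const result = await driver.execute(action);
//   await hooks.onAfterAction?.(turn, action, result);
```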
| Category | Rating | Notes |
|---|---|---|
| API Design | 7/10 | Clean, intuitive, good types. Minor gaps in docs. |
| Observability/Debugging | 4/10 | Turn logging is good, but missing screenshots, history, structured logs |
| Extensibility | 6/10 | Driver interface is solid. Brain is not swappable. No hooks. |
| Production Readiness | 3/10 | MVP only. Missing retries, cancellation, conversation history, error recovery |
| Code Quality | 8/10 | Clean, well-organized, proper TypeScript. Good separation of concerns. |
Translation: Great prototype, not production-ready. The bones are good; this could be excellent with 2-3 weeks of focused work.
- Add conversation history to Brain
- Add screenshot capture to PageState
- Add retry mechanism to runner
- Add ActionResult feedback from execute()
- Add cancellation/abort signal
- Add structured logging with trace IDs
- Add lifecycle hooks (onTurn, onError, onComplete)
- Add token usage tracking
- Abstract LLM provider interface
- Add Anthropic provider
- Add configuration validation (see the zod sketch after this list)
- Add comprehensive test suite
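One possible shape for the configuration-validation item, assuming a zod-based approach (zod is not currently a dependency of the package):

```typescript
import { z } from 'zod';

// Mirrors the public Scenario fields; the bounds are illustrative defaults
const ScenarioSchema = z.object({
  goal: z.string().min(1, 'goal must not be empty'),
  startUrl: z.string().url().optional(),
  maxTurns: z.number().int().positive().max(100).optional(),
});

export function validateScenario(input: unknown): Scenario {
  // Throws a descriptive ZodError on malformed input
  return ScenarioSchema.parse(input);
}
```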
Sketches for the highest-priority fixes:

```typescript
// brain/index.ts
import type { ChatCompletionMessageParam } from 'openai/resources/chat';

export class Brain {
  private history: ChatCompletionMessageParam[] = [];

  reset() {
    this.history = [];
  }

  async decide(goal: string, state: PageState): Promise<{ action: Action; raw: string }> {
    const userContent = `GOAL: ${goal}\n\nCURRENT PAGE:\nURL: ${state.url}\nTitle: ${state.title}\n\nELEMENTS:\n${state.snapshot}\n\nWhat action should you take?`;

    const response = await this.client.chat.completions.create({
      model: this.model,
      messages: [
        { role: 'system', content: SYSTEM_PROMPT },
        ...this.history,
        { role: 'user', content: userContent },
      ],
      temperature: 0,
      max_tokens: 200,
    });

    const raw = response.choices[0]?.message?.content || '';

    // Persist conversation
    this.history.push({ role: 'user', content: userContent });
    this.history.push({ role: 'assistant', content: raw });

    // Trim old history to avoid context overflow
    if (this.history.length > 16) {
      this.history = this.history.slice(-12);
    }

    return { action: this.parse(raw), raw };
  }
}
```
```typescript
// drivers/playwright.ts
export class PlaywrightDriver implements Driver {
  async observe(): Promise<PageState> {
    const [url, title, snapshot, screenshot] = await Promise.all([
      this.page.url(),
      this.page.title(),
      this.extractSnapshot(),
      this.options.captureScreenshots
        ? this.page.screenshot({ type: 'jpeg', quality: 50 })
        : undefined,
    ]);
    return { url, title, snapshot, screenshot };
  }
}
```
```typescript
// runner.ts
async function withRetry<T>(
  fn: () => Promise<T>,
  retries: number = 3,
  delayMs: number = 1000
): Promise<T> {
  let lastError: Error | undefined;
  for (let i = 0; i < retries; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err instanceof Error ? err : new Error(String(err));
      if (i < retries - 1) {
        await new Promise(r => setTimeout(r, delayMs * (i + 1)));
      }
    }
  }
  throw lastError;
}
```
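Usage would then be a one-liner wherever a call might flake; `driver` here stands in for whatever Driver instance the runner holds:

```typescript
// Retry a flaky observation up to 3 times with linearly increasing backoff
const state = await withRetry(() => driver.observe(), 3, 1000);
```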
@tangle/agent-driver is a well-designed prototype that demonstrates good architectural instincts. The observe→decide→execute loop is clean, the types are well thought out, and the code is readable.
However, it's missing several table-stakes features for production:
- Conversation history (the LLM is currently amnesiac!)
- Screenshot capture
- Retry logic
- Structured observability
The good news: the foundation is solid enough that these can be added incrementally without major refactoring.
Recommendation: Add conversation history first (it's a ~20-line fix that dramatically improves agent behavior), then tackle screenshots and retries before any production use.
Audit conducted by Ferdinand • @tangle/agent-driver v0.1.0