Generated: 2026-02-11
Source: Research for issue conductorbot-6cr
Note: This document covers OpenClaw's retry infrastructure and error handling. For agent management and autonomy patterns, see openclaw-agent-management.md.
File: /Users/ajsharp/code/github/openclaw/src/commands/agent.ts
The agent harness wraps agent execution through the agentCommand function, which:
- Validates input and session parameters
- Resolves workspace, agent configuration, and model selection
- Delegates to either CLI agents or embedded Pi agents
- Handles model fallback and delivery
File: /Users/ajsharp/code/github/openclaw/src/agents/pi-embedded-runner/run.ts
The runEmbeddedPiAgent function is the main orchestration layer:
export async function runEmbeddedPiAgent(
  params: RunEmbeddedPiAgentParams,
): Promise<EmbeddedPiRunResult>

Key responsibilities:
- Session lane management - Uses queue-based execution lanes to prevent concurrent runs
- Workspace resolution - Determines working directory with fallback logic
- Auth profile management - Rotates through API keys/profiles when rate-limited
- Retry orchestration - Coordinates multiple retry strategies
- Result building - Constructs payloads from assistant responses and tool calls
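The session lane idea in the list above can be pictured as a per-key promise chain, where each new run for a session waits for the previous one to settle. The following is a minimal sketch under that assumption, not OpenClaw's implementation; `SessionLanes` and `enqueue` are hypothetical names introduced for illustration.

```typescript
// Hypothetical sketch: serialize async tasks per session key so that
// two runs for the same session never overlap.
class SessionLanes {
  // Tail of the promise chain for each lane (keyed by session).
  private tails = new Map<string, Promise<unknown>>();

  enqueue<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous: Promise<unknown> = this.tails.get(key) ?? Promise.resolve();
    // Start the task only after the previous task in this lane has settled.
    const next = previous.then(() => task());
    // Store a tail that never rejects, so one failed run does not block the lane.
    this.tails.set(key, next.catch(() => undefined));
    return next;
  }
}
```

Runs enqueued under different keys proceed independently, while runs under the same key execute in submission order. This sketch never evicts idle lanes, which a long-lived process would likely need to handle.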
The architecture implements three layered retry loops: auth profile selection, the execution retry loop, and model fallback.
Location: runEmbeddedPiAgent (lines 357-384)
while (profileIndex < profileCandidates.length) {
  const candidate = profileCandidates[profileIndex];
  if (candidate && isProfileInCooldown(authStore, candidate)) {
    profileIndex += 1;
    continue;
  }
  await applyApiKeyInfo(profileCandidates[profileIndex]);
  break;
}

Termination: The loop ends when it finds a profile that is not in cooldown, or when all available auth profiles are exhausted.
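For illustration, the cooldown check can be modeled with a map from profile id to a cooldown expiry timestamp. The `AuthStore` shape, `inCooldown`, and `pickProfileIndex` below are assumptions made for this sketch; the real `authStore` and `isProfileInCooldown` in OpenClaw may look quite different.

```typescript
// Hypothetical store shape: profile id -> epoch ms until which the profile is cooling down.
interface AuthStore {
  cooldownUntil: Map<string, number>;
}

function inCooldown(store: AuthStore, profileId: string, now: number): boolean {
  const until = store.cooldownUntil.get(profileId);
  return until !== undefined && until > now;
}

// Returns the index of the first usable profile at or after `start`,
// or -1 when every remaining candidate is cooling down.
function pickProfileIndex(
  store: AuthStore,
  candidates: string[],
  start = 0,
  now = Date.now(),
): number {
  for (let i = start; i < candidates.length; i += 1) {
    if (!inCooldown(store, candidates[i], now)) return i;
  }
  return -1;
}
```

Returning `-1` makes the "all profiles exhausted" case explicit, so the caller can decide whether to wait for a cooldown to expire or hand off to model fallback.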
Location: runEmbeddedPiAgent (lines 392-863)
const MAX_OVERFLOW_COMPACTION_ATTEMPTS = 3;
let overflowCompactionAttempts = 0;
while (true) {
  attemptedThinking.add(thinkLevel);
  const attempt = await runEmbeddedAttempt({ ... });
  // Handle context overflow with auto-compaction
  if (contextOverflowError) {
    if (!isCompactionFailure &&
        overflowCompactionAttempts < MAX_OVERFLOW_COMPACTION_ATTEMPTS) {
      overflowCompactionAttempts++;
      const compactResult = await compactEmbeddedPiSessionDirect({ ... });
      if (compactResult.compacted) {
        continue; // Retry after compaction
      }
    }
    // Try tool result truncation as last resort
    if (!toolResultTruncationAttempted) {
      const truncResult = await truncateOversizedToolResultsInSession({ ... });
      if (truncResult.truncated) {
        overflowCompactionAttempts = 0;
        continue;
      }
    }
    return { /* context overflow error */ };
  }
  // Handle auth failures with profile rotation
  if (shouldRotate) {
    if (lastProfileId) {
      await markAuthProfileFailure({ ... });
    }
    const rotated = await advanceAuthProfile();
    if (rotated) {
      continue;
    }
  }
  // Success path
  return { payloads, meta };
}

Termination conditions:
- Successful completion
- All auth profiles exhausted
- Context overflow unrecoverable
- Non-retryable error (AbortError, image errors, role ordering conflicts)
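The context overflow branch above amounts to a small decision procedure: compaction attempts are bounded, truncation is the last resort, and anything beyond that ends the run with an error. A rough sketch of that ordering as a pure function follows. `OverflowState`, `OverflowAction`, and `nextOverflowAction` are hypothetical names, and the real loop interleaves these checks with awaiting the compaction and truncation results rather than deciding up front.

```typescript
const MAX_OVERFLOW_COMPACTION_ATTEMPTS = 3;

// Hypothetical snapshot of the flags the loop consults on a context overflow.
interface OverflowState {
  compactionAttempts: number;
  truncationAttempted: boolean;
  isCompactionFailure: boolean;
}

type OverflowAction = "compact" | "truncate" | "give_up";

// Mirrors the order of checks in the excerpt: compaction first (while budget
// remains), then tool result truncation, then an unrecoverable error.
function nextOverflowAction(state: OverflowState): OverflowAction {
  if (
    !state.isCompactionFailure &&
    state.compactionAttempts < MAX_OVERFLOW_COMPACTION_ATTEMPTS
  ) {
    return "compact";
  }
  if (!state.truncationAttempted) return "truncate";
  return "give_up";
}
```

In the excerpt, a successful truncation resets `overflowCompactionAttempts` to 0, so compaction becomes available again on the next overflow; in this model that corresponds to calling the function with `compactionAttempts: 0`.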
File: /Users/ajsharp/code/github/openclaw/src/agents/model-fallback.ts
export async function runWithModelFallback<T>(params: {
  cfg: OpenClawConfig | undefined;
  provider: string;
  model: string;
  fallbacksOverride?: string[];
  run: (provider: string, model: string) => Promise<T>;
}): Promise<{ result: T; provider: string; model: string; attempts: FallbackAttempt[] }>

Retry logic:
for (let i = 0; i < candidates.length; i += 1) {
  const candidate = candidates[i];
  try {
    const result = await params.run(candidate.provider, candidate.model);
    return { result, provider: candidate.provider, model: candidate.model, attempts };
  } catch (err) {
    if (shouldRethrowAbort(err)) throw err;
    const normalized = coerceToFailoverError(err, { ... });
    if (!isFailoverError(normalized)) throw err;
    lastError = normalized;
    attempts.push({ /* failed attempt */ });
  }
}

Tool: /Users/ajsharp/code/github/openclaw/src/agents/tools/sessions-spawn-tool.ts
Agents spawn isolated subagents via the sessions_spawn tool:
const childSessionKey = `agent:${targetAgentId}:subagent:${crypto.randomUUID()}`;
await callGateway({
  method: "agent",
  params: {
    sessionKey: childSessionKey,
    message: task,
    deliver: false,
    model: subagentModel,
    thinking: subagentThinking,
    systemPrompt: buildSubagentSystemPrompt(task),
  },
});

File: /Users/ajsharp/code/github/openclaw/src/agents/subagent-announce.ts
export async function runSubagentAnnounceFlow(params: {
  childSessionKey: string;
  childRunId: string;
  requesterSessionKey: string;
  task: string;
  timeoutMs: number;
  cleanup: "delete" | "keep";
}): Promise<boolean>

Three delivery modes:
- Steer - Inject message into active parent run (real-time)
- Queue - Enqueue for delivery after current run completes
- Interrupt - Force delivery for urgent notifications
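To make the three modes concrete, here is one way a caller might choose between them. The notes above do not state how OpenClaw selects a mode, so the policy in this sketch (urgent results interrupt, otherwise steer when the parent run can take injected messages, otherwise queue) is purely an assumption, as are the names `ParentRunState` and `chooseDeliveryMode`.

```typescript
type DeliveryMode = "steer" | "queue" | "interrupt";

// Hypothetical view of the parent (requester) session at announce time.
interface ParentRunState {
  isActive: boolean;        // a parent run is currently executing
  acceptsSteering: boolean; // the active run can take injected messages
}

// Assumed policy, not taken from the OpenClaw source.
function chooseDeliveryMode(parent: ParentRunState, urgent: boolean): DeliveryMode {
  if (urgent) return "interrupt";
  if (parent.isActive && parent.acceptsSteering) return "steer";
  return "queue";
}
```

Keeping the choice in a single pure function makes the policy easy to test and to swap out if the real selection rules turn out to differ.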
type FailoverReason =
  | "auth"             // Authentication failure → rotate profiles
  | "rate_limit"       // Rate limiting → cooldown + rotate
  | "billing"          // Quota exceeded → skip provider
  | "timeout"          // Request timeout → retry or fallback
  | "context_overflow" // Prompt too large → compaction
  | "image_size"       // Image too large → user error
  | "role_ordering"    // Message ordering violation → user error
  | "unknown";         // Unclassified → fallback

| Error Type | Primary Recovery | Secondary Recovery | Final Fallback |
|---|---|---|---|
| `auth` | Rotate API key profile | Mark profile failed + cooldown | Model fallback |
| `rate_limit` | Rotate to next profile | Wait for cooldown | Model fallback |
| `context_overflow` | Auto-compact session (max 3x) | Truncate tool results | User error message |
| `timeout` | Retry same model | None | Model fallback |
| `billing` | Skip provider entirely | None | Model fallback |
| `image_size` | None | None | User error message |
| `role_ordering` | None | None | User error message |
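The recovery table above maps naturally onto a typed lookup, which lets the compiler flag any error type that lacks a recovery plan. The sketch below is one possible encoding. `RecoveryPlan`, `RECOVERY_PLANS`, `surfacesToUser`, and the step identifiers are hypothetical names chosen for illustration, and the `unknown` entry is inferred from the type comment rather than from the table.

```typescript
type FailoverReason =
  | "auth" | "rate_limit" | "billing" | "timeout"
  | "context_overflow" | "image_size" | "role_ordering" | "unknown";

type FinalFallback = "model_fallback" | "user_error";

interface RecoveryPlan {
  primary: string | null;   // first recovery step, if any
  secondary: string | null; // second recovery step, if any
  finalFallback: FinalFallback;
}

// Record<FailoverReason, ...> forces an entry for every reason in the union.
const RECOVERY_PLANS: Record<FailoverReason, RecoveryPlan> = {
  auth: { primary: "rotate_profile", secondary: "mark_failed_and_cooldown", finalFallback: "model_fallback" },
  rate_limit: { primary: "rotate_profile", secondary: "wait_for_cooldown", finalFallback: "model_fallback" },
  context_overflow: { primary: "compact_session", secondary: "truncate_tool_results", finalFallback: "user_error" },
  timeout: { primary: "retry_same_model", secondary: null, finalFallback: "model_fallback" },
  billing: { primary: "skip_provider", secondary: null, finalFallback: "model_fallback" },
  image_size: { primary: null, secondary: null, finalFallback: "user_error" },
  role_ordering: { primary: null, secondary: null, finalFallback: "user_error" },
  // Not in the table; inferred from the "Unclassified → fallback" comment on the type.
  unknown: { primary: null, secondary: null, finalFallback: "model_fallback" },
};

// True when the error ultimately surfaces to the user instead of switching models.
function surfacesToUser(reason: FailoverReason): boolean {
  return RECOVERY_PLANS[reason].finalFallback === "user_error";
}
```

Adding a new member to `FailoverReason` without a matching entry in `RECOVERY_PLANS` becomes a compile-time error, which keeps the classification and the recovery logic in sync.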
- Multi-level retry loops - Auth rotation, execution retries, model fallback
- Context overflow auto-compaction - Prevent hitting token limits
- Error classification system - Structured recovery based on error type
- Model fallback configuration - YAML-defined fallback chains
- Parallel step execution - For independent workflow steps
- Subagent announce flow - Async step completion notifications
- Agent-driven task decomposition - Conflicts with declarative YAML paradigm
- Session file persistence - SQLite already handles this
- Gateway architecture - Not needed for single-tenant setup

- ~/code/github/openclaw/src/agents/pi-embedded-runner/run.ts (retry orchestration)
- ~/code/github/openclaw/src/agents/model-fallback.ts (model fallback)
- ~/code/github/openclaw/src/agents/pi-embedded-helpers.ts (error classification)
- ~/code/github/openclaw/src/agents/subagent-announce.ts (async result delivery)

- claude-conductor/src/core/workflow-engine.ts (add retries)
- claude-conductor/src/core/context-store.ts (add compaction)
- claude-conductor/src/providers/provider.ts (wrap with fallback)
- claude-conductor/src/schemas/workflow-schema.ts (add retry config)