RLM — BEAM-Native Implementation Plan for Jido.AI


Implements the Recursive Language Model pattern using native Jido/BEAM semantics. No regex parsing. No Python subprocess. No exec().

Supersedes RLM_STRATEGY.md. See RLM_STRATEGY_CRITIQUE.md for rationale.


Core Insight

The RLM paper's contribution is a methodology — iterative context exploration with sub-LLM delegation. The Python REPL is an implementation detail. Every RLM primitive maps cleanly to an existing BEAM/Jido concept:

| RLM (Python)               | BEAM/Jido Native                                                                 |
| -------------------------- | -------------------------------------------------------------------------------- |
| REPL persistent variables  | Workspace state — ETS table keyed by {request_id, :workspace}                    |
| exec() code blocks         | Typed tool calls — Jido Actions via Directive.ToolExec                           |
| llm_query() sub-calls      | llm.subquery_batch Action — Task.async_stream under the agent's TaskSupervisor   |
| Regex response parsing     | API-native tool_calls — ReqLLM extracts structured ToolCall structs              |
| FINAL() marker             | Standard final answer — the ReAct machine's :final_answer type                   |
| Iteration loop             | ReAct FSM's awaiting_llm ↔ awaiting_tool cycle                                   |
| Context in temp file       | ContextStore — ETS or process state, accessed by reference                       |

Architecture

RLM is implemented as a strategy adapter that composes with the existing ReAct.Machine, providing RLM-specific tools, prompts, and context/workspace management. No machine fork needed.

┌──────────────────────────────────────────────────────────┐
│  Jido.AI.Strategies.RLM  (Strategy adapter)              │
│    │                                                      │
│    ├── Jido.AI.ReAct.Machine  (reused, unmodified)        │
│    │     └── idle → awaiting_llm → awaiting_tool → ...    │
│    │                                                      │
│    ├── Jido.AI.RLM.Prompts                                │
│    │     ├── system_prompt/1  (exploration methodology)    │
│    │     └── next_step_prompt/1  (per-iteration guidance)  │
│    │                                                      │
│    ├── Jido.AI.RLM.ContextStore                           │
│    │     └── store/fetch/delete context by reference       │
│    │                                                      │
│    ├── Jido.AI.RLM.WorkspaceStore                         │
│    │     └── per-request exploration state (ETS)           │
│    │                                                      │
│    ├── RLM Exploration Tools (Jido Actions)                │
│    │     ├── Context.Stats                                 │
│    │     ├── Context.Chunk                                 │
│    │     ├── Context.ReadChunk                             │
│    │     ├── Context.Search                                │
│    │     ├── Workspace.Note                                │
│    │     ├── Workspace.GetSummary                          │
│    │     └── LLM.SubqueryBatch                             │
│    │                                                      │
│    └── Directive lifting (LLMStream + ToolExec)            │
│         └── injects next_step_prompt + workspace summary   │
└──────────────────────────────────────────────────────────┘

Why Reuse ReAct.Machine (Not Fork)

The ReAct machine already solves:

  • Tool call correlation IDs and parallel execution
  • Iteration limits with max_iterations guard
  • Busy rejection (EmitRequestError)
  • Deadlock avoidance (EmitToolError for unknown tools)
  • Streaming token accumulation
  • Thread-based conversation history
  • Usage tracking and telemetry

RLM needs different tools and prompts, not different states. The strategy adapter layer handles the differences:

  • Injects RLM system prompt instead of generic ReAct prompt
  • Appends workspace-aware next_step_prompt before each LLM call
  • Manages context/workspace lifecycle around the machine

This is the same pattern TRM uses: TRM routes react.llm.response signals to its own action atoms and builds phase-specific prompts, while reusing LLMStream directives.


Module Layout

lib/jido_ai/
├── strategies/
│   └── rlm.ex                              # Strategy adapter
├── rlm/
│   ├── context_store.ex                     # Context storage (ETS/inline)
│   ├── workspace_store.ex                   # Per-request exploration state
│   └── prompts.ex                           # System + per-iteration prompts
├── actions/rlm/
│   ├── context/
│   │   ├── stats.ex                         # Context size/type info
│   │   ├── chunk.ex                         # Chunking strategy
│   │   ├── read_chunk.ex                    # Fetch chunk text
│   │   └── search.ex                        # Substring/regex search
│   ├── workspace/
│   │   ├── note.ex                          # Record hypothesis/finding
│   │   └── get_summary.ex                   # Compact workspace summary
│   └── llm/
│       └── subquery_batch.ex                # Concurrent sub-LLM delegation
├── agents/strategies/
│   └── rlm_agent.ex                         # Agent macro
└── agents/examples/
    └── needle_haystack_agent.ex             # Example agent

scripts/
└── test_rlm_agent.exs                       # Runnable demo

1. ContextStore — Jido.AI.RLM.ContextStore

Stores the large input context and returns a reference that tools use to access it. Prevents copying GB-scale data through signals and messages.

Storage Tiers

| Size                   | Backend                                      | Reference                                                                 |
| ---------------------- | -------------------------------------------- | ------------------------------------------------------------------------- |
| < 2 MB (configurable)  | Inline in run_tool_context                   | %{backend: :inline, data: binary}                                          |
| 2 MB – 200 MB          | Private ETS table (owned by agent process)   | %{backend: :ets, table: tid, key: {request_id, :context}, size_bytes: n}   |
| > 200 MB (v2)          | Temp file                                    | %{backend: :file, path: "...", size_bytes: n}                              |

API

defmodule Jido.AI.RLM.ContextStore do
  @type context_ref :: %{backend: :inline | :ets | :file, ...}

  @spec put(binary(), String.t(), keyword()) :: {:ok, context_ref()}
  def put(context, request_id, opts \\ [])

  @spec fetch(context_ref()) :: {:ok, binary()} | {:error, :not_found}
  def fetch(context_ref)

  @spec fetch_range(context_ref(), non_neg_integer(), non_neg_integer()) :: {:ok, binary()}
  def fetch_range(context_ref, byte_offset, length)

  @spec delete(context_ref()) :: :ok
  def delete(context_ref)

  @spec size(context_ref()) :: non_neg_integer()
  def size(context_ref)
end

ETS table is private, owned by the agent process — automatically freed on process crash/termination. No manual cleanup needed for the crash case.
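
For illustration, the body of put/3 and the ETS fetch path might look roughly like this — a sketch only; the :table option, the threshold default, and the omission of the :file backend are assumptions, not part of the spec above:

# Sketch: backend selection by size. The agent-owned private ETS table is assumed
# to be created by the strategy and passed in via opts[:table].
@inline_threshold 2_000_000  # 2 MB default, overridable per call

def put(context, request_id, opts \\ []) when is_binary(context) do
  threshold = Keyword.get(opts, :inline_threshold, @inline_threshold)
  size = byte_size(context)

  if size < threshold do
    {:ok, %{backend: :inline, data: context, size_bytes: size}}
  else
    table = Keyword.fetch!(opts, :table)
    key = {request_id, :context}
    true = :ets.insert(table, {key, context})
    {:ok, %{backend: :ets, table: table, key: key, size_bytes: size}}
  end
end

def fetch(%{backend: :inline, data: data}), do: {:ok, data}

def fetch(%{backend: :ets, table: table, key: key}) do
  case :ets.lookup(table, key) do
    [{^key, data}] -> {:ok, data}
    [] -> {:error, :not_found}
  end
end

def fetch_range(ref, byte_offset, length) do
  with {:ok, data} <- fetch(ref) do
    {:ok, binary_part(data, byte_offset, min(length, byte_size(data) - byte_offset))}
  end
end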


2. WorkspaceStore — Jido.AI.RLM.WorkspaceStore

Per-request exploration state that tools read from and write to. This is the BEAM equivalent of "REPL persistent variables" — explicit, typed, inspectable.

Shape

%{
  query: "Find the magic number",
  context_ref: %{backend: :ets, ...},
  chunks: %{
    strategy: :lines,
    size: 1000,
    index: %{"c_0" => %{byte_start: 0, byte_end: 12345, lines: "1-1000"}, ...}
  },
  hits: [
    %{chunk_id: "c_47", offset: 47231, snippet: "The magic number is 1298418"}
  ],
  notes: [
    %{kind: :hypothesis, text: "Magic number appears in middle third", at: ~U[...]}
  ],
  subquery_results: [
    %{chunk_id: "c_47", model: "anthropic:claude-haiku-4-5", answer: "1298418"}
  ]
}

API

defmodule Jido.AI.RLM.WorkspaceStore do
  @spec init(String.t(), map()) :: {:ok, workspace_ref()}
  def init(request_id, seed \\ %{})

  @spec get(workspace_ref()) :: map()
  def get(workspace_ref)

  @spec update(workspace_ref(), (map() -> map())) :: :ok
  def update(workspace_ref, fun)

  @spec summary(workspace_ref(), keyword()) :: String.t()
  def summary(workspace_ref, opts \\ [])

  @spec delete(workspace_ref()) :: :ok
  def delete(workspace_ref)
end

Storage: ETS keyed by {request_id, :workspace}. Same table as ContextStore (one private ETS table per agent for all RLM data).

How the LLM Sees Workspace

The LLM sees workspace state through two channels:

  1. Tool results — each tool returns its results as structured data, which the ReAct machine appends to the Thread as tool_result messages
  2. Next-step prompt — before each LLM call, the strategy injects a user message containing WorkspaceStore.summary/2 (compact text: "You've searched 3 chunks, found 1 hit, have 2 hypotheses...")

This replaces the Python pattern of the model accessing REPL variables directly.
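
To make the second channel concrete, summary/2 could be assembled from the workspace shape above along these lines (the exact wording and truncation behavior are assumptions):

# Sketch: compact, LLM-facing text built from the workspace map shown earlier.
def summary(workspace_ref, opts \\ []) do
  max_chars = Keyword.get(opts, :max_chars, 2_000)
  ws = get(workspace_ref)

  chunk_count = ws |> Map.get(:chunks, %{}) |> Map.get(:index, %{}) |> map_size()
  hits = Map.get(ws, :hits, [])
  notes = Map.get(ws, :notes, [])
  subqueries = Map.get(ws, :subquery_results, [])

  [
    "Chunks indexed: #{chunk_count}",
    "Search hits: #{length(hits)}",
    "Notes: " <> Enum.map_join(Enum.take(notes, -3), "; ", & &1.text),
    "Sub-LLM results: #{length(subqueries)}"
  ]
  |> Enum.join("\n")
  |> String.slice(0, max_chars)
end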


3. Exploration Tools (Jido Actions)

Each tool is a standard Jido.Action — typed schema, run/2 function, composable via ToolAdapter. All tools receive context_ref and workspace_ref through the tool execution context (the context argument to run/2), which comes from run_tool_context.

3.1 Jido.AI.Actions.RLM.Context.Stats

use Jido.Action,
  name: "context_stats",
  description: "Get size and structure information about the loaded context",
  schema: Zoi.object(%{})

def run(_params, context) do
  ref = context.context_ref
  size = ContextStore.size(ref)
  {:ok, sample} = ContextStore.fetch_range(ref, 0, min(500, size))
  {:ok, %{size_bytes: size, approx_lines: estimate_lines(size, sample), encoding: detect_encoding(sample)}}
end
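
The estimate_lines/2 and detect_encoding/1 helpers referenced above are not specified in this plan; a plausible sketch (assumptions, not spec):

# Extrapolate total line count from newline density in the 500-byte sample.
defp estimate_lines(total_size, sample) when byte_size(sample) > 0 do
  newlines = sample |> :binary.matches("\n") |> length()
  avg_line_bytes = byte_size(sample) / max(newlines, 1)
  round(total_size / avg_line_bytes)
end

defp estimate_lines(_total_size, _sample), do: 0

# Crude classification: valid UTF-8 vs. arbitrary binary.
defp detect_encoding(sample) do
  if String.valid?(sample), do: :utf8, else: :binary
end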

3.2 Jido.AI.Actions.RLM.Context.Chunk

use Jido.Action,
  name: "context_chunk",
  description: "Split context into chunks and index them for exploration",
  schema: Zoi.object(%{
    strategy: Zoi.enum(["lines", "bytes"]) |> Zoi.default("lines"),
    size: Zoi.integer() |> Zoi.default(1000),
    overlap: Zoi.integer() |> Zoi.default(0),
    max_chunks: Zoi.integer() |> Zoi.default(500),
    preview_bytes: Zoi.integer() |> Zoi.default(100)
  })

def run(params, context) do
  # Compute chunk boundaries from context_ref
  # Store chunk index in workspace
  # Return bounded list of chunk descriptors with previews
  {:ok, %{chunk_count: n, chunks: [%{id: "c_0", lines: "1-1000", preview: "..."}]}}
end
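
A concrete line-based implementation of the body sketched above might look like this — a sketch under the assumption that only the "lines" strategy is handled and the workspace index write is elided:

# Sketch: line-strategy chunking only; byte strategy and workspace bookkeeping omitted.
def run(params, context) do
  {:ok, text} = Jido.AI.RLM.ContextStore.fetch(context.context_ref)
  step = max(params.size - params.overlap, 1)

  chunks =
    text
    |> String.split("\n")
    |> Enum.chunk_every(params.size, step)
    |> Enum.take(params.max_chunks)
    |> Enum.with_index()
    |> Enum.map(fn {chunk_lines, i} ->
      body = Enum.join(chunk_lines, "\n")
      first_line = i * step + 1

      %{id: "c_#{i}",
        lines: "#{first_line}-#{first_line + length(chunk_lines) - 1}",
        preview: binary_part(body, 0, min(params.preview_bytes, byte_size(body)))}
    end)

  # A real implementation would also store byte offsets per chunk in the workspace index.
  {:ok, %{chunk_count: length(chunks), chunks: chunks}}
end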

3.3 Jido.AI.Actions.RLM.Context.ReadChunk

use Jido.Action,
  name: "context_read_chunk",
  description: "Read the text content of a specific chunk",
  schema: Zoi.object(%{
    chunk_id: Zoi.string(),
    max_bytes: Zoi.integer() |> Zoi.default(50_000)
  })

def run(params, context) do
  # Look up chunk boundaries from workspace
  # Fetch text from context_ref using byte range
  {:ok, %{chunk_id: params.chunk_id, text: chunk_text, truncated: false}}
end
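
Filling in the body above, assuming the workspace chunk index stores byte_start/byte_end per chunk as in the workspace shape from section 2:

# Sketch: resolve the chunk's byte range from the workspace index, then read it.
def run(params, context) do
  ws = Jido.AI.RLM.WorkspaceStore.get(context.workspace_ref)

  case get_in(ws, [:chunks, :index, params.chunk_id]) do
    nil ->
      {:error, %{reason: :unknown_chunk, chunk_id: params.chunk_id}}

    %{byte_start: start, byte_end: stop} ->
      read_len = min(stop - start, params.max_bytes)
      {:ok, text} = Jido.AI.RLM.ContextStore.fetch_range(context.context_ref, start, read_len)
      {:ok, %{chunk_id: params.chunk_id, text: text, truncated: stop - start > params.max_bytes}}
  end
end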

3.4 Jido.AI.Actions.RLM.Context.Search

use Jido.Action,
  name: "context_search",
  description: "Search the context for a substring or regex pattern",
  schema: Zoi.object(%{
    query: Zoi.string(),
    mode: Zoi.enum(["substring", "regex"]) |> Zoi.default("substring"),
    limit: Zoi.integer() |> Zoi.default(20),
    window_bytes: Zoi.integer() |> Zoi.default(200)
  })

def run(params, context) do
  # Search context_ref for matches
  # Store hits in workspace
  # Return hits with surrounding context snippets
  {:ok, %{total_matches: n, hits: [%{offset: 47231, chunk_id: "c_47", snippet: "..."}]}}
end
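
The substring path could be as simple as :binary.matches/2 over the fetched context — a sketch; the chunk_id_for_offset/2 helper is hypothetical, and regex mode plus workspace hit storage are omitted:

# Sketch: substring search with surrounding snippet windows.
def run(params, context) do
  {:ok, text} = Jido.AI.RLM.ContextStore.fetch(context.context_ref)
  matches = :binary.matches(text, params.query)

  hits =
    matches
    |> Enum.take(params.limit)
    |> Enum.map(fn {offset, len} ->
      from = max(offset - params.window_bytes, 0)
      to = min(offset + len + params.window_bytes, byte_size(text))

      %{offset: offset,
        chunk_id: chunk_id_for_offset(context, offset),  # hypothetical helper
        snippet: binary_part(text, from, to - from)}
    end)

  {:ok, %{total_matches: length(matches), hits: hits}}
end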

3.5 Jido.AI.Actions.RLM.Workspace.Note

use Jido.Action,
  name: "workspace_note",
  description: "Record a hypothesis, finding, or plan in the exploration workspace",
  schema: Zoi.object(%{
    text: Zoi.string(),
    kind: Zoi.enum(["hypothesis", "finding", "plan"]) |> Zoi.default("finding")
  })

def run(params, context) do
  # Build the note entry before appending it to the workspace notes list.
  note = %{kind: params.kind, text: params.text, at: DateTime.utc_now()}

  WorkspaceStore.update(context.workspace_ref, fn ws ->
    Map.update(ws, :notes, [note], &(&1 ++ [note]))
  end)

  summary = WorkspaceStore.summary(context.workspace_ref)
  {:ok, %{recorded: true, workspace_summary: summary}}
end

3.6 Jido.AI.Actions.RLM.Workspace.GetSummary

use Jido.Action,
  name: "workspace_summary",
  description: "Get a compact summary of exploration progress so far",
  schema: Zoi.object(%{
    max_chars: Zoi.integer() |> Zoi.default(2000)
  })

def run(params, context) do
  summary = WorkspaceStore.summary(context.workspace_ref, max_chars: params.max_chars)
  {:ok, %{summary: summary}}
end

3.7 Jido.AI.Actions.RLM.LLM.SubqueryBatch

The BEAM advantage. Fan out sub-LLM calls concurrently under the agent's TaskSupervisor.

use Jido.Action,
  name: "llm_subquery_batch",
  description: "Run a sub-LLM query across multiple chunks concurrently. Use for map-reduce style analysis.",
  schema: Zoi.object(%{
    chunk_ids: Zoi.list(Zoi.string()),
    prompt: Zoi.string(),
    model: Zoi.string() |> Zoi.optional(),
    max_concurrency: Zoi.integer() |> Zoi.default(10),
    timeout: Zoi.integer() |> Zoi.default(60_000),
    max_chunk_bytes: Zoi.integer() |> Zoi.default(50_000)
  })

def run(params, context) do
  model = params[:model] || context[:recursive_model] || "anthropic:claude-haiku-4-5"
  workspace = WorkspaceStore.get(context.workspace_ref)

  results =
    params.chunk_ids
    |> Task.async_stream(
      fn chunk_id ->
        chunk_text = fetch_chunk_text(chunk_id, workspace, context.context_ref, params.max_chunk_bytes)
        prompt = "#{params.prompt}\n\nContext:\n#{chunk_text}"
        case ReqLLM.Generation.generate_text(model, prompt, []) do
          {:ok, response} -> {:ok, %{chunk_id: chunk_id, answer: response.text}}
          {:error, reason} -> {:error, %{chunk_id: chunk_id, error: inspect(reason)}}
        end
      end,
      max_concurrency: params.max_concurrency,
      timeout: params.timeout,
      on_timeout: :kill_task
    )
    |> Enum.map(fn
      {:ok, result} -> result
      {:exit, :timeout} -> {:error, %{error: "timeout"}}
      {:exit, reason} -> {:error, %{error: inspect(reason)}}
    end)

  # Store results in workspace
  WorkspaceStore.update(context.workspace_ref, fn ws ->
    Map.update(ws, :subquery_results, results, &(&1 ++ results))
  end)

  successes = Enum.filter(results, &match?({:ok, _}, &1)) |> Enum.map(&elem(&1, 1))
  errors = Enum.filter(results, &match?({:error, _}, &1)) |> length()

  {:ok, %{completed: length(successes), errors: errors, results: successes}}
end
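
The fetch_chunk_text/4 helper used above is not defined in this plan; one plausible shape, assuming the workspace chunk index stores byte ranges:

# Sketch: look up the chunk's byte range and read at most max_bytes of it.
defp fetch_chunk_text(chunk_id, workspace, context_ref, max_bytes) do
  %{byte_start: start, byte_end: stop} = get_in(workspace, [:chunks, :index, chunk_id])
  read_len = min(stop - start, max_bytes)
  {:ok, text} = Jido.AI.RLM.ContextStore.fetch_range(context_ref, start, read_len)
  text
end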

4. Prompts — Jido.AI.RLM.Prompts

Two prompt builders, following the pattern from Jido.AI.TRM.Reasoning.

System Prompt

Teaches the LLM the exploration methodology and available tools. No mention of code blocks or REPL — the LLM uses standard tool calling.

def system_prompt(config) do
  tools_desc = format_tool_descriptions(config.tools)

  """
  You are a data analyst exploring a large context to answer a user's question.

  You have access to a workspace that persists across iterations. Use the available
  tools to systematically explore the context and build toward an answer.

  ## Available Tools
  #{tools_desc}

  ## Methodology
  1. Start by checking context stats to understand size and structure
  2. Create a chunking plan appropriate for the context size
  3. Search for relevant patterns, or delegate analysis to sub-LLM queries
  4. Record hypotheses and findings in the workspace
  5. When confident, provide your final answer directly (no tool calls)

  ## Guidelines
  - Never try to read the entire context at once — chunk and search strategically
  - Use llm_subquery_batch for map-reduce style analysis across many chunks
  - Record your reasoning with workspace_note so you don't lose track
  - When you have enough evidence, answer directly — do not call more tools
  """
end

Next-Step Prompt

Injected before each LLM call with current workspace state.

def next_step_prompt(%{query: query, iteration: iteration, workspace_summary: summary}) do
  base = case iteration do
    1 ->
      """
      You have not explored the context yet. Start by examining its structure.

      Query: "#{query}"
      """
    _ ->
      """
      Continue exploring to answer the query: "#{query}"

      ## Exploration Progress
      #{summary}

      Decide your next action: search, delegate to sub-LLM, or provide your final answer.
      """
  end

  %{role: :user, content: base}
end

5. Strategy Adapter — Jido.AI.Strategies.RLM

Thin adapter following the exact pattern of Strategies.ReAct and Strategies.TRM. Uses ReAct.Machine internally.

Configuration

use Jido.Agent,
  name: "my_rlm_agent",
  strategy: {
    Jido.AI.Strategies.RLM,
    model: "anthropic:claude-sonnet-4-20250514",
    recursive_model: "anthropic:claude-haiku-4-5",
    max_iterations: 15,
    context_inline_threshold: 2_000_000,     # 2 MB
    max_concurrency: 10
  }

Action Specs

@action_specs %{
  @start => %{
    schema: Zoi.object(%{
      query: Zoi.string(),
      context: Zoi.any() |> Zoi.optional(),
      context_ref: Zoi.map() |> Zoi.optional(),
      tool_context: Zoi.map() |> Zoi.optional()
    }),
    doc: "Start RLM context exploration with a query and large context",
    name: "rlm.start"
  },
  @llm_result => %{
    schema: Zoi.object(%{call_id: Zoi.string(), result: Zoi.any()}),
    doc: "Handle LLM response",
    name: "rlm.llm_result"
  },
  @tool_result => %{
    schema: Zoi.object(%{call_id: Zoi.string(), tool_name: Zoi.string(), result: Zoi.any()}),
    doc: "Handle tool execution result",
    name: "rlm.tool_result"
  },
  @llm_partial => %{
    schema: Zoi.object(%{call_id: Zoi.string(), delta: Zoi.string(), chunk_type: Zoi.atom() |> Zoi.default(:content)}),
    doc: "Handle streaming LLM token",
    name: "rlm.llm_partial"
  }
}

Signal Routes

def signal_routes(_ctx) do
  [
    {"rlm.explore", {:strategy_cmd, @start}},
    {"react.llm.response", {:strategy_cmd, @llm_result}},
    {"react.tool.result", {:strategy_cmd, @tool_result}},
    {"react.llm.delta", {:strategy_cmd, @llm_partial}},
    {"react.usage", Jido.Actions.Control.Noop}
  ]
end

This is the same pattern TRM uses — routing react.llm.response to its own action atoms.

Start Flow

On :rlm_start:

defp process_start(agent, %{query: query} = params) do
  config = get_config(agent)

  # 1. Store context, get reference
  context_ref = store_context(params, config)

  # 2. Initialize workspace (request_id comes from the strategy's per-request tracking, elided here)
  {:ok, workspace_ref} = WorkspaceStore.init(request_id, %{query: query, context_ref: context_ref})

  # 3. Set run_tool_context (ephemeral, per-request)
  tool_context = Map.merge(params[:tool_context] || %{}, %{
    context_ref: context_ref,
    workspace_ref: workspace_ref,
    recursive_model: config.recursive_model
  })
  agent = set_run_tool_context(agent, tool_context)

  # 4. Send to ReAct machine with RLM system prompt
  #    (call_id, machine, and state likewise come from the per-request tracking state)
  msg = {:start, query, call_id}
  env = %{system_prompt: Prompts.system_prompt(config), max_iterations: config.max_iterations}
  {machine, directives} = Machine.update(machine, msg, env)

  # 5. Lift directives, injecting next_step_prompt
  {agent, lift_directives(directives, config, state)}
end

Directive Lifting (Key Difference from ReAct)

When lifting {:call_llm_stream, id, conversation}, inject the workspace-aware next-step prompt:

defp lift_directives(directives, config, state) do
  Enum.flat_map(directives, fn
    {:call_llm_stream, id, conversation} ->
      # Inject RLM next-step context
      workspace_summary = WorkspaceStore.summary(state.workspace_ref)
      iteration = state[:iteration] || 1

      next_step = Prompts.next_step_prompt(%{
        query: state[:query],
        iteration: iteration,
        workspace_summary: workspace_summary
      })

      augmented = conversation ++ [next_step]

      [Directive.LLMStream.new!(%{
        id: id,
        model: config.model,
        context: convert_to_reqllm_context(augmented),
        tools: config.reqllm_tools
      })]

    {:exec_tool, id, tool_name, arguments} ->
      # Same as ReAct — lookup Action, build ToolExec directive
      # ...

    {:request_error, call_id, reason, message} ->
      [Directive.EmitRequestError.new!(%{call_id: call_id, reason: reason, message: message})]
  end)
end

Cleanup

On terminal states (:completed or :error), clean up ETS:

# In process_instruction, after Machine.update:
new_state = if machine_state[:status] in [:completed, :error] do
  ContextStore.delete(state[:context_ref])
  WorkspaceStore.delete(state[:workspace_ref])
  Map.delete(machine_state, :run_tool_context)
else
  machine_state
end

6. Agent Macro — Jido.AI.RLMAgent

Following ReActAgent and TRMAgent conventions exactly.

defmodule Jido.AI.RLMAgent do
  defmacro __using__(opts) do
    # Same structure as ReActAgent:
    # - Extract name, tools (auto-include RLM exploration tools), model, etc.
    # - Build schema with request tracking fields
    # - Wire Jido.AI.Strategies.RLM as strategy
    # - Generate explore/3, await/2, explore_sync/3
    # - Generate on_before_cmd/on_after_cmd for request tracking
  end
end

Usage

defmodule MyApp.NeedleHaystackAgent do
  use Jido.AI.RLMAgent,
    name: "needle_haystack",
    description: "Finds information in massive text contexts",
    model: "anthropic:claude-sonnet-4-20250514",
    recursive_model: "anthropic:claude-haiku-4-5",
    max_iterations: 15,
    extra_tools: []  # optional additional domain-specific tools
end

# Usage
{:ok, pid} = Jido.start_agent(Jido.default_instance(), MyApp.NeedleHaystackAgent)

{:ok, result} = MyApp.NeedleHaystackAgent.explore_sync(pid,
  "Find the magic number hidden in this text",
  context: massive_text_binary,
  timeout: 300_000
)

Generated Functions

| Function                        | Description                               |
| ------------------------------- | ----------------------------------------- |
| explore(pid, query, opts)       | Async — returns {:ok, %Request.Handle{}}  |
| await(request, opts)            | Await specific request result             |
| explore_sync(pid, query, opts)  | Sync convenience wrapper                  |
| cancel(pid, opts)               | Cancel in-flight request                  |

Options for explore/3:

  • context: — binary, iodata, or %{path: "..."} for file-backed
  • context_ref: — pre-stored context reference (advanced)
  • tool_context: — additional per-request context merged with base
  • timeout: — request timeout
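
For example, the async pair from the table above might be used like this (exact option handling beyond the listed options is an assumption):

# Start exploration without blocking, then collect the result later.
{:ok, request} = MyApp.NeedleHaystackAgent.explore(pid,
  "List every section that mentions a deadline",
  context: massive_text_binary
)

# ... do other work while the agent explores ...

{:ok, result} = MyApp.NeedleHaystackAgent.await(request, timeout: 300_000)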

7. Data Flow — Complete Iteration

User: explore("Find the magic number", context: <100K lines>)
                    │
                    ▼
┌─────────────────────────────────────────────────────────┐
│  Strategy.RLM — :rlm_start                              │
│  1. ContextStore.put(context) → context_ref              │
│  2. WorkspaceStore.init(request_id) → workspace_ref      │
│  3. Set run_tool_context = {context_ref, workspace_ref}  │
│  4. Machine.update({:start, query, call_id}, env)        │
│  5. Emit LLMStream + next_step_prompt(iteration: 1)      │
└─────────────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────────┐
│  Root LLM (iteration 1)                                  │
│  → tool_call: context_stats({})                          │
│  → tool_call: context_chunk({strategy: "lines",          │
│                               size: 1000})               │
│                                                          │
│  Machine: awaiting_llm → awaiting_tool                   │
│  Directives: [ToolExec(context_stats), ToolExec(chunk)]  │
└─────────────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────────┐
│  Tool Results                                            │
│  context_stats: {size: 4.8MB, lines: ~100K}              │
│  context_chunk: {chunks: 100, index stored in workspace} │
│                                                          │
│  Machine: awaiting_tool → awaiting_llm                   │
│  → Append tool results to Thread                         │
│  → Emit LLMStream + next_step_prompt(iteration: 2,       │
│       workspace: "100 chunks indexed, 0 hits")            │
└─────────────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────────┐
│  Root LLM (iteration 2)                                  │
│  → tool_call: context_search({query: "magic number"})    │
│                                                          │
│  Result: {hits: [{chunk_id: "c_47", snippet: "The magic  │
│           number is 1298418", offset: 47231}]}            │
│                                                          │
│  → Append to Thread + workspace                          │
│  → Emit LLMStream + next_step_prompt(iteration: 3,       │
│       workspace: "1 hit found: 'magic number' in c_47")   │
└─────────────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────────┐
│  Root LLM (iteration 3)                                  │
│  → Final answer: "The magic number is 1298418"           │
│    (no tool calls — standard ReAct final answer)         │
│                                                          │
│  Machine: awaiting_llm → completed                       │
│  Strategy: cleanup context_ref + workspace_ref            │
└─────────────────────────────────────────────────────────┘

Thread History (What the LLM Sees)

| #  | Role      | Content                                                 |
| -- | --------- | ------------------------------------------------------- |
| 1  | system    | RLM exploration methodology + tool descriptions          |
| 2  | user      | "You haven't explored yet..." + query                    |
| 3  | assistant | (tool_calls: context_stats, context_chunk)               |
| 4  | tool      | context_stats result: {size: 4.8MB, lines: ~100K}        |
| 5  | tool      | context_chunk result: {chunks: 100, ...}                 |
| 6  | user      | "Continue... Workspace: 100 chunks, 0 hits" + query      |
| 7  | assistant | (tool_call: context_search)                              |
| 8  | tool      | context_search result: {hits: [{chunk_id: c_47, ...}]}   |
| 9  | user      | "Continue... Workspace: 1 hit found" + query             |
| 10 | assistant | "The magic number is 1298418"                            |

8. Where the BEAM Adds Value

Concurrent Sub-LLM Calls

The Python version runs llm_query() calls sequentially inside exec(). With SubqueryBatch:

# Root LLM calls one tool:
tool_call: llm_subquery_batch({
  chunk_ids: ["c_0", "c_1", ..., "c_99"],
  prompt: "Does this chunk contain a magic number? If so, what is it?",
  max_concurrency: 10
})

# The Action fans out 100 concurrent sub-LLM calls via Task.async_stream and
# returns aggregated results in roughly 1/10th the wall-clock time of sequential calls

Process Isolation

Each RLM session runs in its own agent process. A failed search action or timed-out sub-LLM call doesn't affect other sessions. The supervisor restarts cleanly.

Telemetry

Every operation is a Jido Action with built-in telemetry:

[:jido, :ai, :tool, :execute, :start]    # context_search started
[:jido, :ai, :tool, :execute, :stop]     # context_search completed (with duration)
[:jido, :ai, :react, :start]             # RLM exploration started
[:jido, :ai, :react, :iteration]         # iteration N completed
[:jido, :ai, :react, :complete]          # exploration finished
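
A handler for the tool-execution stop event could be attached with the standard :telemetry API — a sketch; the measurement and metadata keys used here are assumptions about what Jido emits:

# Sketch: log every RLM tool execution with its duration.
:telemetry.attach(
  "rlm-tool-duration-logger",
  [:jido, :ai, :tool, :execute, :stop],
  fn _event, measurements, metadata, _config ->
    duration_ms =
      System.convert_time_unit(measurements[:duration] || 0, :native, :millisecond)

    IO.puts("tool #{inspect(metadata[:tool_name])} finished in #{duration_ms}ms")
  end,
  nil
)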

Type Safety

Every tool call is schema-validated via Zoi before execution. Invalid arguments from the LLM produce structured errors (via EmitToolError) that the LLM can self-correct from — no silent failures or runtime exceptions from malformed code.


9. Implementation Order

| Step | Module                        | Effort | Tests                                                                 |
| ---- | ----------------------------- | ------ | --------------------------------------------------------------------- |
| 1    | Jido.AI.RLM.ContextStore      | S      | Unit: put/fetch/delete, tier selection, byte range reads              |
| 2    | Jido.AI.RLM.WorkspaceStore    | S      | Unit: init/get/update/summary/delete                                  |
| 3    | Context.Stats                 | S      | Unit: size estimation, encoding detection                             |
| 4    | Context.Chunk                 | M      | Unit: line/byte chunking, overlap, max_chunks cap                     |
| 5    | Context.ReadChunk             | S      | Unit: chunk lookup, truncation, missing chunk error                   |
| 6    | Context.Search                | M      | Unit: substring/regex, limit, window, chunk_id mapping                |
| 7    | Workspace.Note + GetSummary   | S      | Unit: append, summarize, truncation                                   |
| 8    | LLM.SubqueryBatch             | M      | Unit: fan-out, timeout handling, result aggregation (mock ReqLLM)     |
| 9    | Jido.AI.RLM.Prompts           | S      | Unit: prompt generation for various iterations/states                 |
| 10   | Jido.AI.Strategies.RLM        | L      | Integration: start flow, directive lifting, cleanup, multi-iteration  |
| 11   | Jido.AI.RLMAgent macro        | M      | Integration: explore/await/explore_sync, request tracking             |
| 12   | Example agent + demo script   | S      | Manual: needle-in-haystack with generated context                     |

Steps 1–9 are independently testable with no LLM calls. Step 10 is where integration happens.


10. Open Design Decisions

Decided (for v1)

  • Machine: Reuse ReAct.Machine — no fork
  • Context storage: ETS for medium, inline for small
  • Termination: Standard ReAct final-answer (no FINAL markers)
  • Sub-LLM: Single SubqueryBatch Action with Task.async_stream
  • Signal namespace: rlm.explore for input, reuse react.* for internal signals

Deferred (to v2)

  • File-backed context: For GB-scale data. Requires streaming IO in chunk/search actions.
  • Depth > 1 recursion: Sub-LLM spawns its own RLM agent. Trivial on BEAM (spawn child process), but needs prompt engineering for recursive delegation.
  • Multi-turn with persistent workspace: Currently workspace is per-request. Supporting explore → follow-up explore with shared workspace requires moving workspace to agent state.
  • Custom indexing: Token-aware chunking, BM25 search, vector embeddings. Each is a new Action — the architecture supports them without changes.
  • Streaming search results: Stream search hits back to the LLM as they're found, rather than waiting for all results. Would need a new machine state or a streaming tool result pattern.