DSPy Module System: Complete Architecture Reference

Written for the oxide Rust rewrite. Self-contained -- no DSPy source access required.

What DSPy Is (In One Paragraph)

DSPy is a framework for programming with language models where you declare what you want (via typed signatures), not how to prompt. The framework handles prompt construction, output parsing, and -- critically -- automatic optimization of prompts and few-shot examples. The module system is the backbone that makes all of this possible.

The Core Insight

Everything in DSPy is built on a single primitive: Predict. A Predict takes a typed signature (input fields -> output fields), formats it into a prompt via an adapter, calls an LM, and parses the response back into typed outputs. Every higher-level module (ChainOfThought, ReAct, ProgramOfThought) is just orchestration on top of one or more Predict instances.

Optimizers work by discovering all Predict instances in a module tree, then modifying their demos (few-shot examples) and signature instructions (the task description). This is the entire optimization surface.

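A minimal sketch of the primitive in use (the model name is an arbitrary placeholder):

import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any LiteLLM-style model id

qa = dspy.Predict("question -> answer")   # typed signature: one input, one output
pred = qa(question="What is the capital of France?")
print(pred.answer)                        # parsed output field, not raw text
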
Architecture Diagram

User Program (a Module subclass)
  |
  |-- Module.__call__()
  |     |-- callbacks, usage tracking, caller stack
  |     |-- self.forward(**kwargs)
  |
  |-- Contains Predict instances (the leaf parameters)
  |     |-- Each Predict has:
  |     |     signature  (Signature class -- typed I/O contract)
  |     |     demos      (list[Example] -- few-shot examples)
  |     |     lm         (optional per-predictor LM override)
  |     |     config     (LM kwargs: temperature, n, etc.)
  |     |
  |     |-- Predict.forward():
  |     |     1. _forward_preprocess: resolve LM, merge config, get demos
  |     |     2. adapter(lm, signature, demos, inputs)
  |     |     3. _forward_postprocess: build Prediction, append to trace
  |     |
  |     |-- Adapter pipeline:
  |           format(signature, demos, inputs) -> messages
  |           lm(messages, **kwargs) -> completions
  |           parse(signature, completion) -> dict of output fields
  |
  |-- named_parameters() walks the tree, finds all Predict instances
  |-- Optimizers modify demos/instructions on discovered Predicts
  |-- save()/load() serializes the optimized state

Document Index

| Document | What It Covers |
| --- | --- |
| 01_module_system.md | BaseModule, Module, Parameter -- the tree structure, traversal, serialization, copy mechanics, the _compiled freeze flag |
| 02_signatures.md | Signature, SignatureMeta, InputField/OutputField -- DSPy's type system, string parsing, Pydantic integration, manipulation methods |
| 03_predict.md | Predict -- the foundation primitive, forward pipeline, preprocessing, tracing, state management |
| 04_augmentation_patterns.md | How ChainOfThought, ReAct, ProgramOfThought, MultiChainComparison, BestOfN, Refine build on Predict |
| 05_adapters.md | Adapter base class, ChatAdapter, JSONAdapter -- how signatures become prompts and responses become Predictions |
| 06_optimizers.md | How optimizers discover modules, what they modify, BootstrapFewShot, MIPRO, COPRO, BootstrapFinetune, the compile() contract, tracing |
| 07_rust_implications.md | What all of this means for a Rust implementation -- trait design, type-state patterns, the hard problems |

Key Terminology

| Term | Meaning |
| --- | --- |
| Module | A composable unit of computation. Has __call__ -> forward(). Can contain other Modules. |
| Parameter | Marker trait. Only Predict implements it. Makes a module discoverable by optimizers. |
| Predict | The leaf parameter. Holds a signature, demos, and LM config. Calls adapter -> LM -> parse. |
| Signature | A typed contract: named input fields -> named output fields, with instructions. Implemented as a Pydantic BaseModel class (not instance). |
| Adapter | Converts (signature, demos, inputs) -> LM messages and parses responses back. ChatAdapter uses [[ ## field ## ]] delimiters. |
| Demo | A few-shot example (an Example dict with input+output field values). Stored on Predict.demos. |
| Trace | A list of (predictor, inputs, prediction) tuples recorded during execution. Used by optimizers to attribute outputs to predictors. |
| Compiled | module._compiled = True means optimizers won't recurse into it. Freezes the optimized state. |
| Teleprompter | DSPy's name for an optimizer. compile(student, trainset) returns an optimized copy. |
| Example | Dict-like data container with .inputs() / .labels() separation. Training data and demos are Examples. |
| Prediction | Subclass of Example returned by all modules. Carries completions and LM usage info. |

The Module System: BaseModule, Module, Parameter

Three Layers

The module system has three layers, each adding capabilities:

  1. Parameter (dspy/predict/parameter.py) -- Empty marker class. Makes things discoverable by optimizers.
  2. BaseModule (dspy/primitives/base_module.py) -- Tree traversal, serialization, copy mechanics.
  3. Module (dspy/primitives/module.py) -- The __call__ -> forward() protocol, callbacks, metaclass magic.

Predict inherits from both Module and Parameter, making it both callable and optimizable.


1. Parameter: The Marker

# dspy/predict/parameter.py
class Parameter:
    pass

That's the entire class. No methods, no state. It exists so isinstance(obj, Parameter) can distinguish "things optimizers can tune" from "things that are just structural." In the current codebase, Predict is the only class that inherits from Parameter.

Why this matters: When BaseModule.named_parameters() walks the object graph, it collects everything that passes isinstance(value, Parameter). Since only Predict does, optimizers only ever see Predict instances. Higher-level modules (ChainOfThought, ReAct) are invisible to optimizers -- they're just containers that hold Predict instances.


2. BaseModule: The Tree

BaseModule provides the infrastructure for treating a module hierarchy as a traversable tree.

2.1 named_parameters() -- DFS Parameter Discovery

This is the most important method in the entire module system. Every optimizer calls it.

def named_parameters(self):
    """
    DFS walk of self.__dict__. Finds all Parameter instances (i.e., Predict objects).
    Returns list of (dotted_path_string, Parameter_instance) tuples.

    Rules:
    - If self is a Parameter, includes ("self", self)
    - Parameter instances in __dict__ -> added directly
    - Module instances in __dict__ -> recurse (unless _compiled=True)
    - Lists/tuples -> iterate with indexed names: "name[0]", "name[1]"
    - Dicts -> iterate with keyed names: "name['key']"
    - Tracks visited set by id() to handle diamond DAGs (same object reachable via multiple paths)
    """
    import dspy
    from dspy.predict.parameter import Parameter

    visited = set()
    named_parameters = []

    def add_parameter(param_name, param_value):
        if isinstance(param_value, Parameter):
            if id(param_value) not in visited:
                visited.add(id(param_value))
                named_parameters.append((param_name, param_value))
        elif isinstance(param_value, dspy.Module):
            # CRITICAL: _compiled modules are FROZEN -- we don't recurse into them.
            # This is how pre-optimized sub-modules keep their state.
            if not getattr(param_value, "_compiled", False):
                for sub_name, param in param_value.named_parameters():
                    add_parameter(f"{param_name}.{sub_name}", param)

    if isinstance(self, Parameter):
        add_parameter("self", self)

    for name, value in self.__dict__.items():
        if isinstance(value, Parameter):
            add_parameter(name, value)
        elif isinstance(value, dspy.Module):
            if not getattr(value, "_compiled", False):
                for sub_name, param in value.named_parameters():
                    add_parameter(f"{name}.{sub_name}", param)
        elif isinstance(value, (list, tuple)):
            for idx, item in enumerate(value):
                add_parameter(f"{name}[{idx}]", item)
        elif isinstance(value, dict):
            for key, item in value.items():
                add_parameter(f"{name}['{key}']", item)

    return named_parameters

Example: Given a module MyProgram with:

class MyProgram(dspy.Module):
    def __init__(self):
        self.cot = dspy.ChainOfThought("question -> answer")
        self.summarize = dspy.Predict("text -> summary")

named_parameters() returns:

[
    ("cot.predict", <Predict instance>),   # ChainOfThought holds self.predict
    ("summarize",   <Predict instance>),   # Predict IS a Parameter
]

The dotted path names are how optimizers map traces back to specific predictors and how save()/load() serialize state.

2.2 named_sub_modules() -- BFS Module Discovery

def named_sub_modules(self, type_=None, skip_compiled=False):
    """
    BFS traversal of ALL BaseModule instances in the tree.
    Different from named_parameters:
    - BFS not DFS
    - Returns ALL modules, not just Parameters
    - Optional type filter and compiled-skip flag
    """
    if type_ is None:
        type_ = BaseModule

    queue = deque([("self", self)])
    seen = {id(self)}

    def add_to_queue(name, item):
        if id(item) not in seen:
            seen.add(id(item))
            queue.append((name, item))

    while queue:
        name, item = queue.popleft()
        if isinstance(item, type_):
            yield name, item
        if isinstance(item, BaseModule):
            if skip_compiled and getattr(item, "_compiled", False):
                continue
            for sub_name, sub_item in item.__dict__.items():
                add_to_queue(f"{name}.{sub_name}", sub_item)
        elif isinstance(item, (list, tuple)):
            for i, sub_item in enumerate(item):
                add_to_queue(f"{name}[{i}]", sub_item)
        elif isinstance(item, dict):
            for key, sub_item in item.items():
                add_to_queue(f"{name}[{key}]", sub_item)

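For the MyProgram example from section 2.1, the BFS yields every module (not just the Predict leaves), with names rooted at "self":

prog = MyProgram()
[name for name, _ in prog.named_sub_modules()]
# ['self', 'self.cot', 'self.summarize', 'self.cot.predict']
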
2.3 deepcopy() -- Safe Deep Copying

def deepcopy(self):
    """
    Strategy:
    1. Try copy.deepcopy(self) -- works if all attributes are picklable
    2. If that fails, manual fallback:
       - Create empty instance via __new__ (no __init__)
       - For each attr in __dict__:
         - BaseModule -> recursive deepcopy()
         - Other -> try deepcopy, fallback copy.copy, fallback reference
    """
    try:
        return copy.deepcopy(self)
    except Exception:
        pass

    new_instance = self.__class__.__new__(self.__class__)
    for attr, value in self.__dict__.items():
        if isinstance(value, BaseModule):
            setattr(new_instance, attr, value.deepcopy())
        else:
            try:
                setattr(new_instance, attr, copy.deepcopy(value))
            except Exception:
                try:
                    setattr(new_instance, attr, copy.copy(value))
                except Exception:
                    setattr(new_instance, attr, value)
    return new_instance

Why the fallback matters: Some modules hold references to non-picklable objects (LM connections, thread pools). The manual fallback ensures the module tree is still copyable even when copy.deepcopy chokes.

2.4 reset_copy() -- Fresh Copy for Optimization

def reset_copy(self):
    """Deep copy, then reset() every parameter.
    Creates a fresh copy with architecture intact but all learned state cleared.
    Used by optimizers to create candidate programs."""
    new_instance = self.deepcopy()
    for param in new_instance.parameters():
        param.reset()
    return new_instance

param.reset() on a Predict clears self.lm, self.traces, self.train, and self.demos. The architecture (signature, config) is preserved; the learned state is wiped.

2.5 dump_state() / load_state() -- Serialization

def dump_state(self, json_mode=True):
    """Serializes every parameter: {dotted_path: param.dump_state()}"""
    return {name: param.dump_state(json_mode=json_mode)
            for name, param in self.named_parameters()}

def load_state(self, state):
    """Deserializes: walks named_parameters(), calls each param.load_state()"""
    for name, param in self.named_parameters():
        param.load_state(state[name])

For a Predict, dump_state() serializes:

  • traces (execution traces)
  • train (training examples)
  • demos (few-shot examples, serialized via serialize_object for JSON safety)
  • signature state (instructions + field prefixes/descriptions)
  • lm state (model config) or None

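For illustration only, a state dump for the MyProgram example from 2.1 would look roughly like this (values abbreviated; the exact layout is a sketch, not a guaranteed schema):

program.dump_state()
# {
#   "cot.predict": {
#       "traces": [], "train": [],
#       "demos": [{"question": "...", "reasoning": "...", "answer": "..."}],
#       "signature": {"instructions": "...",
#                     "fields": {"question": {"prefix": "Question:", "desc": "${question}"}}},
#       "lm": None,
#   },
#   "summarize": {...},
# }
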
2.6 save() / load() -- File I/O

Two modes:

State-only (default): Saves just the optimized state (demos, instructions, etc.) to .json or .pkl.

def save(self, path, save_program=False):
    # state = self.dump_state() + metadata (python/dspy/cloudpickle versions)
    # Write to JSON or pickle based on file extension

Full program (save_program=True): Uses cloudpickle to serialize the entire module object (architecture + state) to a directory containing program.pkl + metadata.json.

load() reads state and calls self.load_state(state). Note: this loads state into an existing module. For loading a whole program from pickle, there's a separate dspy.load() function.

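Typical usage, assuming the MyProgram class from earlier (paths are placeholders):

# State-only: you must reconstruct the architecture in code first
program.save("optimized.json")        # file extension picks JSON vs pickle
fresh = MyProgram()
fresh.load("optimized.json")

# Whole program: cloudpickle, no class definition needed at load time
program.save("artifacts/", save_program=True)
loaded = dspy.load("artifacts/")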

3. Module: The Call Protocol

Module extends BaseModule with the call/forward protocol, a metaclass that ensures safe initialization, and convenience methods.

3.1 ProgramMeta -- The Metaclass

class ProgramMeta(type):
    """Ensures _base_init runs BEFORE __init__, even if subclass forgets super().__init__().

    When you do MyModule(args):
    1. __new__ creates the instance (no __init__ yet)
    2. Module._base_init(obj) -- sets _compiled, callbacks, history
    3. cls.__init__(obj, args) -- the user's actual __init__
    4. Safety: ensures callbacks and history exist even if __init__ didn't set them
    """
    def __call__(cls, *args, **kwargs):
        obj = cls.__new__(cls, *args, **kwargs)
        if isinstance(obj, cls):
            Module._base_init(obj)
            cls.__init__(obj, *args, **kwargs)
            if not hasattr(obj, "callbacks"):
                obj.callbacks = []
            if not hasattr(obj, "history"):
                obj.history = []
        return obj

Why this exists: If a user writes class MyModule(dspy.Module) and forgets super().__init__(), the module would lack _compiled, callbacks, and history. The metaclass guarantees these always exist.

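A quick demonstration of the guarantee -- a sloppy subclass that skips super().__init__() still gets the base attributes:

class Sloppy(dspy.Module):
    def __init__(self):                 # note: no super().__init__()
        self.qa = dspy.Predict("question -> answer")

m = Sloppy()
assert m._compiled is False             # set by _base_init before __init__ ran
assert m.callbacks == [] and m.history == []
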
3.2 Module Attributes

class Module(BaseModule, metaclass=ProgramMeta):
    def _base_init(self):
        self._compiled = False    # Has this module been optimized?
        self.callbacks = []       # List of BaseCallback instances
        self.history = []         # LM call history

    def __init__(self, callbacks=None):
        self.callbacks = callbacks or []
        self._compiled = False
        self.history = []

3.3 __call__() -- The Central Dispatch

@with_callbacks  # Wraps with on_module_start / on_module_end callbacks
def __call__(self, *args, **kwargs):
    """
    1. Get caller_modules stack from settings (tracks nested module calls)
    2. Append self to the stack
    3. In a settings.context with updated caller_modules:
       a. If usage tracking enabled and no tracker yet, create one
       b. Call self.forward(*args, **kwargs)
       c. If tracking, attach token usage to the Prediction
    4. Return the Prediction
    """
    caller_modules = settings.caller_modules or []
    caller_modules = list(caller_modules)
    caller_modules.append(self)

    with settings.context(caller_modules=caller_modules):
        if settings.track_usage and no_tracker_yet:  # pseudocode: no usage tracker already active
            with track_usage() as usage_tracker:
                output = self.forward(*args, **kwargs)
            tokens = usage_tracker.get_total_tokens()
            self._set_lm_usage(tokens, output)
            return output
        return self.forward(*args, **kwargs)

__call__ vs forward(): __call__ is the public entry point. It handles callbacks, usage tracking, and the module call stack. forward() is the actual logic that subclasses override. There is a __getattribute__ override that warns if you call .forward() directly (it inspects the call stack):

def __getattribute__(self, name):
    attr = super().__getattribute__(name)
    if name == "forward" and callable(attr):
        stack = inspect.stack()
        forward_called_directly = len(stack) <= 1 or stack[1].function != "__call__"
        if forward_called_directly:
            logger.warning("Calling module.forward() directly is discouraged. Use module() instead.")
    return attr

3.4 Pickle Support

def __getstate__(self):
    """Excludes history and callbacks (transient state) from pickle"""
    state = self.__dict__.copy()
    state.pop("history", None)
    state.pop("callbacks", None)
    return state

def __setstate__(self, state):
    """Restores history and callbacks as empty on unpickle"""
    self.__dict__.update(state)
    if not hasattr(self, "history"):
        self.history = []
    if not hasattr(self, "callbacks"):
        self.callbacks = []

3.5 Convenience Methods

def named_predictors(self):
    """Filters named_parameters() to only Predict instances"""
    from dspy.predict.predict import Predict
    return [(name, param) for name, param in self.named_parameters()
            if isinstance(param, Predict)]

def predictors(self):
    """Just the Predict objects, no names"""
    return [param for _, param in self.named_predictors()]

def set_lm(self, lm):
    """Sets the LM on ALL predictors in the tree"""
    for _, param in self.named_predictors():
        param.lm = lm

def get_lm(self):
    """Returns the LM if all predictors share one, raises if they differ"""

def map_named_predictors(self, func):
    """Applies func to each predictor and replaces it in the tree.
    Uses magicattr.set for nested path assignment (handles dotted paths)."""
    for name, predictor in self.named_predictors():
        set_attribute_by_name(self, name, func(predictor))
    return self

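Typical usage of the convenience methods, using the MyProgram example (model name is a placeholder):

program = MyProgram()
program.set_lm(dspy.LM("openai/gpt-4o-mini"))          # every predictor in the tree
[name for name, _ in program.named_predictors()]       # ['cot.predict', 'summarize']
program.map_named_predictors(lambda p: p.deepcopy())   # e.g., un-share aliased predictors
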
4. The _compiled Flag

_compiled is a boolean that controls optimizer traversal:

  1. Initialized to False on every new Module (via _base_init)
  2. Set to True by optimizers after compilation (e.g., student._compiled = True)
  3. When True, named_parameters() stops recursing into this module -- its Predict instances are invisible to further optimization
  4. This is how you compose pre-optimized modules: a compiled sub-module's demos and signature instructions won't be overwritten by a parent optimizer

Example:

# Pre-optimize a sub-module
optimized_qa = bootstrap.compile(qa_module, trainset=data)
# optimized_qa._compiled is now True

# Use it in a larger program
class Pipeline(dspy.Module):
    def __init__(self):
        self.retrieve = dspy.Predict("query -> passages")
        self.qa = optimized_qa  # _compiled=True, frozen

# When a parent optimizer runs on Pipeline:
# named_parameters() finds: [("retrieve", <Predict>)]
# It does NOT find optimized_qa's internal Predict -- it's frozen.

5. The Full Hierarchy

BaseModule
  |-- named_parameters()        # DFS, finds Parameters (Predict instances)
  |-- named_sub_modules()       # BFS, finds all Modules
  |-- deepcopy() / reset_copy() # Safe copying
  |-- dump_state() / load_state() / save() / load()  # Serialization
  |
  +-- Module (metaclass=ProgramMeta)
        |-- __call__() -> forward()   # The call protocol
        |-- callbacks, history        # Transient state
        |-- _compiled                 # Freeze flag
        |-- named_predictors()        # Convenience filter
        |-- set_lm() / get_lm()      # LM management
        |
        +-- Predict (also inherits Parameter)
              |-- signature, demos, lm, config  # Optimizable state
              |-- forward() -> adapter -> LM -> parse -> Prediction
              |-- traces, train                 # Optimization bookkeeping
              |-- reset()                       # Clear learned state

The dual inheritance of Predict is the key design decision: It is both a Module (callable, composable, has forward()) and a Parameter (discoverable by optimizers). Everything else in the system follows from this.

Signatures: DSPy's Type System

What a Signature Is

A Signature is a typed contract between a module and an LM: named input fields -> named output fields, with instructions. It's the thing that makes DSPy declarative -- you say "question -> answer" and the framework handles prompt construction, output parsing, and type validation.

Critical implementation detail: A Signature is a class, not an instance. When you write dspy.Signature("question -> answer"), you get back a new type (a dynamically-created Pydantic BaseModel subclass), not an object. Operations like prepend, with_instructions, delete all return new classes. This is metaclass-heavy Python.


1. File Layout

dspy/signatures/
  signature.py   -- Signature class, SignatureMeta metaclass, make_signature(), parsing
  field.py       -- InputField(), OutputField() factory functions
  utils.py       -- get_dspy_field_type() helper

2. InputField and OutputField

These are factory functions (not classes) that return pydantic.Field() instances with DSPy metadata stuffed into json_schema_extra:

# dspy/signatures/field.py

def InputField(**kwargs):
    return pydantic.Field(**move_kwargs(**kwargs, __dspy_field_type="input"))

def OutputField(**kwargs):
    return pydantic.Field(**move_kwargs(**kwargs, __dspy_field_type="output"))

move_kwargs separates DSPy-specific arguments from Pydantic-native arguments:

DSPy-specific (stored in json_schema_extra):

| Argument | Type | Purpose |
| --- | --- | --- |
| __dspy_field_type | "input" or "output" | The discriminator -- how the system tells inputs from outputs |
| desc | str | Field description shown to the LM in the prompt |
| prefix | str | Prompt prefix for this field (e.g., "Question:") |
| format | callable | Optional formatting function |
| parser | callable | Optional parsing function |
| constraints | str | Human-readable constraint strings |

Pydantic-native (passed through to pydantic.Field):

| Argument | Purpose |
| --- | --- |
| gt, ge, lt, le | Numeric constraints |
| min_length, max_length | String/collection length |
| default | Default value |

Constraint translation: Pydantic constraints are automatically converted to human-readable strings. OutputField(ge=5, le=10) generates constraints="greater than or equal to: 5, less than or equal to: 10" which gets included in the prompt so the LM knows the bounds.


3. SignatureMeta: The Metaclass

SignatureMeta extends type(BaseModel) (Pydantic's metaclass). It does three key things:

3.1 __call__ -- String Shorthand Interception

class SignatureMeta(type(BaseModel)):
    def __call__(cls, *args, **kwargs):
        # If called with a string like Signature("question -> answer"),
        # route to make_signature() to create a new class (not instance)
        if cls is Signature:
            if len(args) == 1 and isinstance(args[0], (str, dict)):
                return make_signature(args[0], kwargs.pop("instructions", None))
        # Otherwise, create an actual instance (rare in normal DSPy usage)
        return super().__call__(*args, **kwargs)

This means dspy.Signature("question -> answer") returns a new class, not an instance.

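A small demonstration of the class-not-instance semantics:

QA = dspy.Signature("question -> answer")
assert isinstance(QA, type)                    # QA is a class
Short = QA.with_instructions("Answer in one word.")
assert Short is not QA                         # manipulation returns a NEW class
assert QA.instructions != Short.instructions   # the original is untouched
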
3.2 __new__ -- Class Creation

When a Signature class is being defined (either via class QA(dspy.Signature) or via make_signature()):

def __new__(mcs, signature_name, bases, namespace):
    # 1. Set str as default type for fields without annotations
    for name in namespace:
        if name not in annotations:
            annotations[name] = str

    # 2. Preserve field ordering: inputs before outputs
    # (reorder annotations dict to match declaration order)

    # 3. Let Pydantic create the class
    cls = super().__new__(mcs, signature_name, bases, namespace)

    # 4. Set default instructions if none given
    if not cls.__doc__:
        inputs = ", ".join(f"`{k}`" for k in cls.input_fields)
        outputs = ", ".join(f"`{k}`" for k in cls.output_fields)
        cls.__doc__ = f"Given the fields {inputs}, produce the fields {outputs}."

    # 5. Validate: every field must have InputField or OutputField
    for name, field in cls.model_fields.items():
        if "__dspy_field_type" not in (field.json_schema_extra or {}):
            raise TypeError(f"Field '{name}' must use InputField or OutputField")

    # 6. Auto-generate prefix and desc for fields that don't have them
    for name, field in cls.model_fields.items():
        extra = field.json_schema_extra
        if "prefix" not in extra:
            extra["prefix"] = infer_prefix(name)  # snake_case -> "Title Case:"
        if "desc" not in extra:
            extra["desc"] = f"${{{name}}}"  # template placeholder

3.3 infer_prefix() -- Name to Prompt Prefix

Converts field names to human-readable prefixes:

  • "question" -> "Question:"
  • "some_attribute_name" -> "Some Attribute Name:"
  • "HTMLParser" -> "HTML Parser:"

Uses regex to split on underscores and camelCase boundaries, then title-cases and joins.


4. Two Ways to Define Signatures

Class-Based (Full Control)

class QA(dspy.Signature):
    """Answer questions with short factoid answers."""

    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc="often between 1 and 5 words")

Here QA is a class. QA.__doc__ becomes the instructions. Fields are declared as class attributes with type annotations and InputField/OutputField defaults.

String Shorthand (Quick)

sig = dspy.Signature("question -> answer")
sig = dspy.Signature("question: str, context: list[str] -> answer: str")
sig = dspy.Signature("question -> answer", "Answer the question.")

When SignatureMeta.__call__ sees a string, it routes to make_signature().

The String Parser

The parser is clever -- it uses Python's AST module:

def _parse_field_string(field_string: str, names=None):
    # Wraps the field string as function parameters and parses with ast
    args = ast.parse(f"def f({field_string}): pass").body[0].args.args

This means field strings follow Python function parameter syntax: question: str, context: list[int] is valid because it would be valid as def f(question: str, context: list[int]): pass.

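You can reproduce the trick in isolation (ast.unparse requires Python 3.9+):

import ast

args = ast.parse("def f(question: str, context: list[str]): pass").body[0].args.args
print([(a.arg, ast.unparse(a.annotation)) for a in args])
# [('question', 'str'), ('context', 'list[str]')]
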
Type resolution happens in _parse_type_node(), which recursively walks the AST:

  • Simple: int, str, float, bool
  • Generic: list[int], dict[str, float], tuple[str, int]
  • Union: Union[int, str], Optional[str], PEP 604 int | str
  • Nested: dict[str, list[Optional[Tuple[int, str]]]]
  • Custom: looked up via a names dict or by walking the Python call stack

Custom type auto-detection (_detect_custom_types_from_caller): When you write Signature("input: MyType -> output"), the metaclass walks up the call stack (up to 100 frames) looking in f_locals and f_globals for MyType. This is fragile but convenient. The reliable alternative is passing custom_types={"MyType": MyType}.

make_signature() -- The Factory

def make_signature(signature, instructions=None, signature_name="StringSignature"):
    """
    Accepts either:
    - A string: "question -> answer" (parsed into fields)
    - A dict: {"question": InputField(), "answer": OutputField()} (used directly)

    Creates a new Signature class via pydantic.create_model().
    """
    if isinstance(signature, str):
        fields = _parse_signature(signature)
    else:
        fields = signature  # dict of {name: (type, FieldInfo)}

    # pydantic.create_model creates a new BaseModel subclass dynamically
    model = pydantic.create_model(
        signature_name,
        __base__=Signature,
        __doc__=instructions,
        **fields,
    )
    return model

5. Signature Properties (Class-Level)

These are properties on the metaclass, meaning they're accessed on the class itself (not instances):

@property
def instructions(cls) -> str:
    """The cleaned docstring. This is the task description shown to the LM."""
    return cls.__doc__

@property
def input_fields(cls) -> dict[str, FieldInfo]:
    """Fields where __dspy_field_type == "input", in declaration order"""
    return {k: v for k, v in cls.model_fields.items()
            if v.json_schema_extra["__dspy_field_type"] == "input"}

@property
def output_fields(cls) -> dict[str, FieldInfo]:
    """Fields where __dspy_field_type == "output", in declaration order"""
    return {k: v for k, v in cls.model_fields.items()
            if v.json_schema_extra["__dspy_field_type"] == "output"}

@property
def fields(cls) -> dict[str, FieldInfo]:
    """All fields: {**input_fields, **output_fields}"""
    return {**cls.input_fields, **cls.output_fields}

@property
def signature(cls) -> str:
    """String representation: "input1, input2 -> output1, output2" """
    inputs = ", ".join(cls.input_fields.keys())
    outputs = ", ".join(cls.output_fields.keys())
    return f"{inputs} -> {outputs}"

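Usage:

sig = dspy.Signature("question: str, context: list[str] -> answer: str")
sig.signature             # "question, context -> answer"
list(sig.input_fields)    # ["question", "context"]
sig.instructions          # the auto-generated default instructions
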
6. Signature Manipulation

All manipulation methods return new Signature classes. The original is never mutated. This is the immutable pattern.

with_instructions(instructions: str) -> type[Signature]

def with_instructions(cls, instructions: str):
    """New Signature with different instructions, same fields."""
    return Signature(cls.fields, instructions)

with_updated_fields(name, type_=None, **kwargs) -> type[Signature]

def with_updated_fields(cls, name, type_=None, **kwargs):
    """Deep-copies fields, updates json_schema_extra for the named field, creates new Signature."""
    fields_copy = deepcopy(cls.fields)
    fields_copy[name].json_schema_extra = {**fields_copy[name].json_schema_extra, **kwargs}
    if type_ is not None:
        fields_copy[name].annotation = type_
    return Signature(fields_copy, cls.instructions)

Used by COPRO to change field prefixes: sig.with_updated_fields("answer", prefix="Final Answer:").

prepend(name, field, type_=None) / append(name, field, type_=None)

Both delegate to insert():

def prepend(cls, name, field, type_=None):
    return cls.insert(0, name, field, type_)

def append(cls, name, field, type_=None):
    return cls.insert(-1, name, field, type_)

insert(index, name, field, type_=None)

def insert(cls, index, name, field, type_=None):
    """
    Splits fields into input_fields and output_fields lists.
    Determines which list based on __dspy_field_type.
    Adjusts negative indices so that insert(-1, ...) appends AFTER the last
    field (unlike list.insert, which would place it before the last element).
    Recombines and creates a new Signature.
    """
    input_fields = list(cls.input_fields.items())
    output_fields = list(cls.output_fields.items())

    lst = input_fields if field.json_schema_extra["__dspy_field_type"] == "input" else output_fields
    if index < 0:
        index += len(lst) + 1  # -1 becomes len(lst): a true append
    index = max(0, min(index, len(lst)))
    lst.insert(index, (name, (type_ or str, field)))

    new_fields = dict(input_fields + output_fields)
    return Signature(new_fields, cls.instructions)

delete(name)

def delete(cls, name):
    """Removes the named field. Returns new Signature."""
    fields_copy = dict(cls.fields)
    fields_copy.pop(name, None)
    return Signature(fields_copy, cls.instructions)

7. How Modules Modify Signatures

This is the core of the "augmentation pattern." Each module type manipulates the signature differently:

ChainOfThought -- Prepend Reasoning

extended_signature = signature.prepend(
    name="reasoning",
    field=dspy.OutputField(
        prefix="Reasoning: Let's think step by step in order to",
        desc="${reasoning}"
    ),
    type_=str
)

"question -> answer" becomes "question -> reasoning, answer". The LM is forced to produce reasoning before the answer.

ReAct -- Build From Scratch

react_signature = (
    dspy.Signature({**signature.input_fields}, "\n".join(instr))
    .append("trajectory", dspy.InputField(), type_=str)
    .append("next_thought", dspy.OutputField(), type_=str)
    .append("next_tool_name", dspy.OutputField(), type_=Literal[tuple(tools.keys())])
    .append("next_tool_args", dspy.OutputField(), type_=dict[str, Any])
)

Note Literal[tuple(tools.keys())] -- the type system constrains what the LM can output for tool selection.

MultiChainComparison -- Append Input Fields + Prepend Output

for idx in range(M):
    signature = signature.append(
        f"reasoning_attempt_{idx+1}",
        InputField(prefix=f"Student Attempt #{idx+1}:")
    )
signature = signature.prepend("rationale", OutputField(prefix="Accurate Reasoning: ..."))

Refine -- Dynamic Injection at Call Time

signature = signature.append("hint_", InputField(desc="A hint from an earlier run"))

Done inside the adapter wrapper at call time, not at construction time. This is unique -- most modules modify signatures at __init__.


8. Signature Serialization

dump_state() / load_state(state)

def dump_state(cls):
    """Dumps instructions + per-field prefix and description."""
    return {
        "instructions": cls.instructions,
        "fields": {
            name: {
                "prefix": field.json_schema_extra.get("prefix"),
                "desc": field.json_schema_extra.get("desc"),
            }
            for name, field in cls.fields.items()
        }
    }

def load_state(cls, state):
    """Creates a new Signature from stored state.
    Updates instructions and field prefix/desc from the saved state."""
    new_sig = cls.with_instructions(state["instructions"])
    for name, field_state in state.get("fields", {}).items():
        if name in new_sig.fields:
            new_sig = new_sig.with_updated_fields(name, **field_state)
    return new_sig

This is what Predict.dump_state() calls under state["signature"]. It preserves the optimized instructions and field metadata while the field types and structure come from the code.


9. Pydantic Integration

How Types Map to Prompts

The adapter uses translate_field_type() to generate type hints for the LM:

| Python Type | Prompt Hint |
| --- | --- |
| str | (no hint) |
| bool | "must be True or False" |
| int / float | "must be a single int/float value" |
| Enum | "must be one of: val1; val2; val3" |
| Literal["a", "b"] | "must exactly match one of: a; b" |
| Complex types | "must adhere to the JSON schema: {...}" (Pydantic JSON schema) |

How Parsing Works

Parsing happens in parse_value() (dspy/adapters/utils.py):

  1. str annotation -> return raw string
  2. Enum -> find matching member by value or name
  3. Literal -> validate against allowed values
  4. bool/int/float -> type cast
  5. Complex types -> json_repair.loads() then pydantic.TypeAdapter(annotation).validate_python()
  6. DSPy Type subclasses -> custom parsing

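A stripped-down sketch of the string and complex-type paths (steps 1 and 5; the real parse_value also handles Enum, Literal, bool/int/float casts, and DSPy custom types):

import json_repair
from pydantic import TypeAdapter

def parse_value_sketch(raw: str, annotation):
    if annotation is str:
        return raw                                        # step 1: raw passthrough
    data = json_repair.loads(raw)                         # step 5: tolerant JSON repair
    return TypeAdapter(annotation).validate_python(data)  # Pydantic validation
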
10. The Signature as Contract

A Signature encodes:

| Aspect | How |
| --- | --- |
| What inputs are needed | input_fields dict |
| What outputs are produced | output_fields dict |
| How to describe the task | instructions (docstring) |
| How to present each field | prefix and desc per field |
| What types are expected | Python type annotations per field |
| What constraints apply | Pydantic constraints -> constraints string |
| Field ordering | Dict insertion order (inputs first, then outputs) |

The signature flows through the entire system:

  • Module holds it on self.signature
  • Adapter.format() reads it to build the prompt
  • Adapter.parse() reads it to know what to extract
  • Optimizers modify instructions and field prefix/desc
  • save()/load() serializes/deserializes it

Predict: The Foundation Primitive

What Predict Is

Predict is the only leaf node in the DSPy module tree. It is the only class that inherits from both Module (callable, composable) and Parameter (discoverable by optimizers). Every higher-level module (ChainOfThought, ReAct, etc.) ultimately delegates to one or more Predict instances.

A Predict takes a Signature, formats it into a prompt via an adapter, calls an LM, parses the response back into typed outputs, and returns a Prediction.


1. Construction

class Predict(Module, Parameter):
    def __init__(self, signature: str | type[Signature], callbacks=None, **config):
        super().__init__(callbacks=callbacks)
        self.stage = random.randbytes(8).hex()  # Unique ID for tracing
        self.signature = ensure_signature(signature)  # Parse string -> Signature class
        self.config = config  # Default LM kwargs (temperature, n, etc.)
        self.reset()

    def reset(self):
        """Clears all learned/optimizable state."""
        self.lm = None      # Per-predictor LM override (None = use settings.lm)
        self.traces = []     # Execution traces (for optimization)
        self.train = []      # Training examples
        self.demos = []      # Few-shot examples (THE primary optimizable state)

Key Attributes

| Attribute | Type | Purpose | Optimizable? |
| --- | --- | --- | --- |
| signature | type[Signature] | The typed I/O contract | Yes (instructions, field prefixes) |
| demos | list[Example] | Few-shot examples prepended to prompt | Yes (primary optimization lever) |
| lm | LM \| None | Per-predictor LM override | Yes (BootstrapFinetune replaces this) |
| config | dict | Default LM kwargs (temp, n, etc.) | No (set at construction) |
| stage | str | Random hex ID for tracing | No |
| traces | list | Execution traces for optimization | Bookkeeping |
| train | list | Training examples | Bookkeeping |

ensure_signature()

Converts various inputs to a Signature class:

  • String "question -> answer" -> parse into a Signature class
  • Existing Signature class -> return as-is
  • Dict of fields -> create a Signature class

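A sketch of the normalization (the real helper lives in dspy/signatures/signature.py):

def ensure_signature(signature, instructions=None):
    if signature is None:
        return None
    if isinstance(signature, (str, dict)):
        return make_signature(signature, instructions)  # build a new Signature class
    return signature                                    # already a Signature class
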
2. The Forward Pipeline

Predict.__call__(**kwargs) -> Module.__call__ (callbacks, tracking) -> Predict.forward(**kwargs).

Note: Predict.__call__ first validates that no positional args are passed (must use keyword args matching signature fields):

def __call__(self, *args, **kwargs):
    if args:
        raise ValueError(self._get_positional_args_error_message())
    return super().__call__(**kwargs)

2.1 forward() -- Three Steps

def forward(self, **kwargs):
    # Step 1: Resolve LM, merge config, extract demos
    lm, config, signature, demos, kwargs = self._forward_preprocess(**kwargs)

    # Step 2: Get adapter and run the full pipeline
    adapter = settings.adapter or ChatAdapter()

    if self._should_stream():
        with settings.context(caller_predict=self):
            completions = adapter(lm, lm_kwargs=config, signature=signature,
                                  demos=demos, inputs=kwargs)
    else:
        with settings.context(send_stream=None):
            completions = adapter(lm, lm_kwargs=config, signature=signature,
                                  demos=demos, inputs=kwargs)

    # Step 3: Build Prediction, record trace
    return self._forward_postprocess(completions, signature, **kwargs)

2.2 _forward_preprocess() -- The Critical Setup

This method extracts "privileged" kwargs that override Predict's defaults, resolves the LM, and prepares everything for the adapter call.

def _forward_preprocess(self, **kwargs):
    # 1. Extract privileged kwargs (these are NOT passed to the LM as inputs)
    signature = kwargs.pop("signature", self.signature)
    signature = ensure_signature(signature)

    demos = kwargs.pop("demos", self.demos)

    config = {**self.config, **kwargs.pop("config", {})}

    lm = kwargs.pop("lm", self.lm) or settings.lm

    # 2. Validate LM exists and is the right type
    if lm is None or not isinstance(lm, BaseLM):
        raise ValueError("No LM is loaded / invalid LM type")

    # 3. Auto-adjust temperature for multi-generation
    if config.get("n", 1) > 1 and config.get("temperature", 0) <= 0.15:
        config["temperature"] = 0.7  # Prevent deterministic multi-gen

    # 4. Handle OpenAI predicted outputs
    if "prediction" in kwargs:
        config["prediction"] = kwargs.pop("prediction")

    # 5. Fill missing input fields with Pydantic defaults
    for field_name, field_info in signature.input_fields.items():
        if field_name not in kwargs:
            if field_info.default is not PydanticUndefined:
                kwargs[field_name] = field_info.default

    # 6. Warn about missing required inputs
    for field_name in signature.input_fields:
        if field_name not in kwargs:
            logger.warning(f"Missing input: {field_name}")

    return lm, config, signature, demos, kwargs

LM resolution order: kwargs["lm"] > self.lm > settings.lm

Config merge: {**self.config, **kwargs["config"]} -- per-call config overrides construction-time config.

2.3 _forward_postprocess() -- Tracing

def _forward_postprocess(self, completions, signature, **kwargs):
    # 1. Build Prediction from completions
    pred = Prediction.from_completions(completions, signature=signature)

    # 2. Append to trace if tracing is enabled
    if kwargs.pop("_trace", True) and settings.trace is not None:
        trace = settings.trace
        if len(trace) >= settings.max_trace_size:
            trace.pop(0)  # FIFO eviction: drop the oldest entry
        trace.append((self, {**kwargs}, pred))
        # Tuple: (predictor_instance, input_kwargs_dict, prediction_output)

    return pred

The trace tuple (self, inputs, prediction) is how optimizers connect outputs back to specific Predict instances. BootstrapFewShot reads these traces to create demos.

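Reading the trace, assuming default settings (where the global trace list is active):

qa = dspy.Predict("question -> answer")
pred = qa(question="What is 2+2?")

predictor, inputs, prediction = dspy.settings.trace[-1]
assert predictor is qa                           # identity, not a copy
assert inputs == {"question": "What is 2+2?"}
assert prediction.answer == pred.answer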

3. Predict State Management

dump_state() -- Serialization

def dump_state(self, json_mode=True):
    state_keys = ["traces", "train"]
    state = {k: getattr(self, k) for k in state_keys}

    # Serialize demos (the main optimizable state)
    state["demos"] = []
    for demo in self.demos:
        demo = demo.copy()
        for field in demo:
            demo[field] = serialize_object(demo[field])  # Pydantic models -> dicts
        if isinstance(demo, dict) or not json_mode:
            state["demos"].append(demo)
        else:
            state["demos"].append(demo.toDict())

    # Signature state (instructions + field prefixes/descriptions)
    state["signature"] = self.signature.dump_state()

    # LM state (model config) or None
    state["lm"] = self.lm.dump_state() if self.lm else None

    return state

load_state() -- Deserialization

def load_state(self, state):
    excluded_keys = ["signature", "extended_signature", "lm"]
    for name, value in state.items():
        if name not in excluded_keys:
            setattr(self, name, value)  # demos, traces, train

    # Reconstruct signature from saved instructions/field metadata
    self.signature = self.signature.load_state(state["signature"])

    # Reconstruct LM from saved config
    self.lm = LM(**state["lm"]) if state["lm"] else None

What Gets Serialized

| Field | Serialized? | Format |
| --- | --- | --- |
| demos | Yes | List of dicts (Example.toDict()) |
| traces | Yes | Raw list |
| train | Yes | Raw list |
| signature | Yes | {instructions, fields: {name: {prefix, desc}}} |
| lm | Yes (if set) | LM config dict (model name, kwargs) |
| config | No | Comes from code |
| stage | No | Random, regenerated |
| callbacks | No | Transient |

4. The Adapter Call

Inside forward(), the adapter call is the heart of the computation:

adapter = settings.adapter or ChatAdapter()
completions = adapter(lm, lm_kwargs=config, signature=signature, demos=demos, inputs=kwargs)

The adapter does:

  1. _call_preprocess(): Handle native tool calls, reasoning types. May remove fields from signature.
  2. format(signature, demos, inputs): Build message list (system + demos + user).
  3. lm(messages=messages, **kwargs): Actually call the LM.
  4. _call_postprocess(): Parse each completion via parse(signature, text).

The result is a list of dicts, one per completion, each containing the output field values.

Then Prediction.from_completions() wraps this into a Prediction object.


5. Prediction and Example

Example (dspy/primitives/example.py)

Dict-like container with input/label separation:

class Example:
    def __init__(self, **kwargs):
        self._store = kwargs          # The actual data
        self._input_keys = set()      # Which keys are inputs
        self._demos = []              # Attached demos (rarely used)

    def with_inputs(self, *keys):
        """Mark which fields are inputs. Returns a copy with the input keys set
        (the real implementation copies; the original Example is not mutated)."""
        copied = self.copy()
        copied._input_keys = set(keys)
        return copied

    def inputs(self):
        """Only the input keys (simplified -- the real method wraps the result in an Example)."""
        return {k: v for k, v in self._store.items() if k in self._input_keys}

    def labels(self):
        """Only the non-input keys (simplified -- the real method wraps the result in an Example)."""
        return {k: v for k, v in self._store.items() if k not in self._input_keys}

Training data and demos are both Examples. The .with_inputs() call marks the boundary between what gets passed as input and what's a label.

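Usage:

ex = dspy.Example(question="What is 2+2?", answer="4").with_inputs("question")
ex.inputs()       # just the question
ex.labels()       # just the answer
ex.question       # attribute access hits the full _store
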
Prediction (dspy/primitives/prediction.py)

Subclass of Example, returned by all modules:

class Prediction(Example):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._completions = None   # All completions (not just the first)
        self._lm_usage = None      # Token usage tracking

    @classmethod
    def from_completions(cls, list_or_dict, signature=None):
        """
        Wraps completions into a Prediction.
        - Stores all completions as a Completions object
        - pred._store = {k: v[0] for k, v in completions.items()}
          (first completion is the default)
        """
        obj = cls()
        obj._completions = Completions(list_or_dict, signature=signature)
        # Set primary values to first completion
        obj._store = {k: v[0] for k, v in obj._completions.items()}
        return obj

Attribute access (pred.answer) returns the first completion's value. pred.completions.answer returns all completions for that field.

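Usage, requesting multiple completions (n is forwarded to the LM via config):

qa = dspy.Predict("question -> answer", n=3)
pred = qa(question="Name a prime number.")
pred.answer               # the first completion's answer
pred.completions.answer   # all three answers, as a list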

6. The Complete Flow

Putting it all together for a single predict(question="What is 2+2?") call:

1. Predict.__call__(question="What is 2+2?")
   -> Validates no positional args
   -> Module.__call__(**kwargs)
      -> @with_callbacks: on_module_start
      -> Push self to caller_modules stack
      -> Predict.forward(question="What is 2+2?")

2. _forward_preprocess(question="What is 2+2?")
   -> signature = self.signature (e.g., "question -> answer")
   -> demos = self.demos (e.g., 3 few-shot examples)
   -> config = {**self.config} (e.g., {temperature: 0})
   -> lm = self.lm or settings.lm
   -> kwargs = {question: "What is 2+2?"}
   -> return (lm, config, signature, demos, kwargs)

3. adapter = settings.adapter or ChatAdapter()

4. completions = adapter(lm, lm_kwargs=config, signature=signature,
                         demos=demos, inputs=kwargs)

   Inside adapter.__call__:
   a. _call_preprocess: check for tools/native types, may modify signature
   b. format(signature, demos, inputs):
      - System message: field descriptions + format structure + instructions
      - Demo messages: few-shot examples as user/assistant pairs
      - User message: current inputs + output format reminder
   c. lm(messages=messages, **lm_kwargs):
      - litellm call to the actual LM
      - Returns list of completion strings
   d. _call_postprocess: for each completion:
      - parse(signature, text): extract output field values
      - Returns list of dicts: [{answer: "4"}, ...]

5. _forward_postprocess(completions, signature, question="What is 2+2?")
   -> Prediction.from_completions([{answer: "4"}])
   -> Append (self, {question: "What is 2+2?"}, prediction) to settings.trace
   -> Return prediction

6. Module.__call__ returns
   -> @with_callbacks: on_module_end
   -> Return Prediction(answer="4")

Augmentation Patterns: How Modules Build on Predict

The Core Idea

Every DSPy module that does anything interesting is orchestration on top of Predict. The module itself is not a parameter -- it's a container. The actual "learning" (demos, instructions) lives entirely inside the Predict instances it holds.

There are exactly four augmentation patterns in DSPy:

| Pattern | Mechanism | Modules |
| --- | --- | --- |
| Signature Extension | Modify the signature at __init__ time, delegate to one Predict | ChainOfThought, MultiChainComparison |
| Multi-Signature Orchestration | Multiple Predicts with different signatures, orchestrated in a loop | ReAct, ProgramOfThought |
| Module Wrapping | Wrap an arbitrary Module, run it multiple times, select best output | BestOfN, Refine |
| Aggregation | Take multiple completions and synthesize/vote | MultiChainComparison, majority() |

Pattern 1: Signature Extension

ChainOfThought -- The Canonical Example

File: dspy/predict/chain_of_thought.py

class ChainOfThought(Module):
    def __init__(self, signature, rationale_field=None, rationale_field_type=str, **config):
        super().__init__()
        signature = ensure_signature(signature)

        # Default rationale field
        prefix = "Reasoning: Let's think step by step in order to"
        desc = "${reasoning}"
        rationale_field_type = rationale_field.annotation if rationale_field else rationale_field_type
        rationale_field = rationale_field if rationale_field else dspy.OutputField(prefix=prefix, desc=desc)

        # THE AUGMENTATION: prepend a "reasoning" output field
        extended_signature = signature.prepend(
            name="reasoning",
            field=rationale_field,
            type_=rationale_field_type
        )

        # Single Predict with the extended signature
        self.predict = dspy.Predict(extended_signature, **config)

    def forward(self, **kwargs):
        return self.predict(**kwargs)

What happens:

  • "question -> answer" becomes "question -> reasoning, answer"
  • The LM is forced to produce reasoning before answer
  • forward() is a pure passthrough to the single Predict

What optimizers see: One Predict at path "predict". They can:

  • Add demos to self.predict.demos
  • Rewrite self.predict.signature.instructions
  • Rewrite the reasoning field's prefix (e.g., change "Let's think step by step" to something better)

The Reasoning type trick: If rationale_field_type is the Reasoning custom type (instead of str), the adapter detects it at call time. If the LM supports native reasoning (o1, o3), the adapter removes the reasoning field from the signature and enables the model's built-in chain-of-thought via reasoning_effort in lm_kwargs. The LM does its own reasoning internally, and the adapter extracts reasoning_content from the response. For non-reasoning models, it falls back to text-based reasoning.

MultiChainComparison -- Aggregation via Signature Extension

File: dspy/predict/multi_chain_comparison.py

class MultiChainComparison(Module):
    def __init__(self, signature, M=3, temperature=0.7, **config):
        super().__init__()
        self.M = M
        signature = ensure_signature(signature)
        *_, self.last_key = signature.output_fields.keys()  # The final output field name

        # Append M input fields for "student attempts"
        for idx in range(M):
            signature = signature.append(
                f"reasoning_attempt_{idx+1}",
                InputField(
                    prefix=f"Student Attempt #{idx+1}:",
                    desc="${reasoning attempt}"
                ),
            )

        # Prepend a rationale output field
        signature = signature.prepend(
            "rationale",
            OutputField(
                prefix="Accurate Reasoning: Thank you everyone. Let's now holistically",
                desc="${corrected reasoning}",
            ),
        )

        self.predict = Predict(signature, temperature=temperature, **config)

The forward method is unique -- it takes completions as input:

def forward(self, completions, **kwargs):
    attempts = []
    for c in completions:
        rationale = c.get("rationale", c.get("reasoning")).strip().split("\n")[0].strip()
        answer = str(c[self.last_key]).strip().split("\n")[0].strip()
        attempts.append(
            f"<<I'm trying to {rationale} I'm not sure but my prediction is {answer}>>"
        )

    kwargs = {
        **{f"reasoning_attempt_{idx+1}": attempt for idx, attempt in enumerate(attempts)},
        **kwargs,
    }
    return self.predict(**kwargs)

The pattern: run ChainOfThought M times, feed all M attempts into MultiChainComparison, get a synthesized answer. The signature extension adds the M input slots and a synthesis rationale.

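A usage sketch (each attempt is a Prediction carrying reasoning + answer fields):

q = "Why is the sky blue?"
cot = dspy.ChainOfThought("question -> answer")
attempts = [cot(question=q) for _ in range(3)]       # M independent attempts

compare = dspy.MultiChainComparison("question -> answer", M=3)
final = compare(attempts, question=q)                # synthesized answer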

Pattern 2: Multi-Signature Orchestration

ReAct -- Tool-Using Agent Loop

File: dspy/predict/react.py

class ReAct(Module):
    def __init__(self, signature, tools, max_iters=20):
        super().__init__()
        self.signature = signature = ensure_signature(signature)
        self.max_iters = max_iters

        # Convert callables to Tool objects
        tools = [t if isinstance(t, Tool) else Tool(t) for t in tools]
        tools = {tool.name: tool for tool in tools}

        # Add a "finish" tool that signals completion
        # (returns a dict with the original output field values)
        tools["finish"] = Tool(
            func=lambda **kwargs: "Completed.",
            name="finish",
            desc="Signal task completion.",
            args={name: ... for name in signature.output_fields},
        )
        self.tools = tools

Two separate Predict instances with different signatures:

        # The action-selection signature
        instr = [
            signature.instructions,
            "You will be given `trajectory` as context.",
            f"Tools: {tool_descriptions}",
            "Finish with the `finish` tool when done.",
        ]
        react_signature = (
            dspy.Signature({**signature.input_fields}, "\n".join(instr))
            .append("trajectory", dspy.InputField(), type_=str)
            .append("next_thought", dspy.OutputField(), type_=str)
            .append("next_tool_name", dspy.OutputField(), type_=Literal[tuple(tools.keys())])
            .append("next_tool_args", dspy.OutputField(), type_=dict[str, Any])
        )

        # The extraction signature (uses ChainOfThought)
        fallback_signature = dspy.Signature(
            {**signature.input_fields, **signature.output_fields},
            signature.instructions,
        ).append("trajectory", dspy.InputField(), type_=str)

        self.react = dspy.Predict(react_signature)
        self.extract = dspy.ChainOfThought(fallback_signature)

The agent loop:

def forward(self, **input_args):
    trajectory = {}

    for idx in range(self.max_iters):
        # Ask the LM what to do next
        pred = self._call_with_potential_trajectory_truncation(
            self.react, trajectory, **input_args
        )

        # Record the action in trajectory
        trajectory[f"thought_{idx}"] = pred.next_thought
        trajectory[f"tool_name_{idx}"] = pred.next_tool_name
        trajectory[f"tool_args_{idx}"] = pred.next_tool_args

        # Actually execute the tool
        try:
            trajectory[f"observation_{idx}"] = self.tools[pred.next_tool_name](
                **pred.next_tool_args
            )
        except Exception as err:
            trajectory[f"observation_{idx}"] = f"Execution error: {_fmt_exc(err)}"

        # Break if finish tool was selected
        if pred.next_tool_name == "finish":
            break

    # Extract final answer from the full trajectory
    extract = self._call_with_potential_trajectory_truncation(
        self.extract, trajectory, **input_args
    )
    return dspy.Prediction(trajectory=trajectory, **extract)

Context window handling: _call_with_potential_trajectory_truncation retries up to 3 times on ContextWindowExceededError, each time truncating the oldest 4 trajectory entries (one tool call = thought + name + args + observation).

Parameters exposed to optimizers: Two Predict instances:

  • self.react -- the action-selection predictor
  • self.extract.predict -- the ChainOfThought's internal Predict for extraction

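A usage sketch (the tool body is hypothetical):

def search_web(query: str) -> str:
    """Look up a query on the web."""
    return "..."  # hypothetical tool implementation

agent = dspy.ReAct("question -> answer", tools=[search_web])
result = agent(question="Who wrote Dune?")
result.trajectory   # thought_i / tool_name_i / tool_args_i / observation_i entries
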
ProgramOfThought -- Code Generation + Execution

File: dspy/predict/program_of_thought.py

class ProgramOfThought(Module):
    def __init__(self, signature, max_iters=3, interpreter=None):
        super().__init__()
        self.signature = signature = ensure_signature(signature)
        self.input_fields = signature.input_fields
        self.output_fields = signature.output_fields

        # THREE separate ChainOfThought modules, each with a custom signature:

        # 1. Generate code from inputs
        self.code_generate = dspy.ChainOfThought(
            dspy.Signature(
                self._generate_signature("generate").fields,
                self._generate_instruction("generate")
            ),
        )

        # 2. Regenerate code given previous code + error
        self.code_regenerate = dspy.ChainOfThought(
            dspy.Signature(
                self._generate_signature("regenerate").fields,
                self._generate_instruction("regenerate")
            ),
        )

        # 3. Interpret code output into final answer
        self.generate_output = dspy.ChainOfThought(
            dspy.Signature(
                self._generate_signature("answer").fields,
                self._generate_instruction("answer")
            ),
        )

        self.interpreter = interpreter or PythonInterpreter()

The execution loop:

def forward(self, **kwargs):
    input_kwargs = {name: kwargs[name] for name in self.input_fields}

    # Step 1: Generate code
    code_data = self.code_generate(**input_kwargs)
    code, error = self._parse_code(code_data)
    if not error:
        output, error = self._execute_code(code)

    # Step 2: Retry on failure
    hop = 1
    while error is not None:
        if hop == self.max_iters:
            raise RuntimeError(f"Max iterations reached: {error}")
        input_kwargs.update({"previous_code": code, "error": error})
        code_data = self.code_regenerate(**input_kwargs)
        code, error = self._parse_code(code_data)
        if not error:
            output, error = self._execute_code(code)
        hop += 1

    # Step 3: Interpret code output
    input_kwargs.update({"final_generated_code": code, "code_output": output})
    return self.generate_output(**input_kwargs)

Signature generation (_generate_signature(mode)):

  • "generate": original inputs -> generated_code: str
  • "regenerate": original inputs + previous_code: str + error: str -> generated_code: str
  • "answer": original inputs + final_generated_code: str + code_output: str -> original outputs

Parameters exposed to optimizers: Three ChainOfThought modules, each with an internal Predict:

  • self.code_generate.predict
  • self.code_regenerate.predict
  • self.generate_output.predict

Pattern 3: Module Wrapping

BestOfN -- Rejection Sampling

File: dspy/predict/best_of_n.py

class BestOfN(Module):
    def __init__(self, module, N, reward_fn, threshold, fail_count=None):
        self.module = module
        self.N = N
        self.threshold = threshold
        self.fail_count = fail_count or N

        # IMPORTANT: wrapped in lambda to prevent named_parameters() from
        # discovering it (a raw function assigned to self would be walked)
        self.reward_fn = lambda *args: reward_fn(*args)

def forward(self, **kwargs):
    best_pred, best_score = None, float("-inf")
    fail_count = 0

    for i in range(self.N):
        with dspy.context(rollout_id=i, temperature=1.0):
            pred = self.module(**kwargs)
        score = self.reward_fn(kwargs, pred)

        if score > best_score:
            best_pred, best_score = pred, score
        if score >= self.threshold:
            return pred  # Good enough, return early
        fail_count += 1
        if fail_count >= self.fail_count:
            break

    return best_pred

Key behaviors:

  • Runs the wrapped module N times at temperature=1.0
  • Each run gets a unique rollout_id in the context
  • Returns the first prediction that meets the threshold, or the best overall
  • self.reward_fn is wrapped in a lambda specifically to prevent parameter discovery (otherwise named_parameters() would try to walk into it)

Parameters exposed to optimizers: Whatever self.module contains. BestOfN itself adds no Predict instances.
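
Typical usage (the one_word_reward function is hypothetical, written for illustration):

import dspy

qa = dspy.ChainOfThought("question -> answer")

def one_word_reward(args, pred):
    # args: the input kwargs dict; pred: the module's Prediction
    return 1.0 if len(pred.answer.split()) == 1 else 0.0

best_qa = dspy.BestOfN(module=qa, N=5, reward_fn=one_word_reward, threshold=1.0)
result = best_qa(question="What is the capital of France?")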

Refine -- BestOfN With Feedback

File: dspy/predict/refine.py

Refine does everything BestOfN does, plus: after a failed attempt, it generates per-module advice and injects it as a "hint" on retry.

The feedback mechanism: Uses dspy.Predict(OfferFeedback) to generate advice:

# OfferFeedback signature:
# input_data, output_data, metric_value, output_field_name -> feedback
feedback_pred = dspy.Predict(OfferFeedback)

The hint injection uses a WrapperAdapter:

class WrapperAdapter(adapter.__class__):
    def __call__(self, lm, lm_kwargs, signature, demos, inputs):
        # `advice` and `signature2name` are closed over from Refine's retry loop.
        # Dynamically add a hint field to the signature:
        inputs["hint_"] = advice.get(signature2name[signature], "N/A")
        signature = signature.append(
            "hint_",
            InputField(desc="A hint to the module from an earlier run")
        )
        return adapter(lm, lm_kwargs, signature, demos, inputs)

This is the modern replacement for Assert/Suggest. Instead of backtracking and mutating signatures permanently, Refine:

  1. Runs the module
  2. If the metric fails, asks an LM for advice
  3. Injects that advice as a temporary "hint" field on the next attempt
  4. The signature modification happens at call time via the adapter wrapper, not at construction time
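
Since Refine does everything BestOfN does, its constructor mirrors BestOfN's, and swapping rejection sampling for feedback-driven retries is a one-line change (reusing the hypothetical reward function from the BestOfN example above):

refined_qa = dspy.Refine(module=qa, N=5, reward_fn=one_word_reward, threshold=1.0)
result = refined_qa(question="What is the capital of France?")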

Pattern 4: Aggregation

majority() -- Voting

Not a module, just a function:

def majority(prediction_or_completions, normalize=...):
    """Returns the most common value across completions."""

MultiChainComparison (covered above)

Takes M completions and synthesizes them. This is aggregation via signature extension.


Deprecated / Removed Modules

Retry -- Removed

The entire file (dspy/predict/retry.py) is commented out. Not exported. Replaced by Refine and BestOfN.

Assert / Suggest -- Removed in DSPy 2.6

These were inline constraints that triggered backtracking:

# OLD (removed):
dspy.Assert(len(answer) < 100, "Answer too long")

When the constraint failed, it would dynamically modify the signature by adding past_{output_field} InputFields and a feedback InputField. On persistent failure, Assert raised an error; Suggest logged and continued.

Replaced by Refine which does the same thing more cleanly.

ChainOfThoughtWithHint -- Removed

Absorbed into Refine's hint injection mechanism.


Summary: What Each Module Exposes to Optimizers

| Module | # Predicts | Paths | What's Optimizable |
|---|---|---|---|
| Predict | 1 | self | demos, signature.instructions, field prefixes |
| ChainOfThought | 1 | predict | demos, instructions, reasoning prefix |
| MultiChainComparison | 1 | predict | demos, instructions, rationale prefix |
| ReAct | 2 | react, extract.predict | demos and instructions for both action selection and extraction |
| ProgramOfThought | 3 | code_generate.predict, code_regenerate.predict, generate_output.predict | demos and instructions for code gen, code regen, and output interpretation |
| BestOfN | varies | whatever self.module contains | pass-through to wrapped module |
| Refine | varies + 1 | wrapped module + feedback predictor | pass-through + feedback generation |

The invariant: Every optimizable thing is a Predict. Every Predict has a signature and demos. Modules are just orchestration.

Adapters: How Modules Talk to LMs

What Adapters Do

An adapter sits between Predict and the LM. It has three jobs:

  1. Format: Convert (signature, demos, inputs) into a list of chat messages
  2. Call: Send messages to the LM
  3. Parse: Extract typed output field values from the LM's text response

The critical path: Predict.forward() -> adapter(lm, lm_kwargs, signature, demos, inputs) -> messages -> LM -> completions -> parsed dicts -> Prediction.
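
To make that path concrete, here is a direct adapter invocation outside of Predict (a sketch using the format() pipeline described below):

import dspy

adapter = dspy.ChatAdapter()
signature = dspy.Signature("question -> answer")
messages = adapter.format(signature, demos=[], inputs={"question": "What is 2+2?"})
# messages: [{"role": "system", ...}, {"role": "user", ...}] -- ready for lm(messages=messages)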


1. Adapter Base Class

File: dspy/adapters/base.py

Constructor

class Adapter:
    def __init__(self, callbacks=None, use_native_function_calling=False,
                 native_response_types=None):
        self.callbacks = callbacks or []
        self.use_native_function_calling = use_native_function_calling
        self.native_response_types = native_response_types or [Citations, Reasoning]

  • use_native_function_calling: When True, detects dspy.Tool input fields and dspy.ToolCalls output fields, converts them to litellm tool definitions
  • native_response_types: Types handled by native LM features rather than text parsing (e.g., Reasoning for o1-style models)

The __call__ Pipeline

def __call__(self, lm, lm_kwargs, signature, demos, inputs):
    # Step 1: Preprocess - handle native tools and response types
    processed_signature, original_signature, lm_kwargs = self._call_preprocess(
        lm, lm_kwargs, signature, inputs
    )

    # Step 2: Format and call
    messages = self.format(processed_signature, demos, inputs)
    outputs = lm(messages=messages, **lm_kwargs)  # list[str | dict]

    # Step 3: Postprocess - parse each completion
    return self._call_postprocess(
        processed_signature, original_signature, outputs, lm, lm_kwargs
    )

Step 1: _call_preprocess()

Handles two categories of "native" features:

Native function calling (when use_native_function_calling=True):

  • Finds dspy.Tool / list[dspy.Tool] input fields
  • Finds dspy.ToolCalls output fields
  • Converts tools to litellm format via tool.format_as_litellm_function_call()
  • Adds to lm_kwargs["tools"]
  • Removes both tool input and ToolCalls output fields from the signature
  • The LM handles tool calling natively instead of through text

Native response types (Reasoning, Citations):

  • For each output field with a native response type annotation:
    • Calls field.annotation.adapt_to_native_lm_feature(signature, name, lm, lm_kwargs)
    • For Reasoning: checks if LM supports native reasoning (via litellm.supports_reasoning()). If yes, sets reasoning_effort in lm_kwargs and deletes the reasoning field from the signature. The model uses its built-in chain-of-thought.
    • Returns the modified signature (with native-handled fields removed)

Step 3: _call_postprocess()

For each LM output:

  1. If the output has text: call self.parse(processed_signature, text) -> dict of field values
  2. Set missing fields (ones in original but not processed signature) to None
  3. If tool_calls present: parse into ToolCalls.from_dict_list()
  4. For native response types: call field.annotation.parse_lm_response(output) (e.g., extract reasoning_content from the response dict)
  5. Handle logprobs

Abstract Methods (subclasses must implement)

def format_field_description(self, signature) -> str
def format_field_structure(self, signature) -> str
def format_task_description(self, signature) -> str
def format_user_message_content(self, signature, inputs, ...) -> str
def format_assistant_message_content(self, signature, outputs, ...) -> str
def parse(self, signature, completion) -> dict

Concrete Methods in Base

format(signature, demos, inputs) -- The main formatting pipeline:

def format(self, signature, demos, inputs):
    messages = []

    # 1. Check for History field; if present, extract conversation history
    history_field_name = ...  # find field with dspy.History type
    if history_field_name:
        signature = signature.delete(history_field_name)

    # 2. System message
    messages.append({
        "role": "system",
        "content": self.format_system_message(signature)
    })

    # 3. Demo messages (few-shot examples)
    messages.extend(self.format_demos(signature, demos))

    # 4. Conversation history (if any)
    if history_field_name:
        messages.extend(self.format_conversation_history(
            signature, history_field_name, inputs
        ))

    # 5. Current user input
    messages.append({
        "role": "user",
        "content": self.format_user_message_content(
            signature, inputs, main_request=True
        )
    })

    # 6. Handle custom types (Image, Audio, File)
    messages = split_message_content_for_custom_types(messages)

    return messages

format_system_message(signature):

def format_system_message(self, signature):
    return (
        self.format_field_description(signature) + "\n\n" +
        self.format_field_structure(signature) + "\n\n" +
        self.format_task_description(signature)
    )

format_demos(signature, demos) -- Sorts demos into complete and incomplete:

def format_demos(self, signature, demos):
    messages = []

    # Complete demos have every field present; incomplete demos have at
    # least one input AND one output field, but not all of them.
    complete_demos = [d for d in demos if all(k in d for k in signature.fields)]
    incomplete_demos = [
        d for d in demos
        if d not in complete_demos
        and any(k in d for k in signature.input_fields)
        and any(k in d for k in signature.output_fields)
    ]

    # Incomplete demos come FIRST, prefixed with a disclaimer: "This is an
    # example of the task, though some input or output fields are not
    # supplied." Missing fields render as "Not supplied for this
    # particular example."
    for demo in incomplete_demos:
        ...

    # Complete demos follow as normal user/assistant message pairs with
    # all fields filled in.
    for demo in complete_demos:
        ...

2. ChatAdapter

File: dspy/adapters/chat_adapter.py

The default adapter. Uses [[ ## field_name ## ]] delimiters to separate fields.

Fallback to JSONAdapter

def __call__(self, lm, lm_kwargs, signature, demos, inputs):
    try:
        return super().__call__(...)
    except Exception as e:
        if isinstance(e, ContextWindowExceededError):
            raise  # Don't retry context window errors
        if isinstance(self, JSONAdapter):
            raise  # Already in JSON mode
        if not self.use_json_adapter_fallback:
            raise
        # Fallback: retry with JSONAdapter
        return JSONAdapter()(lm, lm_kwargs, signature, demos, inputs)

format_field_description(signature)

Your input fields are:
1. `question` (str): The question to answer
2. `context` (list[str]): Relevant passages

Your output fields are:
1. `answer` (str): The answer, often between 1 and 5 words

format_field_structure(signature)

Shows the expected format using [[ ## field_name ## ]] markers:

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## context ## ]]
{context}

[[ ## answer ## ]]
{answer}    # note: the value you produce must be a single str value

[[ ## completed ## ]]

The type hints come from translate_field_type():

| Python Type | Prompt Hint |
|---|---|
| str | (no hint) |
| bool | "must be True or False" |
| int / float | "must be a single int/float value" |
| Enum | "must be one of: val1; val2; val3" |
| Literal["a", "b"] | "must exactly match (no extra characters) one of: a; b" |
| Complex types | "must adhere to the JSON schema: {...}" (Pydantic JSON schema) |

format_task_description(signature)

In adhering to this structure, your objective is:
    Answer questions with short factoid answers.

format_user_message_content(signature, inputs, main_request=True)

[[ ## question ## ]]
What is the capital of France?

[[ ## context ## ]]
[1] <<France is a country in Western Europe. Its capital is Paris.>>

Respond with the corresponding output fields, starting with the field `[[ ## answer ## ]]`,
and then ending with the marker for `[[ ## completed ## ]]`.

The last line (output requirements) is only added when main_request=True (not for demos).

format_assistant_message_content(signature, outputs)

[[ ## answer ## ]]
Paris

[[ ## completed ## ]]

format_field_value() (from utils.py)

How values are formatted in messages:

  • Lists of strings: numbered format [1] <<text>>, [2] <<text>>
  • Dicts/lists of non-strings: json.dumps(jsonable_value)
  • Primitives: str(value)
  • Single items with delimiters: <<value>> or <<<multi\nline>>> for long values

parse(signature, completion)

def parse(self, signature, completion):
    # 1. Split on [[ ## field_name ## ]] headers
    sections = re.split(r"\[\[ ## (\w+) ## \]\]", completion)

    # 2. Group content under each header
    fields = {}
    for header, content in paired_sections:
        if header in signature.output_fields:
            fields[header] = content.strip()

    # 3. Parse each field value to its annotated type
    for name, raw_value in fields.items():
        annotation = signature.output_fields[name].annotation
        fields[name] = parse_value(raw_value, annotation)

    # 4. Validate all output fields are present
    if not all(name in fields for name in signature.output_fields):
        raise AdapterParseError(...)

    return fields

parse_value(value_string, annotation) (from utils.py):

  1. str -> return as-is
  2. Enum -> find matching member by value or name
  3. Literal -> validate against allowed values, strip wrapper syntax
  4. bool/int/float -> type cast
  5. Complex types -> json_repair.loads() then pydantic.TypeAdapter(annotation).validate_python()
  6. DSPy Type subclasses -> try custom parsing
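
A few concrete coercions implied by that list (illustrative inputs; parse_value lives in dspy/adapters/utils.py, per the key-files table below):

from typing import Literal

from dspy.adapters.utils import parse_value

parse_value("4", int)                           # -> 4
parse_value("Paris", Literal["Paris", "Rome"])  # -> "Paris"
parse_value('{"a": 1}', dict[str, int])         # -> {"a": 1} (json_repair + pydantic TypeAdapter)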

3. JSONAdapter

File: dspy/adapters/json_adapter.py

Extends ChatAdapter. Key differences: outputs are JSON instead of delimited text.

Structured Outputs Support

def __call__(self, lm, lm_kwargs, signature, demos, inputs):
    # Try 1: common path (e.g. native function calling); returns None if not applicable
    result = self._json_adapter_call_common(...)
    if result:
        return result

    try:
        # Try 2: OpenAI Structured Outputs (full schema)
        structured_output_model = _get_structured_outputs_response_format(signature)
        lm_kwargs["response_format"] = structured_output_model
        return super().__call__(...)
    except Exception:
        # Try 3: json_object mode (simpler)
        lm_kwargs["response_format"] = {"type": "json_object"}
        return super().__call__(...)

Output Format Differences

ChatAdapter output:

[[ ## answer ## ]]
Paris

[[ ## completed ## ]]

JSONAdapter output:

{
  "answer": "Paris"
}

format_field_structure(signature) -- Different from ChatAdapter

User inputs still use [[ ## field_name ## ]] markers, but outputs are described as JSON:

Inputs will have the following structure:

[[ ## question ## ]]
{question}

Outputs will be a JSON object with the following fields.
{
  "answer": "{answer}"    // note: must adhere to JSON schema: ...
}

parse(signature, completion) -- JSON parsing

def parse(self, signature, completion):
    # 1. Parse with json_repair (handles malformed JSON)
    result = json_repair.loads(completion)

    # 2. If not a dict, try regex extraction of JSON object
    if not isinstance(result, dict):
        match = regex.search(r"\{(?:[^{}]|(?R))*\}", completion)
        result = json_repair.loads(match.group())

    # 3. Filter to known output fields
    result = {k: v for k, v in result.items() if k in signature.output_fields}

    # 4. Parse each value to its annotated type
    for name, value in result.items():
        result[name] = parse_value(value, signature.output_fields[name].annotation)

    # 5. Validate all fields present
    if not all(name in result for name in signature.output_fields):
        raise AdapterParseError(...)

    return result

Structured Outputs Model Generation

_get_structured_outputs_response_format(signature) builds a Pydantic model from output fields with OpenAI's requirements:

  • extra="forbid" (no additional properties)
  • Recursive enforce_required() ensures all nested objects have required and additionalProperties: false
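
The generated model is roughly equivalent to hand-writing the following (a sketch; the real function builds it dynamically from signature.output_fields, and the class name here is made up):

from pydantic import BaseModel, ConfigDict

class ProgramOutputs(BaseModel):
    model_config = ConfigDict(extra="forbid")  # emits additionalProperties: false

    answer: str  # one attribute per output field, typed from its annotation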

4. Other Adapters

XMLAdapter

File: dspy/adapters/xml_adapter.py

Uses <field_name>...</field_name> XML tags instead of [[ ## ]] delimiters. Otherwise similar to ChatAdapter.

TwoStepAdapter

File: dspy/adapters/two_step_adapter.py

Uses two LM calls:

  1. First call: natural language prompt, get a free-form response
  2. Second call: use ChatAdapter to extract structured fields from the free-form response

Useful for models that struggle with strict formatting.


5. Complete Message Assembly Example

For a ChainOfThought("question -> answer") with 2 demos and the input "What is 2+2?":

System Message

Your input fields are:
1. `question` (str)

Your output fields are:
1. `reasoning` (str): ${reasoning}
2. `answer` (str)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## reasoning ## ]]
{reasoning}

[[ ## answer ## ]]
{answer}

[[ ## completed ## ]]

In adhering to this structure, your objective is:
    Given the fields `question`, produce the fields `reasoning`, `answer`.

Demo 1 (User)

[[ ## question ## ]]
What is the capital of France?

Demo 1 (Assistant)

[[ ## reasoning ## ]]
The question asks about the capital of France. France is a country in Europe, and its capital city is Paris.

[[ ## answer ## ]]
Paris

[[ ## completed ## ]]

Demo 2 (User + Assistant)

(Same pattern)

Current Input (User)

[[ ## question ## ]]
What is 2+2?

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`,
and then ending with the marker for `[[ ## completed ## ]]`.

LM Response (Assistant)

[[ ## reasoning ## ]]
The question asks for the sum of 2 and 2. Basic arithmetic: 2 + 2 = 4.

[[ ## answer ## ]]
4

[[ ## completed ## ]]

Parsed Result

{"reasoning": "The question asks for the sum of 2 and 2. Basic arithmetic: 2 + 2 = 4.",
 "answer": "4"}

6. Settings and Adapter Configuration

Global Configuration

dspy.configure(
    lm=dspy.LM("openai/gpt-4"),
    adapter=dspy.ChatAdapter(),  # Default if not set
)

Per-Call Override

with dspy.context(adapter=dspy.JSONAdapter()):
    result = predict(question="...")

LM Resolution in Predict

# In _forward_preprocess:
adapter = settings.adapter or ChatAdapter()  # Global or default
lm = kwargs.pop("lm", self.lm) or settings.lm  # Per-call > per-predict > global
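
That precedence chain in practice (a sketch; the model names are placeholders):

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # global default

predict = dspy.Predict("question -> answer")
predict.lm = dspy.LM("openai/gpt-4o")             # per-predictor override

# The per-call override wins over both:
predict(question="What is 2+2?", lm=dspy.LM("openai/gpt-4.1"))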

7. Custom Types and Special Handling

Image (dspy/adapters/types/image.py)

  • Subclass of dspy.Type
  • format() returns [{"type": "image_url", "image_url": {"url": data_uri}}]
  • Serialized with custom markers: <<CUSTOM-TYPE-START-IDENTIFIER>>json<<CUSTOM-TYPE-END-IDENTIFIER>>
  • split_message_content_for_custom_types() finds these markers and splits the user message into multimodal content blocks (text + image_url parts), matching OpenAI's multimodal message format

Reasoning (dspy/adapters/types/reasoning.py)

  • String-like custom type
  • adapt_to_native_lm_feature(): If LM supports native reasoning, sets reasoning_effort in lm_kwargs and removes the reasoning field from signature
  • parse_lm_response(): Extracts reasoning_content from the response dict
  • Falls back to text-based reasoning for non-reasoning models

Tool / ToolCalls (dspy/adapters/types/tool.py)

  • Handled in _call_preprocess: tools converted to litellm function calling format
  • Tool and ToolCalls fields removed from signature before formatting
  • In _call_postprocess: tool calls from LM response parsed back into ToolCalls objects

8. Adapter Summary Table

| Adapter | Input Format | Output Format | Fallback | Native Structured |
|---|---|---|---|---|
| ChatAdapter | [[ ## field ## ]] markers | [[ ## field ## ]] markers | Falls back to JSONAdapter on parse error | No |
| JSONAdapter | [[ ## field ## ]] markers | JSON object | Falls back to json_object mode | Yes (OpenAI Structured Outputs) |
| XMLAdapter | <field>...</field> tags | <field>...</field> tags | Inherits ChatAdapter fallback | No |
| TwoStepAdapter | Natural language | Second LM call to extract | ChatAdapter for extraction | No |

9. Key Files

| File | Role |
|---|---|
| dspy/adapters/base.py | Abstract base, pipeline orchestration, demo formatting |
| dspy/adapters/chat_adapter.py | Default adapter with [[ ## ]] delimiters |
| dspy/adapters/json_adapter.py | JSON/structured output adapter |
| dspy/adapters/xml_adapter.py | XML tag-based adapter |
| dspy/adapters/two_step_adapter.py | Two-LM extraction adapter |
| dspy/adapters/utils.py | format_field_value, parse_value, translate_field_type, serialize_for_json |
| dspy/adapters/types/base_type.py | Type base class, multimodal content splitting |
| dspy/adapters/types/image.py | Image type with base64 encoding |
| dspy/adapters/types/reasoning.py | Native reasoning support |
| dspy/adapters/types/tool.py | Native tool calling support |

Optimizers: How They Discover and Modify Modules

The Contract

The implicit contract between an optimizer and a module:

  1. The module has Predict instances as leaf parameters. Discovered via named_parameters() / named_predictors(). A module with no Predict instances has nothing to optimize.
  2. Each Predict has a signature with mutable .instructions and field prefix/desc.
  3. Each Predict has a demos list (initially []). The primary optimization lever.
  4. Each Predict has an optional lm attribute. BootstrapFinetune replaces this with a finetuned model.
  5. Running the module records traces to settings.trace. Optimizers read traces to attribute outputs to specific predictors.
  6. Student and teacher must be structurally equivalent. Same number of predictors, same names, same signatures.
  7. deepcopy() and reset_copy() produce valid independent copies. Optimizers always copy before modifying.
  8. dump_state() / load_state() round-trip the optimized state.

1. Module Discovery

named_parameters() -- What Optimizers See

# For a program like:
class RAG(dspy.Module):
    def __init__(self):
        self.retrieve = dspy.Predict("question -> passages")
        self.answer = dspy.ChainOfThought("question, passages -> answer")

# named_parameters() returns:
[
    ("retrieve", <Predict>),          # self.retrieve IS a Parameter
    ("answer.predict", <Predict>),    # ChainOfThought holds self.predict
]

named_predictors() -- Convenience Filter

def named_predictors(self):
    from dspy.predict.predict import Predict
    return [(name, param) for name, param in self.named_parameters()
            if isinstance(param, Predict)]

Almost every optimizer uses this. Since Predict is currently the only Parameter subclass, named_parameters() and named_predictors() return the same things. But the filter makes the intent explicit.

predictor2name / name2predictor Mappings

Optimizers (especially BootstrapFewShot) build bidirectional maps to connect traces back to predictors:

# In BootstrapFewShot._prepare_predictor_mappings():
self.name2predictor = {}
self.predictor2name = {}
for name, predictor in self.student.named_predictors():
    self.name2predictor[name] = predictor
    self.predictor2name[id(predictor)] = name
# Same for teacher

id(predictor) is the key -- when a trace records (predictor_instance, inputs, prediction), the optimizer looks up predictor2name[id(predictor_instance)] to find which named predictor produced that output.


2. What Optimizers Modify

There are exactly four things optimizers touch on Predict instances:

| Property | Type | Modified By | Purpose |
|---|---|---|---|
| predictor.demos | list[Example] | BootstrapFewShot, MIPRO, RandomSearch, LabeledFewShot | Few-shot examples prepended to prompt |
| predictor.signature.instructions | str | COPRO, MIPROv2 | Task instruction text |
| predictor.signature field prefixes | str | COPRO | Output field prefix text |
| predictor.lm | LM | BootstrapFinetune, BetterTogether | The language model itself (finetuned) |

Additionally, program._compiled = True is set by most optimizers after compilation.
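
Because these are plain attributes, anything an optimizer does can be reproduced by hand (a sketch of the write surface; the demo values are made up):

import dspy

# program: any dspy.Module instance
for name, pred in program.named_predictors():
    pred.demos = [dspy.Example(question="What is 2+2?", answer="4")]
    pred.signature = pred.signature.with_instructions("Answer concisely.")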


3. The compile() Interface

# dspy/teleprompt/teleprompt.py
class Teleprompter:
    def compile(self, student: Module, *,
                trainset: list[Example],
                teacher: Module | None = None,
                valset: list[Example] | None = None,
                **kwargs) -> Module:

The contract:

  • Input: An uncompiled student Module and a trainset of Example objects
  • Output: A modified copy of the student with optimized parameters
  • Most optimizers deep-copy or reset_copy() the student first -- never mutating the original
  • student._compiled = True on the returned module
  • Same structure, but with modified demos/instructions/lm on its predictors

4. Tracing -- How Optimizers Observe Execution

How Tracing Works

  1. settings.trace is a global (thread-local) list, initialized via dspy.context(trace=[]).

  2. Every Predict._forward_postprocess() appends to this trace:

def _forward_postprocess(self, completions, signature, **kwargs):
    pred = Prediction.from_completions(completions, signature=signature)
    if settings.trace is not None and settings.max_trace_size > 0:
        trace = settings.trace
        if len(trace) >= settings.max_trace_size:
            trace.pop(0)
        trace.append((self, {**kwargs}, pred))
        # Tuple: (predictor_instance, input_kwargs_dict, prediction_output)
    return pred

  3. Optimizers capture traces by wrapping execution in a trace context:

# BootstrapFewShot:
with dspy.context(trace=[]):
    prediction = teacher(**example.inputs())
    trace = dspy.settings.trace
# trace is now [(pred1, inputs1, output1), (pred2, inputs2, output2), ...]

  4. Traces connect predictors to their I/O: The predictor_instance in the tuple lets optimizers map back to named predictors via predictor2name[id(predictor)].

  5. Metrics can use traces: Metric functions can accept an optional trace parameter:

def my_metric(example, prediction, trace=None):
    # When provided, trace is the list of (predictor, inputs, output) tuples
    # from the run being scored -- intermediate steps, not just the final output.
    ...

5. Key Optimizers

BootstrapFewShot (dspy/teleprompt/bootstrap.py)

The foundational optimizer. Populates demos on Predict instances by running a teacher and capturing successful traces.

Step 1: compile(student, *, teacher, trainset)

def compile(self, student, *, teacher=None, trainset):
    self.student = student.reset_copy()  # Deep copy + clear all demos
    self.teacher = (teacher or student).deepcopy()
    self._prepare_predictor_mappings()
    self._bootstrap()
    self._train()
    self.student._compiled = True
    return self.student

Step 2: _prepare_predictor_mappings()

  • Asserts student and teacher have identical structure (same number of predictors, same names)
  • Builds name2predictor and predictor2name for both

Step 3: _bootstrap() -- Generate Demo Candidates

For each training example:

for example in trainset:
    with dspy.context(trace=[]):
        prediction = self.teacher(**example.inputs())
        trace = dspy.settings.trace

    # Check if the output passes the metric
    if self.metric(example, prediction):
        # Extract demos from the trace
        for predictor, inputs, output in trace:
            name = self.predictor2name[id(predictor)]
            demo = dspy.Example(augmented=True, **inputs, **output)
            self.name2traces[name].append(demo)

The key mechanism: run the teacher, capture the trace, check the metric, and if it passes, create Example objects from each predictor's input/output pair.

Step 4: _train() -- Assign Demos to Student

For each student predictor:

for name, predictor in self.student.named_predictors():
    augmented_demos = self.name2traces[name][:self.max_bootstrapped_demos]
    raw_demos = self.raw_demos[name][:self.max_labeled_demos]
    predictor.demos = augmented_demos + raw_demos

augmented_demos are the bootstrapped ones (from successful teacher traces). raw_demos are unbootstrapped training examples.

BootstrapFewShotWithRandomSearch (dspy/teleprompt/random_search.py)

Runs BootstrapFewShot multiple times with different configurations and picks the best:

# Generates candidate programs with different strategies:
# Seed -3: Zero-shot (reset_copy, no demos)
# Seed -2: Labels only (LabeledFewShot)
# Seed -1: Unshuffled bootstrap
# Seeds 0+: Shuffled bootstrap with random demo count

# Evaluates each on validation set
# Returns the best-scoring program
# Attaches all candidates as best_program.candidate_programs

MIPROv2 (dspy/teleprompt/mipro_optimizer_v2.py)

The most sophisticated optimizer. Jointly optimizes instructions AND demos using Bayesian optimization (Optuna).

Three-phase process:

Phase 1: Bootstrap few-shot examples (_bootstrap_fewshot_examples)

  • Uses create_n_fewshot_demo_sets() which internally runs multiple BootstrapFewShot compilations
  • Produces demo_candidates[i] -- a list of demo sets for each predictor i

Phase 2: Propose instruction candidates (_propose_instructions)

  • Uses GroundedProposer -- an LM-based instruction generator
  • Can be program-aware (reads source code), data-aware (summarizes training data), tip-aware (includes prompting tips), fewshot-aware (includes example demos)
  • Produces instruction_candidates[i] -- a list of instruction strings for each predictor i

Phase 3: Bayesian optimization (_optimize_prompt_parameters)

# Uses Optuna TPE sampler
for trial in optuna_study:
    # For each predictor i:
    instruction_idx = trial.suggest_categorical(f"instruction_{i}", range(n_candidates))
    demos_idx = trial.suggest_categorical(f"demos_{i}", range(n_demo_sets))

    # Apply instruction
    updated_sig = predictor.signature.with_instructions(
        instruction_candidates[i][instruction_idx]
    )
    set_signature(predictor, updated_sig)

    # Apply demos
    predictor.demos = demo_candidates[i][demos_idx]

    # Evaluate the assembled program
    score = evaluate(program, devset=minibatch)
    # Optuna learns which combinations work best

COPRO (dspy/teleprompt/copro_optimizer.py)

Pure instruction optimization (no demo manipulation):

for predictor in program.predictors():
    # Generate candidate instructions using an LM
    for breadth iterations:
        candidates = generate_instruction_candidates(current_instruction)

    # Evaluate each candidate
    for candidate in candidates:
        updated_sig = signature.with_instructions(candidate.instruction)
        updated_sig = updated_sig.with_updated_fields(last_key, prefix=candidate.prefix)
        set_signature(predictor, updated_sig)
        score = evaluate(program)

    # Iterate for depth rounds, feeding previous attempts and scores

Modifies both signature.instructions and the last output field's prefix.

BootstrapFinetune (dspy/teleprompt/bootstrap_finetune.py)

Fundamentally different: modifies model weights rather than the prompt.

Step 1: bootstrap_trace_data() -- Run teacher on training set with tracing:

for example in trainset:
    with dspy.context(trace=[]):
        prediction = program(**example.inputs())
        trace = dspy.settings.trace
    score = metric(example, prediction)
    trace_data.append(
        {"example": example, "prediction": prediction, "trace": trace, "score": score}
    )

Step 2: _prepare_finetune_data() -- Convert traces to training format:

for trace_entry in trace_data:
    for pred, inputs, outputs in trace_entry.trace:
        # Use the adapter to format as training data
        training_example = adapter.format_finetune_data(
            signature, demos, inputs, outputs
        )
        # This produces chat-format messages suitable for finetuning

Step 3: finetune_lms() -- Group predictors by LM, finetune:

# If multitask=True: all predictors sharing an LM get one combined finetune job
finetuned_lm = lm.finetune(train_data, ...)

Step 4: Update predictor LMs:

for predictor in group:
    predictor.lm = finetuned_lm

BetterTogether (dspy/teleprompt/bettertogether.py)

Composes prompt optimization and weight optimization in a configurable sequence:

strategy = "p -> w -> p"  # prompt, weight, prompt

# p step: BootstrapFewShotWithRandomSearch
# w step: BootstrapFinetune

for step in strategy.split(" -> "):
    if step == "p":
        student = prompt_optimizer.compile(student, trainset=trainset)
    elif step == "w":
        student = weight_optimizer.compile(student, trainset=trainset)
    # Reset _compiled=False for next round, preserve LMs

6. How Evaluate Works

File: dspy/evaluate/evaluate.py

class Evaluate:
    def __call__(self, program, metric=None, devset=None, ...) -> EvaluationResult:
        def process_item(example):
            prediction = program(**example.inputs())
            score = metric(example, prediction)
            return prediction, score

        results = executor.execute(process_item, devset)
        # results: list of (prediction, score) per example

        ncorrect = sum(score for *_, score in results)
        return EvaluationResult(
            score=100 * ncorrect / ntotal,
            results=results
        )

  • Uses ParallelExecutor for multi-threaded evaluation
  • For each example: calls program(**example.inputs()), then metric(example, prediction)
  • EvaluationResult (subclass of Prediction) has .score (percentage) and .results (list of (example, prediction, score))
  • failure_score is used when evaluation fails for an example
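
Typical usage (a sketch; assumes devset is a list of dspy.Example and exact_match is a metric function):

evaluate = dspy.Evaluate(devset=devset, metric=exact_match, num_threads=8)
result = evaluate(program)
print(result.score)    # aggregate percentage
print(result.results)  # per-example (example, prediction, score) triples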

7. The Optimization Surface

Putting it all together, here's what the optimization surface looks like for a typical program:

class RAG(dspy.Module):
    def __init__(self):
        self.retrieve = dspy.Predict("question -> passages")
        self.answer = dspy.ChainOfThought("question, passages -> answer")

Discoverable parameters (via named_predictors()):

  1. "retrieve" -- Predict with signature "question -> passages"
  2. "answer.predict" -- Predict with signature "question, passages -> reasoning, answer"

Per-predictor optimization knobs:

| Knob | What | Who Modifies | How |
|---|---|---|---|
| demos | Few-shot examples | BootstrapFewShot, MIPRO | predictor.demos = [Example(...), ...] |
| signature.instructions | Task description | COPRO, MIPRO | signature.with_instructions("...") |
| Field prefix | Output field label | COPRO | signature.with_updated_fields(name, prefix="...") |
| Field desc | Field description | (rarely modified) | signature.with_updated_fields(name, desc="...") |
| lm | The language model | BootstrapFinetune | predictor.lm = finetuned_lm |

What gets saved/loaded:

When you program.save("path.json"), it serializes:

{
    "retrieve": {
        "demos": [...],
        "traces": [],
        "train": [],
        "signature": {
            "instructions": "Given the fields `question`, produce the fields `passages`.",
            "fields": {
                "question": {"prefix": "Question:", "desc": "${question}"},
                "passages": {"prefix": "Passages:", "desc": "${passages}"}
            }
        },
        "lm": null
    },
    "answer.predict": {
        "demos": [...],
        "traces": [],
        "train": [],
        "signature": {
            "instructions": "Optimized instruction here...",
            "fields": {
                "question": {"prefix": "Question:", "desc": "${question}"},
                "passages": {"prefix": "Passages:", "desc": "${passages}"},
                "reasoning": {"prefix": "Reasoning:", "desc": "${reasoning}"},
                "answer": {"prefix": "Answer:", "desc": "${answer}"}
            }
        },
        "lm": null
    }
}

The architecture (which modules exist, how they're connected) comes from code. The optimized state (demos, instructions, field metadata) comes from the saved file.
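
The resulting deployment workflow (a sketch; RAG is the example class above):

program.save("rag.json")  # persist the optimized state

fresh = RAG()             # architecture comes from code
fresh.load("rag.json")    # demos / instructions / field metadata come from the file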

Rust Rewrite Implications

What DSPy's Module System Actually Is

Strip away the Python dynamism and DSPy's module system is:

  1. A tree of composable nodes where leaf nodes (Predict) hold optimizable state
  2. A typed I/O contract (Signature) that describes what goes in and what comes out
  3. A formatting/parsing layer (Adapter) that converts typed contracts to LM prompts and back
  4. A tree traversal that lets optimizers discover and modify leaf nodes
  5. A tracing mechanism that records execution for optimizer feedback

That's it. Everything else is orchestration (how modules compose Predicts) and strategy (how optimizers search the space).


The Hard Problems

1. Dynamic Signature Manipulation

In Python, signatures are classes created at runtime via metaclass magic. Modules like ChainOfThought do signature.prepend("reasoning", OutputField(...)) which creates a new type at runtime.

In Rust: Signatures are data, not types. Model them as:

struct Signature {
    name: String,
    instructions: String,
    fields: IndexMap<String, Field>,  // Ordered map (insertion order matters)
}

struct Field {
    direction: FieldDirection,  // Input | Output
    type_annotation: TypeAnnotation,
    prefix: String,
    desc: String,
    format: Option<Box<dyn Fn(&str) -> String>>,
    constraints: Option<String>,
}

enum FieldDirection {
    Input,
    Output,
}

enum TypeAnnotation {
    Str,
    Int,
    Float,
    Bool,
    List(Box<TypeAnnotation>),
    Dict(Box<TypeAnnotation>, Box<TypeAnnotation>),
    Optional(Box<TypeAnnotation>),
    Enum(Vec<String>),
    Literal(Vec<String>),
    Json(serde_json::Value),  // For complex types, store JSON schema
}

All manipulation methods (with_instructions, prepend, append, delete, with_updated_fields) return new Signature values. This maps cleanly to Rust's ownership model -- signatures are cheap to clone and manipulate.

2. The Parameter Tree Walk

Python does this by walking __dict__ and checking isinstance. Rust doesn't have runtime reflection.

Options:

Option A: Explicit children (recommended)

trait Module {
    fn forward(&self, inputs: HashMap<String, Value>) -> Result<Prediction>;
    fn named_parameters(&self) -> Vec<(String, &dyn Parameter)>;
    fn named_sub_modules(&self) -> Vec<(String, &dyn Module)>;
}

trait Parameter: Module {
    fn demos(&self) -> &[Example];
    fn set_demos(&mut self, demos: Vec<Example>);
    fn signature(&self) -> &Signature;
    fn set_signature(&mut self, sig: Signature);
    fn dump_state(&self) -> serde_json::Value;
    fn load_state(&mut self, state: &serde_json::Value);
    fn reset(&mut self);
}

Each module explicitly returns its children. ChainOfThought returns [("predict", &self.predict)]. ReAct returns [("react", &self.react), ("extract.predict", &self.extract.predict)].

Option B: Derive macro

#[derive(DspyModule)]
struct ChainOfThought {
    #[parameter]
    predict: Predict,
}

A proc macro generates named_parameters() by inspecting fields marked with #[parameter].

Option C: Inventory/registry -- each module registers itself. More complex, probably overkill.

Recommendation: Start with Option A (explicit). It's simple, correct, and makes the tree structure obvious. Add a derive macro later if the boilerplate becomes painful.

3. The _compiled Freeze Flag

In Python, _compiled = True makes named_parameters() skip a sub-module. In Rust:

Simple approach: A boolean flag on every module, checked in named_parameters().

Type-state approach (more Rusty):

struct CompiledModule<M: Module> {
    inner: M,
    // named_parameters() returns empty vec
    // Cannot be modified without explicitly un-compiling
}

impl<M: Module> Module for CompiledModule<M> {
    fn named_parameters(&self) -> Vec<(String, &dyn Parameter)> {
        vec![]  // Frozen -- parameters are not exposed
    }
    fn forward(&self, inputs: HashMap<String, Value>) -> Result<Prediction> {
        self.inner.forward(inputs)
    }
}

4. The Adapter System

Adapters are the most straightforward part to port. They're essentially:

  • Template formatting (building message strings from signature + demos + inputs)
  • Regex-based parsing (splitting LM output by [[ ## field ## ]] markers)
  • Type coercion (parsing strings into typed values)

trait Adapter {
    fn format(&self, sig: &Signature, demos: &[Example], inputs: &HashMap<String, Value>) -> Vec<Message>;
    fn parse(&self, sig: &Signature, completion: &str) -> Result<HashMap<String, Value>>;
}

struct ChatAdapter;
struct JsonAdapter;

The fallback pattern (ChatAdapter -> JSONAdapter on parse failure) is just:

impl Adapter for ChatAdapter {
    fn call(&self, lm: &LM, sig: &Signature, demos: &[Example], inputs: &HashMap<String, Value>) -> Result<Vec<HashMap<String, Value>>> {
        match self.try_call(lm, sig, demos, inputs) {
            Ok(result) => Ok(result),
            Err(e) if !e.is_context_window_error() => {
                JsonAdapter.call(lm, sig, demos, inputs)
            }
            Err(e) => Err(e),
        }
    }
}

5. Tracing

Python uses a global thread-local list that Predicts append to. In Rust:

// Thread-local trace context
thread_local! {
    static TRACE: RefCell<Option<Vec<TraceEntry>>> = RefCell::new(None);
}

struct TraceEntry {
    predictor_id: PredictorId,  // Not a reference -- an ID for lookup
    inputs: HashMap<String, Value>,
    prediction: Prediction,
}

// In Predict::forward:
TRACE.with(|trace| {
    if let Some(ref mut trace) = *trace.borrow_mut() {
        trace.push(TraceEntry { predictor_id: self.id, inputs, prediction });
    }
});

// In optimizer:
let trace = with_trace(|| teacher.forward(example.inputs()));

Use IDs instead of references. Python uses id(predictor) (memory address); Rust should use a stable identifier (UUID, path string, or index).

6. Value Types and Parsing

DSPy uses Python's dynamic typing + Pydantic for validation. In Rust, you need a value type:

enum Value {
    Str(String),
    Int(i64),
    Float(f64),
    Bool(bool),
    List(Vec<Value>),
    Dict(HashMap<String, Value>),
    Null,
    Json(serde_json::Value),  // For complex/unknown types
}

Parsing (parse_value equivalent):

fn parse_value(raw: &str, annotation: &TypeAnnotation) -> Result<Value> {
    match annotation {
        TypeAnnotation::Str => Ok(Value::Str(raw.to_string())),
        TypeAnnotation::Int => Ok(Value::Int(raw.trim().parse::<i64>()?)),
        TypeAnnotation::Bool => parse_bool(raw),
        TypeAnnotation::Enum(variants) => parse_enum(raw, variants),
        TypeAnnotation::Literal(allowed) => parse_literal(raw, allowed),
        TypeAnnotation::Json(schema) => {
            let v: serde_json::Value = serde_json::from_str(raw)?;
            // Validate against schema
            Ok(Value::Json(v))
        }
        // ...
    }
}

What to Build First

Phase 1: Core Primitives

  1. Signature struct with manipulation methods
  2. Field and TypeAnnotation
  3. Value enum for dynamic values
  4. Example and Prediction data containers

Phase 2: Module System

  1. Module trait with forward() and named_parameters()
  2. Parameter trait extending Module
  3. Predict implementing both
  4. BaseModule trait for tree traversal, serialization

Phase 3: Adapter Layer

  1. Adapter trait
  2. ChatAdapter (formatting and parsing)
  3. JsonAdapter
  4. parse_value for type coercion

Phase 4: Composition Modules

  1. ChainOfThought (signature extension pattern)
  2. ReAct (multi-signature orchestration pattern)
  3. BestOfN / Refine (module wrapping pattern)

Phase 5: Optimization

  1. Tracing infrastructure
  2. Evaluate
  3. BootstrapFewShot
  4. LabeledFewShot
  5. More complex optimizers as needed

Design Decisions to Make Early

1. Static vs Dynamic Signatures

Python signatures carry Python types (Pydantic models, etc.). Rust signatures will need to decide:

  • Fully dynamic (TypeAnnotation enum + Value enum) -- flexible, similar to Python, but loses Rust's type safety
  • Partially typed (generics for common cases, Value for complex) -- more Rusty but more complex
  • Schema-driven (JSON Schema as the universal type description) -- pragmatic, works with any LM

Recommendation: Start fully dynamic. The type safety that matters here is at the LM boundary (parsing), not at compile time. You're dealing with strings from an LM no matter what.

2. Ownership of Demos and Signatures

In Python, optimizers freely mutate predictor.demos and predictor.signature. In Rust:

  • Mutable references: Optimizers take &mut references to the program
  • Interior mutability: Use RefCell<Vec<Example>> for demos
  • Clone + replace: Clone the whole program, modify the clone, return it (matches Python's reset_copy() pattern)

Recommendation: Clone + replace. It matches the Python pattern where optimizers always copy the student first, and it avoids fighting the borrow checker.

3. Async vs Sync

LM calls are inherently async (HTTP requests). The question is whether forward() should be async.

Recommendation: Make it async from the start. async fn forward(&self, ...) -> Result<Prediction>. Easier than retrofitting later.

4. Error Types

DSPy uses AdapterParseError, ContextWindowExceededError, and generic exceptions. Design a clean error enum:

enum DspyError {
    ParseError { adapter: String, raw: String, partial: HashMap<String, Value> },
    ContextWindowExceeded { model: String, token_count: usize },
    MissingInput { field: String },
    LmError(Box<dyn std::error::Error>),
    // ...
}

What NOT to Port

  1. The metaclass machinery (ProgramMeta, SignatureMeta). These exist to paper over Python's limitations. Rust structs with derive macros are cleaner.

  2. magicattr (AST-based nested attribute access). In Rust, named_parameters returns paths; use them to index directly.

  3. __getattribute__ forward-call guard. In Rust, make forward() private and only expose call().

  4. Dynamic __dict__ walking. Replace with explicit trait implementations.

  5. cloudpickle serialization. Use serde with JSON/MessagePack. The "save whole program" feature is Python-specific.

  6. The Settings singleton. Use explicit context passing or a structured configuration type.
