Written for the oxide Rust rewrite. Self-contained -- no DSPy source access required.
What DSPy Is (In One Paragraph)
DSPy is a framework for programming with language models where you declare what you want (via typed signatures), not how to prompt. The framework handles prompt construction, output parsing, and -- critically -- automatic optimization of prompts and few-shot examples. The module system is the backbone that makes all of this possible.
The Core Insight
Everything in DSPy is built on a single primitive: Predict. A Predict takes a typed signature (input fields -> output fields), formats it into a prompt via an adapter, calls an LM, and parses the response back into typed outputs. Every higher-level module (ChainOfThought, ReAct, ProgramOfThought) is just orchestration on top of one or more Predict instances.
Optimizers work by discovering all Predict instances in a module tree, then modifying their demos (few-shot examples) and signature instructions (the task description). This is the entire optimization surface.
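For example, a minimal end-to-end sketch (the model name is illustrative; any LiteLLM-supported id works):

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # global LM (see Settings below)

qa = dspy.Predict("question -> answer")           # declare the typed contract
pred = qa(question="What is the capital of France?")
print(pred.answer)  # prompt construction, the LM call, and parsing all happened inside
```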
Architecture Diagram
User Program (a Module subclass)
|
|-- Module.__call__()
| |-- callbacks, usage tracking, caller stack
| |-- self.forward(**kwargs)
|
|-- Contains Predict instances (the leaf parameters)
| |-- Each Predict has:
| | signature (Signature class -- typed I/O contract)
| | demos (list[Example] -- few-shot examples)
| | lm (optional per-predictor LM override)
| | config (LM kwargs: temperature, n, etc.)
| |
| |-- Predict.forward():
| | 1. _forward_preprocess: resolve LM, merge config, get demos
| | 2. adapter(lm, signature, demos, inputs)
| | 3. _forward_postprocess: build Prediction, append to trace
| |
| |-- Adapter pipeline:
| format(signature, demos, inputs) -> messages
| lm(messages, **kwargs) -> completions
| parse(signature, completion) -> dict of output fields
|
|-- named_parameters() walks the tree, finds all Predict instances
|-- Optimizers modify demos/instructions on discovered Predicts
|-- save()/load() serializes the optimized state
Predict inherits from both Module and Parameter, making it both callable and optimizable.
1. Parameter: The Marker
# dspy/predict/parameter.py
class Parameter:
    pass
That's the entire class. No methods, no state. It exists so isinstance(obj, Parameter) can distinguish "things optimizers can tune" from "things that are just structural." In the current codebase, Predict is the only class that inherits from Parameter.
Why this matters: When BaseModule.named_parameters() walks the object graph, it collects everything that passes isinstance(value, Parameter). Since only Predict does, optimizers only ever see Predict instances. Higher-level modules (ChainOfThought, ReAct) are invisible to optimizers -- they're just containers that hold Predict instances.
2. BaseModule: The Tree
BaseModule provides the infrastructure for treating a module hierarchy as a traversable tree.
2.1 named_parameters() -- DFS Parameter Discovery
This is the most important method in the entire module system. Every optimizer calls it.
def named_parameters(self):
    """
    DFS walk of self.__dict__. Finds all Parameter instances (i.e., Predict objects).
    Returns a list of (dotted_path_string, Parameter_instance) tuples.

    Rules:
    - If self is a Parameter, includes ("self", self)
    - Parameter instances in __dict__ -> added directly
    - Module instances in __dict__ -> recurse (unless _compiled=True)
    - Lists/tuples -> iterate with indexed names: "name[0]", "name[1]"
    - Dicts -> iterate with keyed names: "name['key']"
    - Tracks a visited set by id() to handle diamond DAGs
      (same object reachable via multiple paths)
    """
    import dspy
    from dspy.predict.parameter import Parameter

    visited = set()
    named_parameters = []

    def add_parameter(param_name, param_value):
        if isinstance(param_value, Parameter):
            if id(param_value) not in visited:
                visited.add(id(param_value))
                named_parameters.append((param_name, param_value))
        elif isinstance(param_value, dspy.Module):
            # CRITICAL: _compiled modules are FROZEN -- we don't recurse into them.
            # This is how pre-optimized sub-modules keep their state.
            if not getattr(param_value, "_compiled", False):
                for sub_name, param in param_value.named_parameters():
                    add_parameter(f"{param_name}.{sub_name}", param)

    if isinstance(self, Parameter):
        add_parameter("self", self)

    for name, value in self.__dict__.items():
        if isinstance(value, Parameter):
            add_parameter(name, value)
        elif isinstance(value, dspy.Module):
            if not getattr(value, "_compiled", False):
                for sub_name, param in value.named_parameters():
                    add_parameter(f"{name}.{sub_name}", param)
        elif isinstance(value, (list, tuple)):
            for idx, item in enumerate(value):
                add_parameter(f"{name}[{idx}]", item)
        elif isinstance(value, dict):
            for key, item in value.items():
                add_parameter(f"{name}['{key}']", item)

    return named_parameters
For a module holding a ChainOfThought at self.cot and a bare Predict at self.summarize, it returns:

[
("cot.predict", <Predict instance>), # ChainOfThought holds self.predict
("summarize", <Predict instance>), # Predict IS a Parameter
]
The dotted path names are how optimizers map traces back to specific predictors and how save()/load() serialize state.
2.2 named_sub_modules() -- BFS Module Discovery
def named_sub_modules(self, type_=None, skip_compiled=False):
    """
    BFS traversal of ALL BaseModule instances in the tree.

    Different from named_parameters:
    - BFS, not DFS
    - Returns ALL modules, not just Parameters
    - Optional type filter and compiled-skip flag
    """
    if type_ is None:
        type_ = BaseModule

    queue = deque([("self", self)])
    seen = {id(self)}

    def add_to_queue(name, item):
        if id(item) not in seen:
            seen.add(id(item))
            queue.append((name, item))

    while queue:
        name, item = queue.popleft()

        if isinstance(item, type_):
            yield name, item

        if isinstance(item, BaseModule):
            if skip_compiled and getattr(item, "_compiled", False):
                continue
            for sub_name, sub_item in item.__dict__.items():
                add_to_queue(f"{name}.{sub_name}", sub_item)
        elif isinstance(item, (list, tuple)):
            for i, sub_item in enumerate(item):
                add_to_queue(f"{name}[{i}]", sub_item)
        elif isinstance(item, dict):
            for key, sub_item in item.items():
                add_to_queue(f"{name}[{key}]", sub_item)
2.3 deepcopy() -- Safe Deep Copying
def deepcopy(self):
    """
    Strategy:
    1. Try copy.deepcopy(self) -- works if all attributes are picklable
    2. If that fails, manual fallback:
       - Create an empty instance via __new__ (no __init__)
       - For each attr in __dict__:
         - BaseModule -> recursive deepcopy()
         - Other -> try deepcopy, fall back to copy.copy, fall back to a reference
    """
    try:
        return copy.deepcopy(self)
    except Exception:
        pass

    new_instance = self.__class__.__new__(self.__class__)
    for attr, value in self.__dict__.items():
        if isinstance(value, BaseModule):
            setattr(new_instance, attr, value.deepcopy())
        else:
            try:
                setattr(new_instance, attr, copy.deepcopy(value))
            except Exception:
                try:
                    setattr(new_instance, attr, copy.copy(value))
                except Exception:
                    setattr(new_instance, attr, value)
    return new_instance
Why the fallback matters: Some modules hold references to non-picklable objects (LM connections, thread pools). The manual fallback ensures the module tree is still copyable even when copy.deepcopy chokes.
2.4 reset_copy() -- Fresh Copy for Optimization
def reset_copy(self):
    """
    Deep copy, then reset() every parameter.
    Creates a fresh copy with the architecture intact but all learned state cleared.
    Used by optimizers to create candidate programs.
    """
    new_instance = self.deepcopy()
    for param in new_instance.parameters():
        param.reset()
    return new_instance
param.reset() on a Predict clears self.lm, self.traces, self.train, and self.demos. The architecture (signature, config) is preserved; the learned state is wiped.
2.5 dump_state() / load_state() -- State Serialization

dump_state() collects, for each predictor:

demos (few-shot examples, serialized via serialize_object for JSON safety)
signature state (instructions + field prefixes/descriptions)
lm state (model config) or None
2.6 save() / load() -- File I/O
Two modes:
State-only (default): Saves just the optimized state (demos, instructions, etc.) to .json or .pkl.
def save(self, path, save_program=False):
    # state = self.dump_state() + metadata (python/dspy/cloudpickle versions)
    # Write to JSON or pickle based on the file extension
Full program (save_program=True): Uses cloudpickle to serialize the entire module object (architecture + state) to a directory containing program.pkl + metadata.json.
load() reads state and calls self.load_state(state). Note: this loads state into an existing module. For loading a whole program from pickle, there's a separate dspy.load() function.
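A usage sketch of both modes (MyProgram is a hypothetical module class):

```python
program.save("optimized.json")                # state-only: demos, instructions, ...
program.save("artifact/", save_program=True)  # whole program via cloudpickle

fresh = MyProgram()             # architecture comes from code...
fresh.load("optimized.json")    # ...optimized state comes from the file

whole = dspy.load("artifact/")  # or: load the pickled program wholesale
```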
3. Module: The Call Protocol
Module extends BaseModule with the call/forward protocol, a metaclass that ensures safe initialization, and convenience methods.
3.1 ProgramMeta -- The Metaclass
class ProgramMeta(type):
    """
    Ensures _base_init runs BEFORE __init__, even if a subclass forgets
    super().__init__().

    When you do MyModule(args):
    1. __new__ creates the instance (no __init__ yet)
    2. Module._base_init(obj) -- sets _compiled, callbacks, history
    3. cls.__init__(obj, args) -- the user's actual __init__
    4. Safety: ensures callbacks and history exist even if __init__ didn't set them
    """
    def __call__(cls, *args, **kwargs):
        obj = cls.__new__(cls, *args, **kwargs)
        if isinstance(obj, cls):
            Module._base_init(obj)
            cls.__init__(obj, *args, **kwargs)
            if not hasattr(obj, "callbacks"):
                obj.callbacks = []
            if not hasattr(obj, "history"):
                obj.history = []
        return obj
Why this exists: If a user writes class MyModule(dspy.Module) and forgets super().__init__(), the module would lack _compiled, callbacks, and history. The metaclass guarantees these always exist.
3.2 Module Attributes
class Module(BaseModule, metaclass=ProgramMeta):
    def _base_init(self):
        self._compiled = False  # Has this module been optimized?
        self.callbacks = []     # List of BaseCallback instances
        self.history = []       # LM call history

    def __init__(self, callbacks=None):
        self.callbacks = callbacks or []
        self._compiled = False
        self.history = []
3.3 __call__() -- The Central Dispatch
@with_callbacks  # Wraps with on_module_start / on_module_end callbacks
def __call__(self, *args, **kwargs):
    """
    1. Get the caller_modules stack from settings (tracks nested module calls)
    2. Append self to the stack
    3. In a settings.context with updated caller_modules:
       a. If usage tracking is enabled and there's no tracker yet, create one
       b. Call self.forward(*args, **kwargs)
       c. If tracking, attach token usage to the Prediction
    4. Return the Prediction
    """
    caller_modules = settings.caller_modules or []
    caller_modules = list(caller_modules)
    caller_modules.append(self)

    with settings.context(caller_modules=caller_modules):
        if settings.track_usage and no_tracker_yet:
            with track_usage() as usage_tracker:
                output = self.forward(*args, **kwargs)
            tokens = usage_tracker.get_total_tokens()
            self._set_lm_usage(tokens, output)
            return output
        return self.forward(*args, **kwargs)
__call__ vs forward(): __call__ is the public entry point. It handles callbacks, usage tracking, and the module call stack. forward() is the actual logic that subclasses override. There is a __getattribute__ override that warns if you call .forward() directly (it inspects the call stack):
def __getattribute__(self, name):
    attr = super().__getattribute__(name)
    if name == "forward" and callable(attr):
        stack = inspect.stack()
        forward_called_directly = len(stack) <= 1 or stack[1].function != "__call__"
        if forward_called_directly:
            logger.warning("Calling module.forward() directly is discouraged. Use module() instead.")
    return attr
3.4 Pickle Support
def __getstate__(self):
    """Excludes history and callbacks (transient state) from pickling."""
    state = self.__dict__.copy()
    state.pop("history", None)
    state.pop("callbacks", None)
    return state

def __setstate__(self, state):
    """Restores history and callbacks as empty on unpickle."""
    self.__dict__.update(state)
    if not hasattr(self, "history"):
        self.history = []
    if not hasattr(self, "callbacks"):
        self.callbacks = []
3.5 Convenience Methods
def named_predictors(self):
    """Filters named_parameters() to only Predict instances."""
    from dspy.predict.predict import Predict
    return [(name, param) for name, param in self.named_parameters()
            if isinstance(param, Predict)]

def predictors(self):
    """Just the Predict objects, no names."""
    return [param for _, param in self.named_predictors()]

def set_lm(self, lm):
    """Sets the LM on ALL predictors in the tree."""
    for _, param in self.named_predictors():
        param.lm = lm

def get_lm(self):
    """Returns the LM if all predictors share one; raises if they differ."""

def map_named_predictors(self, func):
    """
    Applies func to each predictor and replaces it in the tree.
    Uses magicattr.set for nested path assignment (handles dotted paths).
    """
    for name, predictor in self.named_predictors():
        set_attribute_by_name(self, name, func(predictor))
    return self
4. The _compiled Flag
_compiled is a boolean that controls optimizer traversal:
Initialized to False on every new Module (via _base_init)
Set to True by optimizers after compilation (e.g., student._compiled = True)
When True, named_parameters() stops recursing into this module -- its Predict instances are invisible to further optimization
This is how you compose pre-optimized modules: a compiled sub-module's demos and signature instructions won't be overwritten by a parent optimizer
Example:
# Pre-optimize a sub-module
optimized_qa = bootstrap.compile(qa_module, trainset=data)
# optimized_qa._compiled is now True

# Use it in a larger program
class Pipeline(dspy.Module):
    def __init__(self):
        self.retrieve = dspy.Predict("query -> passages")
        self.qa = optimized_qa  # _compiled=True, frozen

# When a parent optimizer runs on Pipeline:
# named_parameters() finds: [("retrieve", <Predict>)]
# It does NOT find optimized_qa's internal Predict -- it's frozen.
The dual inheritance of Predict is the key design decision: It is both a Module (callable, composable, has forward()) and a Parameter (discoverable by optimizers). Everything else in the system follows from this.
Signatures: The Typed Contract

A Signature is a typed contract between a module and an LM: named input fields -> named output fields, with instructions. It's what makes DSPy declarative -- you say "question -> answer" and the framework handles prompt construction, output parsing, and type validation.
Critical implementation detail: A Signature is a class, not an instance. When you write dspy.Signature("question -> answer"), you get back a new type (a dynamically-created Pydantic BaseModel subclass), not an object. Operations like prepend, with_instructions, delete all return new classes. This is metaclass-heavy Python.
Fields are declared with dspy.InputField() and dspy.OutputField(). Internally, move_kwargs separates the DSPy-specific arguments from the Pydantic-native arguments:
DSPy-specific (stored in json_schema_extra):
| Argument | Type | Purpose |
| --- | --- | --- |
| `__dspy_field_type` | "input" or "output" | The discriminator -- how the system tells inputs from outputs |
| `desc` | str | Field description shown to the LM in the prompt |
| `prefix` | str | Prompt prefix for this field (e.g., "Question:") |
| `format` | callable | Optional formatting function |
| `parser` | callable | Optional parsing function |
| `constraints` | str | Human-readable constraint strings |
Pydantic-native (passed through to pydantic.Field):
| Argument | Purpose |
| --- | --- |
| `gt`, `ge`, `lt`, `le` | Numeric constraints |
| `min_length`, `max_length` | String/collection length |
| `default` | Default value |
Constraint translation: Pydantic constraints are automatically converted to human-readable strings. OutputField(ge=5, le=10) generates constraints="greater than or equal to: 5, less than or equal to: 10" which gets included in the prompt so the LM knows the bounds.
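For example, a sketch of a constrained field (the rendered constraint string is the one quoted above):

```python
import dspy

class RateReview(dspy.Signature):
    """Score the review's sentiment."""
    review: str = dspy.InputField()
    score: int = dspy.OutputField(ge=5, le=10, desc="sentiment score")
    # ge/le are passed through to pydantic.Field and also rendered into the
    # prompt as a human-readable constraints string.
```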
3. SignatureMeta: The Metaclass
SignatureMeta extends type(BaseModel) (Pydantic's metaclass). It does three key things:
3.1 __call__ -- String Shorthand Interception
class SignatureMeta(type(BaseModel)):
    def __call__(cls, *args, **kwargs):
        # If called with a string like Signature("question -> answer"),
        # route to make_signature() to create a new class (not an instance).
        if cls is Signature:
            if len(args) == 1 and isinstance(args[0], (str, dict)):
                return make_signature(args[0], kwargs.pop("instructions", None))
        # Otherwise, create an actual instance (rare in normal DSPy usage)
        return super().__call__(*args, **kwargs)
This means dspy.Signature("question -> answer") returns a new class, not an instance.
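Concretely (the instructions keyword is consumed by make_signature, per the code above):

```python
QA = dspy.Signature("question -> answer", instructions="Answer briefly.")

isinstance(QA, type)  # True -- QA is a class, not an instance
QA.instructions       # "Answer briefly."
QA.signature          # "question -> answer"
```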
3.2 __new__ -- Class Creation
When a Signature class is being defined (either via class QA(dspy.Signature) or via make_signature()):
def __new__(mcs, signature_name, bases, namespace):
    # 1. Set str as the default type for fields without annotations
    for name in namespace:
        if name not in annotations:
            annotations[name] = str

    # 2. Preserve field ordering: inputs before outputs
    #    (reorder the annotations dict to match declaration order)

    # 3. Let Pydantic create the class
    cls = super().__new__(mcs, signature_name, bases, namespace)

    # 4. Set default instructions if none given
    if not cls.__doc__:
        inputs = ", ".join(f"`{k}`" for k in cls.input_fields)
        outputs = ", ".join(f"`{k}`" for k in cls.output_fields)
        cls.__doc__ = f"Given the fields {inputs}, produce the fields {outputs}."

    # 5. Validate: every field must use InputField or OutputField
    for name, field in cls.model_fields.items():
        if "__dspy_field_type" not in (field.json_schema_extra or {}):
            raise TypeError(f"Field '{name}' must use InputField or OutputField")

    # 6. Auto-generate prefix and desc for fields that don't have them
    for name, field in cls.model_fields.items():
        extra = field.json_schema_extra
        if "prefix" not in extra:
            extra["prefix"] = infer_prefix(name)  # snake_case -> "Title Case:"
        if "desc" not in extra:
            extra["desc"] = f"${{{name}}}"  # template placeholder
3.3 infer_prefix() -- Name to Prompt Prefix
Converts field names to human-readable prefixes:
"question" -> "Question:"
"some_attribute_name" -> "Some Attribute Name:"
"HTMLParser" -> "HTML Parser:"
Uses regex to split on underscores and camelCase boundaries, then title-cases and joins.
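A minimal sketch of that transformation (not DSPy's exact regex):

```python
import re

def infer_prefix_sketch(name: str) -> str:
    # Split on camelCase boundaries ("HTMLParser" -> "HTML Parser"), then on
    # underscores, then title-case each word (leaving all-caps words alone).
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", " ", name)
    words = spaced.replace("_", " ").split()
    return " ".join(w if w.isupper() else w.capitalize() for w in words) + ":"

assert infer_prefix_sketch("question") == "Question:"
assert infer_prefix_sketch("some_attribute_name") == "Some Attribute Name:"
assert infer_prefix_sketch("HTMLParser") == "HTML Parser:"
```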
4. Two Ways to Define Signatures
Class-Based (Full Control)
class QA(dspy.Signature):
    """Answer questions with short factoid answers."""
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc="often between 1 and 5 words")
Here QA is a class. QA.__doc__ becomes the instructions. Fields are declared as class attributes with type annotations and InputField/OutputField defaults.
String-Based (Shorthand)

When SignatureMeta.__call__ sees a string, it routes to make_signature().
The String Parser
The parser is clever -- it uses Python's AST module:
def _parse_field_string(field_string: str, names=None):
    # Wrap the field string as function parameters and parse with ast
    args = ast.parse(f"def f({field_string}): pass").body[0].args.args
This means field strings follow Python function parameter syntax: question: str, context: list[int] is valid because it would be valid as def f(question: str, context: list[int]): pass.
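The same trick in isolation:

```python
import ast

args = ast.parse("def f(question: str, context: list[str]): pass").body[0].args.args
print([(a.arg, ast.unparse(a.annotation)) for a in args])
# [('question', 'str'), ('context', 'list[str]')]
```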
Type resolution happens in _parse_type_node(), which recursively walks the AST:
Builtins and typing constructs (str, int, list[str], etc.) resolve directly; custom types are looked up via a names dict or by walking the Python call stack
Custom type auto-detection (_detect_custom_types_from_caller): When you write Signature("input: MyType -> output"), the metaclass walks up the call stack (up to 100 frames) looking in f_locals and f_globals for MyType. This is fragile but convenient. The reliable alternative is passing custom_types={"MyType": MyType}.
make_signature() -- The Factory
def make_signature(signature, instructions=None, signature_name="StringSignature"):
    """
    Accepts either:
    - A string: "question -> answer" (parsed into fields)
    - A dict: {"question": InputField(), "answer": OutputField()} (used directly)

    Creates a new Signature class via pydantic.create_model().
    """
    if isinstance(signature, str):
        fields = _parse_signature(signature)
    else:
        fields = signature  # dict of {name: (type, FieldInfo)}

    # pydantic.create_model creates a new BaseModel subclass dynamically
    model = pydantic.create_model(
        signature_name,
        __base__=Signature,
        __doc__=instructions,
        **fields,
    )
    return model
5. Signature Properties (Class-Level)
These are properties on the metaclass, meaning they're accessed on the class itself (not instances):
@property
def instructions(cls) -> str:
    """The cleaned docstring. This is the task description shown to the LM."""
    return cls.__doc__

@property
def input_fields(cls) -> dict[str, FieldInfo]:
    """Fields where __dspy_field_type == "input", in declaration order."""
    return {k: v for k, v in cls.model_fields.items()
            if v.json_schema_extra["__dspy_field_type"] == "input"}

@property
def output_fields(cls) -> dict[str, FieldInfo]:
    """Fields where __dspy_field_type == "output", in declaration order."""
    return {k: v for k, v in cls.model_fields.items()
            if v.json_schema_extra["__dspy_field_type"] == "output"}

@property
def fields(cls) -> dict[str, FieldInfo]:
    """All fields: {**input_fields, **output_fields}."""
    return {**cls.input_fields, **cls.output_fields}

@property
def signature(cls) -> str:
    """String representation: "input1, input2 -> output1, output2"."""
    inputs = ", ".join(cls.input_fields.keys())
    outputs = ", ".join(cls.output_fields.keys())
    return f"{inputs} -> {outputs}"
6. Signature Manipulation
All manipulation methods return new Signature classes. The original is never mutated. This is the immutable pattern.
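For example:

```python
base = dspy.Signature("question -> answer")

hinted = base.append("hint", dspy.InputField(desc="a helpful hint"), type_=str)
renamed = hinted.with_instructions("Answer in one word.")

"hint" in base.fields    # False -- the original class is untouched
"hint" in hinted.fields  # True  -- a brand-new class was returned
```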
insert(index, name, field, type_=None)

def insert(cls, index, name, field, type_=None):
    """
    Splits fields into input_fields and output_fields lists.
    Determines which list based on __dspy_field_type.
    Inserts at the given index. Recombines and creates a new Signature.
    """
    input_fields = list(cls.input_fields.items())
    output_fields = list(cls.output_fields.items())

    lst = (input_fields
           if field.json_schema_extra["__dspy_field_type"] == "input"
           else output_fields)
    lst.insert(index, (name, (type_ or str, field)))

    new_fields = dict(input_fields + output_fields)
    return Signature(new_fields, cls.instructions)
delete(name)
def delete(cls, name):
    """Removes the named field. Returns a new Signature."""
    fields_copy = dict(cls.fields)
    fields_copy.pop(name, None)
    return Signature(fields_copy, cls.instructions)
7. How Modules Modify Signatures
This is the core of the "augmentation pattern." Each module type manipulates the signature differently:
ChainOfThought -- Prepend Reasoning
extended_signature = signature.prepend(
    name="reasoning",
    field=dspy.OutputField(
        prefix="Reasoning: Let's think step by step in order to",
        desc="${reasoning}",
    ),
    type_=str,
)
"question -> answer" becomes "question -> reasoning, answer". The LM is forced to produce reasoning before the answer.
Refine -- Append a Hint

signature = signature.append("hint_", InputField(desc="A hint from an earlier run"))
Done inside the adapter wrapper at call time, not at construction time. This is unique -- most modules modify signatures at __init__.
8. Signature Serialization
dump_state() / load_state(state)
def dump_state(cls):
    """Dumps instructions + per-field prefix and description."""
    return {
        "instructions": cls.instructions,
        "fields": {
            name: {
                "prefix": field.json_schema_extra.get("prefix"),
                "desc": field.json_schema_extra.get("desc"),
            }
            for name, field in cls.fields.items()
        },
    }
def load_state(cls, state):
    """
    Creates a new Signature from stored state.
    Updates instructions and field prefix/desc from the saved state.
    """
    new_sig = cls.with_instructions(state["instructions"])
    for name, field_state in state.get("fields", {}).items():
        if name in new_sig.fields:
            new_sig = new_sig.with_updated_fields(name, **field_state)
    return new_sig
This is what Predict.dump_state() calls under state["signature"]. It preserves the optimized instructions and field metadata while the field types and structure come from the code.
9. Pydantic Integration
How Types Map to Prompts
The adapter uses translate_field_type() to generate type hints for the LM:
| Python Type | Prompt Hint |
| --- | --- |
| `str` | (no hint) |
| `bool` | "must be True or False" |
| `int` / `float` | "must be a single int/float value" |
| `Enum` | "must be one of: val1; val2; val3" |
| `Literal["a", "b"]` | "must exactly match one of: a; b" |
| Complex types | "must adhere to the JSON schema: {...}" (Pydantic JSON schema) |
How Parsing Works
Parsing happens in parse_value() (dspy/adapters/utils.py), sketched after this list:
str annotation -> return raw string
Enum -> find matching member by value or name
Literal -> validate against allowed values
bool/int/float -> type cast
Complex types -> json_repair.loads() then pydantic.TypeAdapter(annotation).validate_python()
DSPy Type subclasses -> custom parsing
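A minimal sketch of that coercion logic (the real parse_value uses json_repair and pydantic.TypeAdapter for the complex cases):

```python
import json
from enum import Enum
from typing import Literal, get_args, get_origin

def parse_value_sketch(raw: str, annotation):
    if annotation is str:
        return raw                                # strings pass through
    if isinstance(annotation, type) and issubclass(annotation, Enum):
        for member in annotation:                 # match by value, then by name
            if str(member.value) == raw or member.name == raw:
                return member
        raise ValueError(f"{raw!r} is not a member of {annotation.__name__}")
    if get_origin(annotation) is Literal:
        if raw in get_args(annotation):           # validate against allowed values
            return raw
        raise ValueError(f"{raw!r} not in {get_args(annotation)}")
    if annotation in (bool, int, float):
        return annotation(json.loads(raw.lower()) if annotation is bool else raw)
    return json.loads(raw)  # stand-in for json_repair + TypeAdapter validation
```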
10. The Signature as Contract
A Signature encodes:
| Aspect | How |
| --- | --- |
| What inputs are needed | `input_fields` dict |
| What outputs are produced | `output_fields` dict |
| How to describe the task | `instructions` (docstring) |
| How to present each field | `prefix` and `desc` per field |
| What types are expected | Python type annotations per field |
| What constraints apply | Pydantic constraints -> `constraints` string |
| Field ordering | Dict insertion order (inputs first, then outputs) |
The signature flows through the entire system:
Module holds it on self.signature
Adapter.format() reads it to build the prompt
Adapter.parse() reads it to know what to extract
Optimizers modify instructions and field prefix/desc
Predict: The Leaf Node

Predict is the only leaf node in the DSPy module tree. It is the only class that inherits from both Module (callable, composable) and Parameter (discoverable by optimizers). Every higher-level module (ChainOfThought, ReAct, etc.) ultimately delegates to one or more Predict instances.
A Predict takes a Signature, formats it into a prompt via an adapter, calls an LM, parses the response back into typed outputs, and returns a Prediction.
1. Construction
class Predict(Module, Parameter):
    def __init__(self, signature: str | type[Signature], callbacks=None, **config):
        super().__init__(callbacks=callbacks)
        self.stage = random.randbytes(8).hex()        # Unique ID for tracing
        self.signature = ensure_signature(signature)  # Parse string -> Signature class
        self.config = config                          # Default LM kwargs (temperature, n, etc.)
        self.reset()

    def reset(self):
        """Clears all learned/optimizable state."""
        self.lm = None     # Per-predictor LM override (None = use settings.lm)
        self.traces = []   # Execution traces (for optimization)
        self.train = []    # Training examples
        self.demos = []    # Few-shot examples (THE primary optimizable state)
Key Attributes
| Attribute | Type | Purpose | Optimizable? |
| --- | --- | --- | --- |
| `signature` | type[Signature] | The typed I/O contract | Yes (instructions, field prefixes) |
| `demos` | list[Example] | Few-shot examples prepended to the prompt | Yes (the primary optimization lever) |
| `lm` | LM or None | Per-predictor LM override | Yes (BootstrapFinetune replaces this) |
| `config` | dict | Default LM kwargs (temperature, n, etc.) | No (set at construction) |
| `stage` | str | Random hex ID for tracing | No |
| `traces` | list | Execution traces for optimization | Bookkeeping |
| `train` | list | Training examples | Bookkeeping |
ensure_signature()
Converts various inputs to a Signature class:

A string like "question -> answer" -> parsed into a new Signature class
An existing Signature class -> returned unchanged
2. _forward_postprocess() -- Building the Prediction and Tracing

def _forward_postprocess(self, completions, signature, **kwargs):
    # 1. Build a Prediction from the completions
    pred = Prediction.from_completions(completions, signature=signature)

    # 2. Append to the trace if tracing is enabled
    if kwargs.pop("_trace", True) and settings.trace is not None:
        trace = settings.trace
        if len(trace) >= settings.max_trace_size:
            trace.pop(0)  # LRU eviction
        trace.append((self, {**kwargs}, pred))
        # Tuple: (predictor_instance, input_kwargs_dict, prediction_output)

    return pred
The trace tuple (self, inputs, prediction) is how optimizers connect outputs back to specific Predict instances. BootstrapFewShot reads these traces to create demos.
3. Predict State Management
dump_state() -- Serialization
def dump_state(self, json_mode=True):
    state_keys = ["traces", "train"]
    state = {k: getattr(self, k) for k in state_keys}

    # Serialize demos (the main optimizable state)
    state["demos"] = []
    for demo in self.demos:
        demo = demo.copy()
        for field in demo:
            demo[field] = serialize_object(demo[field])  # Pydantic models -> dicts
        if isinstance(demo, dict) or not json_mode:
            state["demos"].append(demo)
        else:
            state["demos"].append(demo.toDict())

    # Signature state (instructions + field prefixes/descriptions)
    state["signature"] = self.signature.dump_state()

    # LM state (model config) or None
    state["lm"] = self.lm.dump_state() if self.lm else None
    return state
4. The Adapter Call

Inside forward(), the adapter (see the Adapters chapter) does the remaining work:

format(signature, demos, inputs): build the chat messages.
lm(messages=messages, **kwargs): actually call the LM.
_call_postprocess(): parse each completion via parse(signature, text).
The result is a list of dicts, one per completion, each containing the output field values.
Then Prediction.from_completions() wraps this into a Prediction object.
5. Prediction and Example
Example (dspy/primitives/example.py)
Dict-like container with input/label separation:
class Example:
    def __init__(self, **kwargs):
        self._store = kwargs      # The actual data
        self._input_keys = set()  # Which keys are inputs
        self._demos = []          # Attached demos (rarely used)

    def with_inputs(self, *keys):
        """Mark which fields are inputs. Returns self (mutates)."""
        self._input_keys = set(keys)
        return self

    def inputs(self):
        """Returns an Example with only the input keys."""
        return {k: v for k, v in self._store.items() if k in self._input_keys}

    def labels(self):
        """Returns an Example with only the non-input keys."""
        return {k: v for k, v in self._store.items() if k not in self._input_keys}
Training data and demos are both Examples. The .with_inputs() call marks the boundary between what gets passed as input and what's a label.
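For example:

```python
ex = dspy.Example(question="What is 2+2?", answer="4").with_inputs("question")

ex.inputs()  # just the question -- what the module is called with
ex.labels()  # just the answer  -- what a metric compares against
```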
Prediction (dspy/primitives/prediction.py)
Subclass of Example, returned by all modules:
class Prediction(Example):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._completions = None  # All completions (not just the first)
        self._lm_usage = None     # Token usage tracking

    @classmethod
    def from_completions(cls, list_or_dict, signature=None):
        """
        Wraps completions into a Prediction.
        - Stores all completions as a Completions object
        - pred._store = {k: v[0] for k, v in completions.items()}
          (the first completion is the default)
        """
        obj = cls()
        obj._completions = Completions(list_or_dict, signature=signature)
        # Set the primary values to the first completion
        obj._store = {k: v[0] for k, v in obj._completions.items()}
        return obj
Attribute access (pred.answer) returns the first completion's value. pred.completions.answer returns all completions for that field.
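For example (assuming a configured LM; n is part of config and requests three completions):

```python
predict = dspy.Predict("question -> answer", n=3)
pred = predict(question="What is 2+2?")

pred.answer              # the first completion's answer (the default)
pred.completions.answer  # all three answers for the `answer` field
```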
6. The Complete Flow
Putting it all together for a single predict(question="What is 2+2?") call:
1. Predict.__call__(question="What is 2+2?")
-> Validates no positional args
-> Module.__call__(**kwargs)
-> @with_callbacks: on_module_start
-> Push self to caller_modules stack
-> Predict.forward(question="What is 2+2?")
2. _forward_preprocess(question="What is 2+2?")
-> signature = self.signature (e.g., "question -> answer")
-> demos = self.demos (e.g., 3 few-shot examples)
-> config = {**self.config} (e.g., {temperature: 0})
-> lm = self.lm or settings.lm
-> kwargs = {question: "What is 2+2?"}
-> return (lm, config, signature, demos, kwargs)
3. adapter = settings.adapter or ChatAdapter()
4. completions = adapter(lm, lm_kwargs=config, signature=signature,
demos=demos, inputs=kwargs)
Inside adapter.__call__:
a. _call_preprocess: check for tools/native types, may modify signature
b. format(signature, demos, inputs):
- System message: field descriptions + format structure + instructions
- Demo messages: few-shot examples as user/assistant pairs
- User message: current inputs + output format reminder
c. lm(messages=messages, **lm_kwargs):
- litellm call to the actual LM
- Returns list of completion strings
d. _call_postprocess: for each completion:
- parse(signature, text): extract output field values
- Returns list of dicts: [{answer: "4"}, ...]
5. _forward_postprocess(completions, signature, question="What is 2+2?")
-> Prediction.from_completions([{answer: "4"}])
-> Append (self, {question: "What is 2+2?"}, prediction) to settings.trace
-> Return prediction
6. Module.__call__ returns
-> @with_callbacks: on_module_end
-> Return Prediction(answer="4")
Augmentation Patterns: How Modules Build on Predict
The Core Idea
Every DSPy module that does anything interesting is orchestration on top of Predict. The module itself is not a parameter -- it's a container. The actual "learning" (demos, instructions) lives entirely inside the Predict instances it holds.
There are exactly four augmentation patterns in DSPy:
| Pattern | Mechanism | Modules |
| --- | --- | --- |
| Signature Extension | Modify the signature at __init__ time, delegate to one Predict | ChainOfThought, MultiChainComparison |
| Multi-Signature Orchestration | Multiple Predicts with different signatures, orchestrated in a loop | ReAct, ProgramOfThought |
| Module Wrapping | Wrap an arbitrary Module, run it multiple times, select the best output | BestOfN, Refine |
| Aggregation | Take multiple completions and synthesize/vote | MultiChainComparison, majority() |
Pattern 1: Signature Extension
ChainOfThought -- The Canonical Example
File: dspy/predict/chain_of_thought.py
class ChainOfThought(Module):
    def __init__(self, signature, rationale_field=None, rationale_field_type=str, **config):
        super().__init__()
        signature = ensure_signature(signature)

        # Default rationale field
        prefix = "Reasoning: Let's think step by step in order to"
        desc = "${reasoning}"
        rationale_field_type = (
            rationale_field.annotation if rationale_field else rationale_field_type
        )
        rationale_field = (
            rationale_field if rationale_field
            else dspy.OutputField(prefix=prefix, desc=desc)
        )

        # THE AUGMENTATION: prepend a "reasoning" output field
        extended_signature = signature.prepend(
            name="reasoning",
            field=rationale_field,
            type_=rationale_field_type,
        )

        # A single Predict with the extended signature
        self.predict = dspy.Predict(extended_signature, **config)

    def forward(self, **kwargs):
        return self.predict(**kwargs)
forward() is a pure passthrough to the single Predict
What optimizers see: One Predict at path "predict". They can:
Add demos to self.predict.demos
Rewrite self.predict.signature.instructions
Rewrite the reasoning field's prefix (e.g., change "Let's think step by step" to something better)
The Reasoning type trick: If rationale_field_type is the Reasoning custom type (instead of str), the adapter detects it at call time. If the LM supports native reasoning (o1, o3), the adapter removes the reasoning field from the signature and enables the model's built-in chain-of-thought via reasoning_effort in lm_kwargs. The LM does its own reasoning internally, and the adapter extracts reasoning_content from the response. For non-reasoning models, it falls back to text-based reasoning.
MultiChainComparison -- Aggregation via Signature Extension
File: dspy/predict/multi_chain_comparison.py
class MultiChainComparison(Module):
    def __init__(self, signature, M=3, temperature=0.7, **config):
        super().__init__()
        self.M = M
        signature = ensure_signature(signature)
        *_, self.last_key = signature.output_fields.keys()  # The final output field name

        # Append M input fields for "student attempts"
        for idx in range(M):
            signature = signature.append(
                f"reasoning_attempt_{idx+1}",
                InputField(
                    prefix=f"Student Attempt #{idx+1}:",
                    desc="${reasoning attempt}",
                ),
            )

        # Prepend a rationale output field
        signature = signature.prepend(
            "rationale",
            OutputField(
                prefix="Accurate Reasoning: Thank you everyone. Let's now holistically",
                desc="${corrected reasoning}",
            ),
        )

        self.predict = Predict(signature, temperature=temperature, **config)
The forward method is unique -- it takes completions as input:
def forward(self, completions, **kwargs):
    attempts = []
    for c in completions:
        rationale = c.get("rationale", c.get("reasoning")).strip().split("\n")[0].strip()
        answer = str(c[self.last_key]).strip().split("\n")[0].strip()
        attempts.append(
            f"<<I'm trying to {rationale} I'm not sure but my prediction is {answer}>>"
        )

    kwargs = {
        **{f"reasoning_attempt_{idx+1}": attempt for idx, attempt in enumerate(attempts)},
        **kwargs,
    }
    return self.predict(**kwargs)
The pattern: run ChainOfThought M times, feed all M attempts into MultiChainComparison, get a synthesized answer. The signature extension adds the M input slots and a synthesis rationale.
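A usage sketch of that pattern (in practice the attempts are sampled with nonzero temperature so they differ):

```python
question = "What is 2+2?"

cot = dspy.ChainOfThought("question -> answer")
attempts = [cot(question=question) for _ in range(3)]  # M independent attempts

compare = dspy.MultiChainComparison("question -> answer", M=3)
final = compare(attempts, question=question)  # synthesized answer + rationale
```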
Pattern 2: Multi-Signature Orchestration
ReAct -- Tool-Using Agent Loop
File: dspy/predict/react.py
class ReAct(Module):
    def __init__(self, signature, tools, max_iters=20):
        super().__init__()
        self.signature = signature = ensure_signature(signature)
        self.max_iters = max_iters

        # Convert callables to Tool objects
        tools = [t if isinstance(t, Tool) else Tool(t) for t in tools]
        tools = {tool.name: tool for tool in tools}

        # Add a "finish" tool that signals completion
        # (its args are the original output field names)
        tools["finish"] = Tool(
            func=lambda **kwargs: "Completed.",
            name="finish",
            desc="Signal task completion.",
            args={name: ... for name in signature.output_fields},
        )
        self.tools = tools
Two separate Predict instances with different signatures:
# The action-selection signature
instr = [
    signature.instructions,
    "You will be given `trajectory` as context.",
    f"Tools: {tool_descriptions}",
    "Finish with the `finish` tool when done.",
]
react_signature = (
    dspy.Signature({**signature.input_fields}, "\n".join(instr))
    .append("trajectory", dspy.InputField(), type_=str)
    .append("next_thought", dspy.OutputField(), type_=str)
    .append("next_tool_name", dspy.OutputField(), type_=Literal[tuple(tools.keys())])
    .append("next_tool_args", dspy.OutputField(), type_=dict[str, Any])
)
# The extraction signature (used with ChainOfThought)
fallback_signature = dspy.Signature(
    {**signature.input_fields, **signature.output_fields},
    signature.instructions,
).append("trajectory", dspy.InputField(), type_=str)

self.react = dspy.Predict(react_signature)
self.extract = dspy.ChainOfThought(fallback_signature)
The agent loop:
def forward(self, **input_args):
    trajectory = {}
    for idx in range(self.max_iters):
        # Ask the LM what to do next
        pred = self._call_with_potential_trajectory_truncation(
            self.react, trajectory, **input_args
        )

        # Record the action in the trajectory
        trajectory[f"thought_{idx}"] = pred.next_thought
        trajectory[f"tool_name_{idx}"] = pred.next_tool_name
        trajectory[f"tool_args_{idx}"] = pred.next_tool_args

        # Actually execute the tool
        try:
            trajectory[f"observation_{idx}"] = self.tools[pred.next_tool_name](
                **pred.next_tool_args
            )
        except Exception as err:
            trajectory[f"observation_{idx}"] = f"Execution error: {_fmt_exc(err)}"

        # Break if the finish tool was selected
        if pred.next_tool_name == "finish":
            break

    # Extract the final answer from the full trajectory
    extract = self._call_with_potential_trajectory_truncation(
        self.extract, trajectory, **input_args
    )
    return dspy.Prediction(trajectory=trajectory, **extract)
Context window handling: _call_with_potential_trajectory_truncation retries up to 3 times on ContextWindowExceededError, each time truncating the oldest 4 trajectory entries (one tool call = thought + name + args + observation).
Parameters exposed to optimizers: Two Predict instances:
self.react -- the action-selection predictor
self.extract.predict -- the ChainOfThought's internal Predict for extraction
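Putting it together, a usage sketch (search_web is a hypothetical tool; any callable with a docstring works):

```python
def search_web(query: str) -> str:
    """Search the web and return a short snippet."""
    ...

agent = dspy.ReAct("question -> answer", tools=[search_web], max_iters=5)
pred = agent(question="Who wrote The Hobbit?")

pred.answer      # extracted from the trajectory by self.extract
pred.trajectory  # thought_i / tool_name_i / tool_args_i / observation_i entries
```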
ProgramOfThought -- Code Generation + Execution
File: dspy/predict/program_of_thought.py
class ProgramOfThought(Module):
    def __init__(self, signature, max_iters=3, interpreter=None):
        super().__init__()
        self.signature = signature = ensure_signature(signature)
        self.input_fields = signature.input_fields
        self.output_fields = signature.output_fields

        # THREE separate ChainOfThought modules, each with a custom signature:

        # 1. Generate code from inputs
        self.code_generate = dspy.ChainOfThought(
            dspy.Signature(
                self._generate_signature("generate").fields,
                self._generate_instruction("generate"),
            ),
        )

        # 2. Regenerate code given the previous code + error
        self.code_regenerate = dspy.ChainOfThought(
            dspy.Signature(
                self._generate_signature("regenerate").fields,
                self._generate_instruction("regenerate"),
            ),
        )

        # 3. Interpret the code output into a final answer
        self.generate_output = dspy.ChainOfThought(
            dspy.Signature(
                self._generate_signature("answer").fields,
                self._generate_instruction("answer"),
            ),
        )

        self.interpreter = interpreter or PythonInterpreter()
"answer": original inputs + final_generated_code: str + code_output: str -> original outputs
Parameters exposed to optimizers: Three ChainOfThought modules, each with an internal Predict:
self.code_generate.predict
self.code_regenerate.predict
self.generate_output.predict
Pattern 3: Module Wrapping
BestOfN -- Rejection Sampling
File: dspy/predict/best_of_n.py
class BestOfN(Module):
    def __init__(self, module, N, reward_fn, threshold, fail_count=None):
        self.module = module
        self.N = N
        self.threshold = threshold
        self.fail_count = fail_count or N
        # IMPORTANT: wrapped in a lambda to prevent named_parameters() from
        # discovering it (a raw function assigned to self would be walked)
        self.reward_fn = lambda *args: reward_fn(*args)
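forward() is not shown in the excerpt above; a minimal sketch of the loop it implements (the real module also varies rollout settings per attempt so the outputs differ):

```python
def forward(self, **kwargs):
    best_pred, best_reward = None, float("-inf")
    for _ in range(self.N):
        pred = self.module(**kwargs)           # re-run the wrapped module
        reward = self.reward_fn(kwargs, pred)  # score this attempt
        if reward > best_reward:
            best_pred, best_reward = pred, reward
        if reward >= self.threshold:           # good enough -- stop early
            return pred
    return best_pred
```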
Refine -- Feedback-Driven Retry

File: dspy/predict/refine.py

Refine wraps the configured adapter at call time so it can inject advice from a failed attempt as a hint field:

class WrapperAdapter(adapter.__class__):
    def __call__(self, lm, lm_kwargs, signature, demos, inputs):
        # Dynamically add a hint field to the signature
        inputs["hint_"] = advice.get(signature2name[signature], "N/A")
        signature = signature.append(
            "hint_",
            InputField(desc="A hint to the module from an earlier run"),
        )
        return adapter(lm, lm_kwargs, signature, demos, inputs)
This is the modern replacement for Assert/Suggest. Instead of backtracking and mutating signatures permanently, Refine:
Runs the module
If the metric fails, asks an LM for advice
Injects that advice as a temporary "hint" field on the next attempt
The signature modification happens at call time via the adapter wrapper, not at construction time
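A usage sketch, assuming Refine shares BestOfN's constructor shape shown above:

```python
def reward(args, pred) -> float:
    # Plays the role of the removed dspy.Assert(len(answer) < 100, ...)
    return 1.0 if len(pred.answer) < 100 else 0.0

refined = dspy.Refine(
    module=dspy.ChainOfThought("question -> answer"),
    N=3,
    reward_fn=reward,
    threshold=1.0,
)
pred = refined(question="Summarize the plot of Hamlet.")
```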
Pattern 4: Aggregation
majority() -- Voting
Not a module, just a function:
def majority(prediction_or_completions, normalize=...):
    """Returns the most common value across completions."""
MultiChainComparison (covered above)
Takes M completions and synthesizes them. This is aggregation via signature extension.
Deprecated / Removed Modules
Retry -- Removed
The entire file (dspy/predict/retry.py) is commented out. Not exported. Replaced by Refine and BestOfN.
Assert / Suggest -- Removed in DSPy 2.6
These were inline constraints that triggered backtracking:
# OLD (removed):
dspy.Assert(len(answer) < 100, "Answer too long")
When the constraint failed, it would dynamically modify the signature by adding past_{output_field} InputFields and a feedback InputField. On persistent failure, Assert raised an error; Suggest logged and continued.
Replaced by Refine which does the same thing more cleanly.
ChainOfThoughtWithHint -- Removed
Absorbed into Refine's hint injection mechanism.
Summary: What Each Module Exposes to Optimizers
| Module | # Predicts | Paths | What's Optimizable |
| --- | --- | --- | --- |
| Predict | 1 | self | demos, signature.instructions, field prefixes |
| ChainOfThought | 1 | predict | demos, instructions, reasoning prefix |
| MultiChainComparison | 1 | predict | demos, instructions, rationale prefix |
| ReAct | 2 | react, extract.predict | demos and instructions for both action selection and extraction |
| ProgramOfThought | 3 | code_generate.predict, code_regenerate.predict, generate_output.predict | demos and instructions for each of the three stages |
1. The Adapter Base

File: dspy/adapters/base.py

Adapter.__call__ runs _call_preprocess(), then format(), then the LM call, then _call_postprocess(). In _call_preprocess(), native types can rewrite the signature before formatting:

For Reasoning: checks whether the LM supports native reasoning (via litellm.supports_reasoning()). If yes, sets reasoning_effort in lm_kwargs and deletes the reasoning field from the signature; the model uses its built-in chain-of-thought.
Returns the modified signature (with native-handled fields removed)
Step 3: _call_postprocess()
For each LM output:
If the output has text: call self.parse(processed_signature, text) -> dict of field values
Set missing fields (ones in original but not processed signature) to None
If tool_calls present: parse into ToolCalls.from_dict_list()
For native response types: call field.annotation.parse_lm_response(output) (e.g., extract reasoning_content from the response dict)
format_demos(signature, demos) -- Sorts demos into complete and incomplete:
def format_demos(self, signature, demos):
    messages = []

    # Separate complete (all fields) from incomplete (some missing)
    complete_demos = [d for d in demos if all_fields_present]
    incomplete_demos = [d for d in demos if has_input_and_output_but_not_all]

    # Incomplete demos come FIRST, with a disclaimer
    for demo in incomplete_demos:
        # User message with "This is an example of the task, though some input
        # or output fields are not supplied."
        # Missing fields show: "Not supplied for this particular example."
        ...

    # Complete demos after
    for demo in complete_demos:
        # User/assistant message pair with all fields filled
        ...
2. ChatAdapter
File: dspy/adapters/chat_adapter.py
The default adapter. Uses [[ ## field_name ## ]] delimiters to separate fields.
format_field_description(signature)

Your input fields are:
1. `question` (str): The question to answer
2. `context` (list[str]): Relevant passages
Your output fields are:
1. `answer` (str): The answer, often between 1 and 5 words
format_field_structure(signature)
Shows the expected format using [[ ## field_name ## ]] markers:
All interactions will be structured in the following way, with the appropriate values filled in.
[[ ## question ## ]]
{question}
[[ ## context ## ]]
{context}
[[ ## answer ## ]]
{answer} # note: the value you produce must be a single str value
[[ ## completed ## ]]
The type hints come from translate_field_type():
| Python Type | Prompt Hint |
| --- | --- |
| `str` | (no hint) |
| `bool` | "must be True or False" |
| `int` / `float` | "must be a single int/float value" |
| `Enum` | "must be one of: val1; val2; val3" |
| `Literal["a", "b"]` | "must exactly match (no extra characters) one of: a; b" |
| Complex types | "must adhere to the JSON schema: {...}" (Pydantic JSON schema) |
format_task_description(signature)
In adhering to this structure, your objective is:
Answer questions with short factoid answers.
The current inputs are formatted into the final user message:

[[ ## question ## ]]
What is the capital of France?
[[ ## context ## ]]
[1] <<France is a country in Western Europe. Its capital is Paris.>>
Respond with the corresponding output fields, starting with the field `[[ ## answer ## ]]`,
and then ending with the marker for `[[ ## completed ## ]]`.
The last line (output requirements) is only added when main_request=True (not for demos).
Field values are formatted by type:

Lists of strings: numbered format [1] <<text>>, [2] <<text>>
Dicts/lists of non-strings: json.dumps(jsonable_value)
Primitives: str(value)
Single items with delimiters: <<value>> or <<<multi\nline>>> for long values
parse(signature, completion)
def parse(self, signature, completion):
    # 1. Split on [[ ## field_name ## ]] headers
    sections = re.split(r"\[\[ ## (\w+) ## \]\]", completion)

    # 2. Group content under each header
    fields = {}
    for header, content in paired_sections:
        if header in signature.output_fields:
            fields[header] = content.strip()

    # 3. Parse each field value to its annotated type
    for name, raw_value in fields.items():
        annotation = signature.output_fields[name].annotation
        fields[name] = parse_value(raw_value, annotation)

    # 4. Validate that all output fields are present
    if not all(name in fields for name in signature.output_fields):
        raise AdapterParseError(...)

    return fields
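The capture group in that pattern makes re.split interleave field names with their content:

```python
import re

completion = "[[ ## answer ## ]]\nParis\n\n[[ ## completed ## ]]"
print(re.split(r"\[\[ ## (\w+) ## \]\]", completion))
# ['', 'answer', '\nParis\n\n', 'completed', '']
```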
3. JSONAdapter

File: dspy/adapters/json_adapter.py

format_field_structure(signature) -- Different from ChatAdapter
User inputs still use [[ ## field_name ## ]] markers, but outputs are described as JSON:
Inputs will have the following structure:
[[ ## question ## ]]
{question}
Outputs will be a JSON object with the following fields.
{
"answer": "{answer}" // note: must adhere to JSON schema: ...
}
parse(signature, completion) -- JSON parsing
def parse(self, signature, completion):
    # 1. Parse with json_repair (handles malformed JSON)
    result = json_repair.loads(completion)

    # 2. If not a dict, try regex extraction of a JSON object
    if not isinstance(result, dict):
        match = regex.search(r"\{(?:[^{}]|(?R))*\}", completion)
        result = json_repair.loads(match.group())

    # 3. Filter to known output fields
    result = {k: v for k, v in result.items() if k in signature.output_fields}

    # 4. Parse each value to its annotated type
    for name, value in result.items():
        result[name] = parse_value(value, signature.output_fields[name].annotation)

    # 5. Validate that all fields are present
    if not all(name in result for name in signature.output_fields):
        raise AdapterParseError(...)

    return result
Structured Outputs Model Generation
_get_structured_outputs_response_format(signature) builds a Pydantic model from output fields with OpenAI's requirements:
extra="forbid" (no additional properties)
Recursive enforce_required() ensures all nested objects have required and additionalProperties: false
4. Other Adapters
XMLAdapter
File: dspy/adapters/xml_adapter.py
Uses <field_name>...</field_name> XML tags instead of [[ ## ]] delimiters. Otherwise similar to ChatAdapter.
TwoStepAdapter
File: dspy/adapters/two_step_adapter.py
Uses two LM calls:
First call: natural language prompt, get a free-form response
Second call: use ChatAdapter to extract structured fields from the free-form response
Useful for models that struggle with strict formatting.
5. Complete Message Assembly Example
For a ChainOfThought("question -> answer") with 2 demos and the input "What is 2+2?":
System Message
Your input fields are:
1. `question` (str)
Your output fields are:
1. `reasoning` (str): ${reasoning}
2. `answer` (str)
All interactions will be structured in the following way, with the appropriate values filled in.
[[ ## question ## ]]
{question}
[[ ## reasoning ## ]]
{reasoning}
[[ ## answer ## ]]
{answer}
[[ ## completed ## ]]
In adhering to this structure, your objective is:
Given the fields `question`, produce the fields `reasoning`, `answer`.
Demo 1 (User)
[[ ## question ## ]]
What is the capital of France?
Demo 1 (Assistant)
[[ ## reasoning ## ]]
The question asks about the capital of France. France is a country in Europe, and its capital city is Paris.
[[ ## answer ## ]]
Paris
[[ ## completed ## ]]
Demo 2 (User + Assistant)
(Same pattern)
Current Input (User)
[[ ## question ## ]]
What is 2+2?
Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`,
and then ending with the marker for `[[ ## completed ## ]]`.
LM Response (Assistant)
[[ ## reasoning ## ]]
The question asks for the sum of 2 and 2. Basic arithmetic: 2 + 2 = 4.
[[ ## answer ## ]]
4
[[ ## completed ## ]]
Parsed Result
{"reasoning": "The question asks for the sum of 2 and 2. Basic arithmetic: 2 + 2 = 4.",
"answer": "4"}
6. Settings and Adapter Configuration
Global Configuration
dspy.configure(
    lm=dspy.LM("openai/gpt-4"),
    adapter=dspy.ChatAdapter(),  # Default if not set
)

# In _forward_preprocess:
adapter = settings.adapter or ChatAdapter()    # Global or default
lm = kwargs.pop("lm", self.lm) or settings.lm  # Per-call > per-predict > global
7. Custom Types

Image (dspy/adapters/types/image.py)

Serialized with custom markers: <<CUSTOM-TYPE-START-IDENTIFIER>>json<<CUSTOM-TYPE-END-IDENTIFIER>>
split_message_content_for_custom_types() finds these markers and splits the user message into multimodal content blocks (text + image_url parts), matching OpenAI's multimodal message format
Reasoning (dspy/adapters/types/reasoning.py)
String-like custom type
adapt_to_native_lm_feature(): If LM supports native reasoning, sets reasoning_effort in lm_kwargs and removes the reasoning field from signature
parse_lm_response(): Extracts reasoning_content from the response dict
Falls back to text-based reasoning for non-reasoning models
Tool / ToolCalls (dspy/adapters/types/tool.py)
Handled in _call_preprocess: tools converted to litellm function calling format
Tool and ToolCalls fields removed from signature before formatting
In _call_postprocess: tool calls from LM response parsed back into ToolCalls objects
The Optimizer Contract

The implicit contract between an optimizer and a module:
The module has Predict instances as leaf parameters. Discovered via named_parameters() / named_predictors(). A module with no Predict instances has nothing to optimize.
Each Predict has a signature with mutable .instructions and field prefix/desc.
Each Predict has a demos list (initially []). The primary optimization lever.
Each Predict has an optional lm attribute. BootstrapFinetune replaces this with a finetuned model.
Running the module records traces to settings.trace. Optimizers read traces to attribute outputs to specific predictors.
Student and teacher must be structurally equivalent. Same number of predictors, same names, same signatures.
deepcopy() and reset_copy() produce valid independent copies. Optimizers always copy before modifying.
dump_state() / load_state() round-trip the optimized state.
1. Module Discovery
named_parameters() -- What Optimizers See
# For a program like:
class RAG(dspy.Module):
    def __init__(self):
        self.retrieve = dspy.Predict("question -> passages")
        self.answer = dspy.ChainOfThought("question, passages -> answer")

# named_parameters() returns:
[
    ("retrieve", <Predict>),        # self.retrieve IS a Parameter
    ("answer.predict", <Predict>),  # ChainOfThought holds self.predict
]
Almost every optimizer uses this. Since Predict is currently the only Parameter subclass, named_parameters() and named_predictors() return the same things. But the filter makes the intent explicit.
predictor2name / name2predictor Mappings
Optimizers (especially BootstrapFewShot) build bidirectional maps to connect traces back to predictors:
# In BootstrapFewShot._prepare_predictor_mappings():
self.name2predictor = {}
self.predictor2name = {}
for name, predictor in self.student.named_predictors():
    self.name2predictor[name] = predictor
    self.predictor2name[id(predictor)] = name
# Same for the teacher
id(predictor) is the key -- when a trace records (predictor_instance, inputs, prediction), the optimizer looks up predictor2name[id(predictor_instance)] to find which named predictor produced that output.
2. What Optimizers Modify
There are exactly four things optimizers touch on Predict instances:

demos -- the few-shot example list (the primary lever)
signature.instructions -- the task description
Field prefix/desc metadata on the signature
lm -- the per-predictor LM (BootstrapFinetune swaps in a finetuned model)
3. Trace Capture

Optimizers capture traces by wrapping execution in a trace context:
# BootstrapFewShot:
with dspy.context(trace=[]):
    prediction = teacher(**example.inputs())
    trace = dspy.settings.trace
# trace is now [(pred1, inputs1, output1), (pred2, inputs2, output2), ...]
Traces connect predictors to their I/O: The predictor_instance in the tuple lets optimizers map back to named predictors via predictor2name[id(predictor)].
4. Metrics

Metrics can use traces: metric functions accept an optional trace parameter:
def my_metric(example, prediction, trace=None):
    # trace is the list of (predictor, inputs, outputs) tuples for this run
    # during bootstrapping; it is None during plain evaluation. It lets the
    # metric inspect intermediate steps, not just the final output.
    return prediction.answer == example.answer
5. Key Optimizers
BootstrapFewShot (dspy/teleprompt/bootstrap.py)
The foundational optimizer. Populates demos on Predict instances by running a teacher and capturing successful traces.
Step 1: compile(student, *, teacher, trainset)
def compile(self, student, *, teacher=None, trainset):
    self.student = student.reset_copy()  # Deep copy + clear all demos
    self.teacher = (teacher or student).deepcopy()
    self._prepare_predictor_mappings()
    self._bootstrap()
    self._train()
    self.student._compiled = True
    return self.student
Step 2: _prepare_predictor_mappings()
Asserts student and teacher have identical structure (same number of predictors, same names)
Builds name2predictor and predictor2name for both
Step 3: _bootstrap() -- Generate Demo Candidates
For each training example:
for example in trainset:
    with dspy.context(trace=[]):
        prediction = self.teacher(**example.inputs())
        trace = dspy.settings.trace

    # Check if the output passes the metric
    if self.metric(example, prediction):
        # Extract demos from the trace
        for predictor, inputs, output in trace:
            name = self.predictor2name[id(predictor)]
            demo = dspy.Example(augmented=True, **inputs, **output)
            self.name2traces[name].append(demo)
The key mechanism: run the teacher, capture the trace, check the metric, and if it passes, create Example objects from each predictor's input/output pair.
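The _train() step then attaches those demos to the student. A minimal sketch (the real method also mixes in raw labeled examples up to max_labeled_demos):

```python
def _train(self):
    for name, predictor in self.student.named_predictors():
        # Demos gathered during _bootstrap(), capped per predictor
        predictor.demos = self.name2traces[name][: self.max_bootstrapped_demos]
```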
BootstrapFewShotWithRandomSearch (dspy/teleprompt/random_search.py)

Runs BootstrapFewShot multiple times with different configurations and picks the best:
# Generates candidate programs with different strategies:
#   Seed -3: Zero-shot (reset_copy, no demos)
#   Seed -2: Labels only (LabeledFewShot)
#   Seed -1: Unshuffled bootstrap
#   Seeds 0+: Shuffled bootstrap with a random demo count
# Evaluates each on the validation set
# Returns the best-scoring program
# Attaches all candidates as best_program.candidate_programs
MIPROv2 (dspy/teleprompt/mipro_optimizer_v2.py)
The most sophisticated optimizer. Jointly optimizes instructions AND demos using Bayesian optimization (Optuna).
Uses GroundedProposer -- an LM-based instruction generator
Can be program-aware (reads source code), data-aware (summarizes training data), tip-aware (includes prompting tips), fewshot-aware (includes example demos)
Produces instruction_candidates[i] -- a list of instruction strings for each predictor i
BootstrapFinetune (dspy/teleprompt/bootstrap_finetune.py)

Instead of adding demos, replaces each predictor's lm with a finetuned model. Step 1 runs the program to collect trace data (as in BootstrapFewShot); then:

Step 2: _prepare_finetune_data() -- Convert traces to training format:
for trace_entry in trace_data:
    for pred, inputs, outputs in trace_entry.trace:
        # Use the adapter to format as training data
        training_example = adapter.format_finetune_data(
            signature, demos, inputs, outputs
        )
        # This produces chat-format messages suitable for finetuning
Step 3: finetune_lms() -- Group predictors by LM, finetune:
# If multitask=True: all predictors sharing an LM get one combined finetune jobfinetuned_lm=lm.finetune(train_data, ...)
A note on serialization: the architecture (which modules exist, how they're connected) comes from code. The optimized state (demos, instructions, field metadata) comes from the saved file.
Strip away the Python dynamism and DSPy's module system is:
A tree of composable nodes where leaf nodes (Predict) hold optimizable state
A typed I/O contract (Signature) that describes what goes in and what comes out
A formatting/parsing layer (Adapter) that converts typed contracts to LM prompts and back
A tree traversal that lets optimizers discover and modify leaf nodes
A tracing mechanism that records execution for optimizer feedback
That's it. Everything else is orchestration (how modules compose Predicts) and strategy (how optimizers search the space).
The Hard Problems
1. Dynamic Signature Manipulation
In Python, signatures are classes created at runtime via metaclass magic. Modules like ChainOfThought do signature.prepend("reasoning", OutputField(...)) which creates a new type at runtime.
In Rust: Signatures are data, not types. Model them as:
struct Signature {
    name: String,
    instructions: String,
    fields: IndexMap<String, Field>,  // Ordered map (insertion order matters)
}

struct Field {
    direction: FieldDirection,  // Input | Output
    type_annotation: TypeAnnotation,
    prefix: String,
    desc: String,
    format: Option<Box<dyn Fn(&str) -> String>>,
    constraints: Option<String>,
}

enum FieldDirection {
    Input,
    Output,
}

enum TypeAnnotation {
    Str,
    Int,
    Float,
    Bool,
    List(Box<TypeAnnotation>),
    Dict(Box<TypeAnnotation>, Box<TypeAnnotation>),
    Optional(Box<TypeAnnotation>),
    Enum(Vec<String>),
    Literal(Vec<String>),
    Json(serde_json::Value),  // For complex types, store the JSON schema
}
All manipulation methods (with_instructions, prepend, append, delete, with_updated_fields) return new Signature values. This maps cleanly to Rust's ownership model -- signatures are cheap to clone and manipulate.
2. The Parameter Tree Walk
Python does this by walking __dict__ and checking isinstance. Rust doesn't have runtime reflection.
Option A: Explicit -- each module hand-implements named_parameters(), returning its child predictors and recursing into sub-modules.
Option B: Derive macro -- a proc macro generates named_parameters() by inspecting fields marked with #[parameter].
Option C: Inventory/registry -- each module registers itself. More complex, probably overkill.
Recommendation: Start with Option A (explicit). It's simple, correct, and makes the tree structure obvious. Add a derive macro later if the boilerplate becomes painful.
3. The _compiled Freeze Flag
In Python, _compiled = True makes named_parameters() skip a sub-module. In Rust:
Simple approach: A boolean flag on every module, checked in named_parameters().
Type-state approach (more Rusty):
struct CompiledModule<M: Module> {
    inner: M,
    // named_parameters() returns an empty vec.
    // Cannot be modified without explicitly un-compiling.
}

impl<M: Module> Module for CompiledModule<M> {
    fn named_parameters(&self) -> Vec<(String, &dyn Parameter)> {
        vec![]  // Frozen -- parameters are not exposed
    }

    fn forward(&self, inputs: HashMap<String, Value>) -> Result<Prediction> {
        self.inner.forward(inputs)
    }
}
4. The Adapter System
Adapters are the most straightforward part to port. They're essentially pure functions: format(signature, demos, inputs) -> messages, and parse(signature, completion) -> a map of output field values.
5. The Trace Context

Python uses a global thread-local list that Predicts append to. In Rust:
// Thread-local trace context
thread_local! {
    static TRACE: RefCell<Option<Vec<TraceEntry>>> = RefCell::new(None);
}

struct TraceEntry {
    predictor_id: PredictorId,  // Not a reference -- an ID for lookup
    inputs: HashMap<String, Value>,
    prediction: Prediction,
}

// In Predict::forward:
TRACE.with(|trace| {
    if let Some(ref mut trace) = *trace.borrow_mut() {
        trace.push(TraceEntry { predictor_id: self.id, inputs, prediction });
    }
});

// In the optimizer:
let trace = with_trace(|| teacher.forward(example.inputs()));
Use IDs instead of references. Python uses id(predictor) (memory address); Rust should use a stable identifier (UUID, path string, or index).
6. Value Types and Parsing
DSPy uses Python's dynamic typing + Pydantic for validation. In Rust, you need a value type:
enum Value {
    Str(String),
    Int(i64),
    Float(f64),
    Bool(bool),
    List(Vec<Value>),
    Dict(HashMap<String, Value>),
    Null,
    Json(serde_json::Value),  // For complex/unknown types
}
Phases 1-2: Core Abstractions

Module trait with forward() and named_parameters()
Parameter trait extending Module
Predict implementing both
BaseModule trait for tree traversal, serialization
Phase 3: Adapter Layer
Adapter trait
ChatAdapter (formatting and parsing)
JsonAdapter
parse_value for type coercion
Phase 4: Composition Modules
ChainOfThought (signature extension pattern)
ReAct (multi-signature orchestration pattern)
BestOfN / Refine (module wrapping pattern)
Phase 5: Optimization
Tracing infrastructure
Evaluate
BootstrapFewShot
LabeledFewShot
More complex optimizers as needed
Design Decisions to Make Early
1. Static vs Dynamic Signatures
Python signatures carry Python types (Pydantic models, etc.). Rust signatures will need to decide:
Fully dynamic (TypeAnnotation enum + Value enum) -- flexible, similar to Python, but loses Rust's type safety
Partially typed (generics for common cases, Value for complex) -- more Rusty but more complex
Schema-driven (JSON Schema as the universal type description) -- pragmatic, works with any LM
Recommendation: Start fully dynamic. The type safety that matters here is at the LM boundary (parsing), not at compile time. You're dealing with strings from an LM no matter what.
2. Ownership of Demos and Signatures
In Python, optimizers freely mutate predictor.demos and predictor.signature. In Rust:
Mutable references: Optimizers take &mut references to the program
Interior mutability: Use RefCell<Vec<Example>> for demos
Clone + replace: Clone the whole program, modify the clone, return it (matches Python's reset_copy() pattern)
Recommendation: Clone + replace. It matches the Python pattern where optimizers always copy the student first, and it avoids fighting the borrow checker.
3. Async vs Sync
LM calls are inherently async (HTTP requests). The question is whether forward() should be async.
Recommendation: Make it async from the start. async fn forward(&self, ...) -> Result<Prediction>. Easier than retrofitting later.
4. Error Types
DSPy uses AdapterParseError, ContextWindowExceededError, and generic exceptions. Design a clean error enum: