Training agentic models that can effectively use tools remains one of the harder problems in applied ML. Models trained on purely synthetic data - where tool calls and their responses are both generated by an LLM - consistently underperform when deployed against real systems. They struggle with error recovery, mishandle state dependencies, and often exhibit what we call "time travel" errors: acting on information they haven't actually received yet.
This post introduces DeepFabric's execution-based tool tracing system, which replaces simulated tool outputs with real execution inside WebAssembly sandboxes. The result is training data grounded in actual system behavior, including the messy parts that make real-world tool use challenging.
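To make the distinction concrete, here is a minimal sketch in Python. It is not DeepFabric's actual API: the tool, the function names, and the locally-run stand-in for the sandbox are all hypothetical. The point is where the tool result comes from: in a purely synthetic pipeline an LLM invents it, while in an execution-grounded pipeline the call is actually run (in DeepFabric's case inside a WebAssembly sandbox) and whatever comes back, including errors, is what lands in the trace.

```python
import json

def get_weather(city: str) -> dict:
    # Stand-in tool. In an execution-grounded pipeline this would run inside
    # a WebAssembly sandbox rather than directly in the host process.
    return {"city": city, "temp_c": 14, "conditions": "overcast"}

TOOLS = {"get_weather": get_weather}

def simulated_tool_result(tool_call: dict) -> dict:
    # Purely synthetic pipeline: an LLM invents the output, so errors, latency,
    # and state never reflect a real system. (Hard-coded placeholder here.)
    return {"city": tool_call["arguments"]["city"], "temp_c": 21, "conditions": "sunny"}

def executed_tool_result(tool_call: dict) -> dict:
    # Execution-grounded pipeline: actually run the tool and record what it
    # returns, including the failures a model must learn to recover from.
    fn = TOOLS[tool_call["name"]]
    try:
        return fn(**tool_call["arguments"])
    except Exception as exc:
        return {"error": type(exc).__name__, "message": str(exc)}

call = {"name": "get_weather", "arguments": {"city": "Berlin"}}
print("simulated:", json.dumps(simulated_tool_result(call)))
print("executed: ", json.dumps(executed_tool_result(call)))

# The executed result, not the invented one, is what gets written into the
# training trace as the tool message following the assistant's tool call.
trace = [
    {"role": "assistant", "tool_calls": [call]},
    {"role": "tool", "content": json.dumps(executed_tool_result(call))},
]
print(json.dumps(trace, indent=2))
```

Because the tool message is recorded only after the call actually runs, the trace cannot contain a "time travel" error: the assistant never sees, or acts on, a result that does not yet exist.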
Consider a typical synthetic data generation pipeline for tool-using agents. An LLM generates a user request, then generates an assistant response with tool calls,