This guide documents how to deploy Deep Agents to LangSmith cloud with a focus on workspace/file lifecycle - a critical topic for production deployments.
Deep Agents CAN be deployed to LangSmith cloud. From the official LangChain documentation:
"Deep agents applications can be deployed via LangSmith Deployment and monitored with LangSmith Observability."
Note: LangGraph Platform was renamed to LangSmith Deployment as of October 2025.
This section addresses common questions about the notebook's approach to workspace management.
```python
from pathlib import Path
from deepagents.backends import FilesystemBackend

workspace_path = Path("workspace").absolute()
filesystem_backend = FilesystemBackend(
    root_dir=str(workspace_path),
    virtual_mode=True  # Sandboxing only, NOT persistence
)
```

**NO** - for local development, this is the standard pattern:
- `FilesystemBackend` with `virtual_mode=True` is the documented local dev approach
- Files persist to disk for easy inspection and debugging
- The lack of automatic cleanup is intentional - it lets developers see accumulated results
**YES** - this would be atypical (and problematic) in production:
- Files would be lost on container restart
- Files don't migrate between containers during scaling
- Production should use `StateBackend` or `CompositeBackend`
| Context | Typical Pattern | Notebook Approach |
|---|---|---|
| Local development | `FilesystemBackend` + real directory | ✅ Matches |
| Production cloud | `StateBackend` or `CompositeBackend` | ❌ Would need change |
A thread ends when `.invoke()` returns - but workspace files are NOT deleted, because they're written to real disk, not agent state.
| Event | In-Memory State | Workspace Files |
|---|---|---|
| `.invoke()` returns | Thread complete | PERSIST on disk |
| Run another cell | New thread starts | Previous files remain |
| Kernel restart | All memory lost | PERSIST on disk |
| Delete `workspace/` | N/A | Finally cleaned |
Why files persist: `FilesystemBackend` writes to real disk (`workspace/`), not agent state. The "transient per-thread" rule applies to `StateBackend`, not to real filesystem writes.
```
Cell 20: agent.invoke(...) → Thread 1 ends → Files written to workspace/
Cell 30: agent.invoke(...) → Thread 2 ends → More files added
Cell 50: agent.invoke(...) → Thread 3 ends → Files accumulate
Kernel restart:            → Memory cleared → Files still on disk
rm -rf workspace/:         → Finally cleaned
```
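A quick way to see this for yourself - a minimal sketch, assuming `agent` is the compiled deep agent from the earlier cells, built with the `FilesystemBackend` above, and that its runs write files into `workspace/` (the prompts are illustrative):

```python
from pathlib import Path

# Thread 1: the agent may write report files into workspace/ during the run
agent.invoke({"messages": [{"role": "user", "content": "Research topic A"}]})
print(sorted(p.name for p in Path("workspace").rglob("*")))  # files from thread 1

# Thread 2: a fresh thread, but the earlier files are still on disk
agent.invoke({"messages": [{"role": "user", "content": "Research topic B"}]})
print(sorted(p.name for p in Path("workspace").rglob("*")))  # threads 1 + 2 accumulate
```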
```
NOTEBOOK (Local Dev):
  Agent → FilesystemBackend → Real disk (workspace/)
  Files persist indefinitely until manually deleted

CLOUD (Production):
  Agent → StateBackend → Agent state (ephemeral)
    OR
  Agent → CompositeBackend:
    /memories/* → StoreBackend (PostgreSQL - persistent)
    /*          → StateBackend (ephemeral per-thread)
```
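To make the routing rule concrete, here is an illustrative sketch. The longest-prefix-wins rule is an assumption about how `CompositeBackend` resolves overlapping routes; the routes mirror the production config above:

```python
# Illustrative only: how CompositeBackend-style path routing decides where a
# file write lands. Not the library's implementation.
routes = {"/memories/": "StoreBackend (persistent)", "/": "StateBackend (ephemeral)"}

def route(path: str) -> str:
    # Longest matching prefix wins (assumed routing rule).
    prefix = max((p for p in routes if path.startswith(p)), key=len)
    return routes[prefix]

assert route("/memories/user_prefs.md") == "StoreBackend (persistent)"
assert route("/scratch/notes.md") == "StateBackend (ephemeral)"
```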
The notebook implements a single-user deep research pattern - this is intentional for local development but requires significant changes for production.
| Characteristic | Notebook (Single-User) | Production (Multi-User) |
|---|---|---|
| User isolation | None - shared `workspace/` | `(user_id, "...")` namespace per user |
| Store backend | `InMemoryStore` | `PostgresStore` with user scoping |
| File backend | `FilesystemBackend` → local disk | `CompositeBackend` → StateBackend + StoreBackend |
| Authentication | None | Required (JWT, OAuth, etc.) |
| Session management | Jupyter kernel lifecycle | Thread IDs + checkpointer |
| Cleanup policy | Manual `rm -rf workspace/` | TTL policies per user namespace |
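For the session-management row, the standard LangGraph convention is to scope each conversation with a `thread_id` in the run config. A sketch, assuming `graph` is a compiled agent with a checkpointer attached, and `user_id`/`session_id` come from your auth layer (hypothetical names):

```python
# One thread per (user, session); the checkpointer isolates state by thread_id.
config = {"configurable": {"thread_id": f"{user_id}:{session_id}"}}

result = graph.invoke(
    {"messages": [{"role": "user", "content": "How did my week go?"}]},
    config,
)
```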
- Learning focus: Students see all agent outputs in one place
- Debugging convenience: Inspect `workspace/` files directly
- Iterative development: Accumulated context helps experimentation
- No infrastructure: No database, no auth, no deployment
```python
from langgraph.store.memory import InMemoryStore
from langgraph.store.postgres import PostgresStore
from deepagents.backends import CompositeBackend, FilesystemBackend, StateBackend, StoreBackend

# NOTEBOOK (Single-User)
store = InMemoryStore()
backend = FilesystemBackend(root_dir="workspace", virtual_mode=True)

# PRODUCTION (Multi-User)
store = PostgresStore.from_conn_string(DB_URI)  # context manager in practice; see the full example below
backend = CompositeBackend(
    routes={
        "/memories/": StoreBackend(store=store),  # User-scoped persistence
        "/": StateBackend(),                      # Ephemeral per-thread
    }
)

# User isolation via namespaces
def get_user_namespace(user_id: str, category: str):
    return (user_id, category)  # e.g., ("user_123", "profile")
```

```
NOTEBOOK (Single-User Deep Research):
┌─────────────────────────────────────┐
│ Jupyter Notebook │
│ └─ Deep Agent │
│ ├─ InMemoryStore (shared) │
│ └─ FilesystemBackend │
│ └─ workspace/ (shared) │
└─────────────────────────────────────┘
PRODUCTION (Multi-User):
┌─────────────────────────────────────┐
│ API Server + Auth │
│ └─ Deep Agent (per request) │
│ ├─ PostgresStore │
│ │ └─ (user_id, category) │
│ └─ CompositeBackend │
│ ├─ /memories/* → Store │
│ └─ /* → State (ephemeral) │
└─────────────────────────────────────┘
│
▼
PostgreSQL (durable, user-isolated)
```
From official LangChain documentation:
"Deep agents come with a local filesystem to offload memory. By default, this filesystem is stored in agent state and is transient to a single thread—files are lost when the conversation ends."
Source: Long-term memory - Deep Agents
| Storage Location | Scope | Lifetime | Cleanup |
|---|---|---|---|
| Agent Filesystem | Per-thread | Thread ends → LOST | Automatic on thread end |
| Container Local FS | Per-container | Container restart → LOST | Container termination |
| StateBackend | Per-thread | Thread ends → lost | Automatic |
| StoreBackend (PostgreSQL) | Cross-thread | Configurable TTL | TTL policy or manual |
| Redis | Ephemeral only | Auto-expires | Built-in TTL |
- Thread ends → Files in agent state are garbage collected
- Container restarts → All local filesystem data lost
- Scaling events → Files don't migrate between containers
- TTL expiration → Configured via `langgraph.json`
```mermaid
graph TD
    Agent[Deep Agent] --> Router{Path Router}
    Router -->|"/memories/*"| Store["StoreBackend<br/>persistent across threads"]
    Router -->|"other paths"| State["StateBackend<br/>ephemeral, single thread"]
```
```python
from deepagents.backends import CompositeBackend, StateBackend, StoreBackend

backend = CompositeBackend(
    routes={
        "/memories/": StoreBackend(store=store),  # Persistent
        "/": StateBackend(),                      # Ephemeral working files
    }
)
```

LangSmith Deployment offers three modes:

| Mode | Control Plane | Data Plane | Best For | Pricing |
|---|---|---|---|---|
| Cloud (SaaS) | LangChain | LangChain | Startups, rapid prototyping | $0.005/run |
| Hybrid (BYOC) | LangChain | Your cloud | Data residency requirements | Enterprise ($100k+/yr) |
| Self-Hosted | Your cloud | Your cloud | Air-gapped, maximum control | Enterprise ($100k+/yr) |
From LangGraph Data Plane docs:
- Agent Server: Containerized instances (Docker/Kubernetes)
- Stateless Design: No resources maintained in memory between requests
- Auto-scaling: Production scales up to 10 containers automatically
- Load Balancing: AWS ALB, GCP Load Balancer, or Nginx
| Component | Purpose | Persistence |
|---|---|---|
| PostgreSQL | Checkpoints, threads, memory store, runs | Durable |
| Redis | Communication, streaming, heartbeats | Ephemeral |
| Container FS | Working files during execution | Ephemeral |
Critical: "No user or run data is stored in Redis" - all durable data goes to PostgreSQL.
- Container receives SIGINT → stops accepting new requests
- Grace period (default 180s) for in-progress runs
- Incomplete runs requeued for other instances
- Sweeper task runs every 2 minutes to detect stalled runs
- PostgreSQL MVCC ensures exactly-once semantics
| Backend | Cloud-Safe | Notes |
|---|---|---|
| `StateBackend` | Yes | Ephemeral, per-thread - good for stateless APIs |
| `StoreBackend` | Yes | Persistent via LangGraph Store - recommended for production |
| `FilesystemBackend` | Conditional | Sandboxes writes but still ephemeral |
| `LocalShellBackend` | Never | Arbitrary shell execution - severe security risk |
| `CompositeBackend` | Yes | Recommended - hybrid routing to appropriate backends |
`virtual_mode=True` provides sandbox isolation (it prevents writes outside `root_dir`) but does NOT provide persistence:
- Files are still ephemeral (lost on thread end)
- Use it for security sandboxing, not durability
```python
from deepagents.backends import FilesystemBackend

backend = FilesystemBackend(
    root_dir=str(workspace_path),
    virtual_mode=True  # REQUIRED for security, but still ephemeral!
)
```

Configure cleanup policies in `langgraph.json`:
```json
{
  "checkpointer": {
    "ttl": {
      "strategy": "delete",
      "sweep_interval_minutes": 60
    }
  },
  "store": {
    "ttl": {
      "default_ttl_minutes": 10080
    }
  }
}
```

(`default_ttl_minutes: 10080` is 7 days.) Source: How to add TTLs to your application
For production deployments, use PostgreSQL for both checkpoints and store:
```python
from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.store.postgres import PostgresStore

DB_URI = "postgresql://user:pass@host:5432/db?sslmode=require"

with PostgresStore.from_conn_string(DB_URI) as store, \
     PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    graph = builder.compile(checkpointer=checkpointer, store=store)
```

- Durability: Survives restarts and deployments
- Scalability: Read replicas for high-traffic scenarios
- Transactions: ACID guarantees for state consistency
- Managed options: AWS RDS, GCP Cloud SQL, Supabase, Neon
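One operational detail worth noting: both Postgres classes ship a `setup()` method that creates their tables, and it must run once before first use. A sketch (assuming the default schemas; skip this if you manage migrations yourself):

```python
from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.store.postgres import PostgresStore

# One-time initialization: create checkpoint and store tables.
with PostgresStore.from_conn_string(DB_URI) as store, \
     PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    store.setup()         # store tables/indexes
    checkpointer.setup()  # checkpoint tables
```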
```python
# User-scoped namespaces (persistent via StoreBackend)
(user_id, "profile")       # Demographics, goals
(user_id, "preferences")   # Communication style
(user_id, "progress")      # Milestones, check-ins

# Global namespaces
("wellness", "knowledge")  # Shared knowledge base
```

- User isolation: Each user's data is namespaced
- Easy queries: Filter by namespace tuple prefix
- Compliance: GDPR deletion targets specific user namespaces
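A sketch of these operations against the LangGraph store API (assuming `store` is the `PostgresStore` from above; the keys and values are illustrative):

```python
user_id = "user_123"

# Write and read within a user-scoped namespace
store.put((user_id, "preferences"), "tone", {"style": "encouraging"})
item = store.get((user_id, "preferences"), "tone")

# GDPR-style erasure: enumerate the user's namespaces, delete every item
for ns in store.list_namespaces(prefix=(user_id,)):
    for it in store.search(ns):
        store.delete(ns, it.key)
```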
Before deploying to production:
- NEVER use `LocalShellBackend` in production
- Use `CompositeBackend` to route persistent data to `StoreBackend`
- Store API keys in environment variables (never in code)
- Never commit `.env` files (only `.env.example`)
- Enable SSL/TLS for database connections (`sslmode=require`)
- Implement user isolation via a namespace strategy
- Configure human-in-the-loop for sensitive operations:

```python
# Interrupt before sensitive tool executions
graph = builder.compile(
    checkpointer=checkpointer,
    interrupt_before=["sensitive_tool_node"]
)
```

A minimal `langgraph.json`:

```json
{
  "dependencies": ["."],
  "graphs": {
    "my_agent": "./src/agent.py:graph"
  },
  "env": ".env",
  "checkpointer": {
    "ttl": { "strategy": "delete", "sweep_interval_minutes": 60 }
  }
}
```

Deploy, then call the deployed graph remotely:

```bash
langgraph deploy --config langgraph.json
```

```python
from langgraph.pregel.remote import RemoteGraph

remote_graph = RemoteGraph(
    "my_agent",  # deployed graph name
    url="https://your-deployment.langgraph.com",
    api_key="your-langsmith-api-key"
)

# Use the remote graph just like a local one
result = remote_graph.invoke({"messages": [...]})
```

```
┌─────────────────────────────────────────────────────┐
│ LangSmith Deployment │
│ ┌───────────────────────────────────────────────┐ │
│ │ Deep Agent Graph │ │
│ │ - Planning (TodoListMiddleware) │ │
│ │ - Context (CompositeBackend) │ │
│ │ └─ /memories/* → StoreBackend (persistent) │ │
│ │ └─ /* → StateBackend (ephemeral) │ │
│ │ - Subagents (SubAgentMiddleware) │ │
│ │ - Memory (MemoryMiddleware) │ │
│ └───────────────────┬───────────────────────────┘ │
│ │ │
│ Container FS │ Redis │
│ (ephemeral) │ (ephemeral comms) │
└──────────────────────┼───────────────────────────────┘
│
┌───────────────┼───────────────┐
▼ ▼ ▼
PostgreSQL LangSmith Vector DB
(DURABLE: Observability (Optional)
checkpoints,
store items,
threads)
```
| Component | Scaling Strategy |
|---|---|
| `StateBackend` | Horizontal scaling works out of the box (stateless) |
| `StoreBackend` | PostgreSQL with read replicas for high traffic |
| Subagents | Run in isolated contexts, scale independently |
| Container failures | Transparent - the sweeper requeues interrupted runs |
- Coordinator agent: Use a capable model (Claude Sonnet, GPT-4o)
- Subagents: Use cheaper models (`openai:gpt-4o-mini`, ~10x cheaper)
- Embeddings: Use `text-embedding-3-small` for semantic search (see the sketch after this list)
- Context offloading: Use ephemeral files for working data; reserve StoreBackend for persistent data
| Phase | Activities |
|---|---|
| 1. Local dev | Set up env, test with `langgraph dev`, use `InMemoryStore` |
| 2. Database | Provision PostgreSQL, migrate to `PostgresStore`/`PostgresSaver` |
| 3. Deploy | Create `langgraph.json` with TTL config, deploy to LangSmith |
| 4. Harden | Enable tracing, configure HITL, set up monitoring |
Here's how to deploy the Wellness Coach agent from the notebook:
```python
# src/wellness_agent.py
from deepagents import DeepAgent, TodoListMiddleware, MemoryMiddleware
from deepagents.backends import CompositeBackend, StateBackend, StoreBackend
from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.store.postgres import PostgresStore
import os

DB_URI = os.environ["DATABASE_URL"]

def create_wellness_agent():
    """Create a production-ready wellness coach agent."""
    # NOTE: in a long-running server, keep these connections open for the
    # process lifetime (e.g., via contextlib.ExitStack) rather than letting
    # the `with` block close them when this function returns.
    with PostgresStore.from_conn_string(DB_URI) as store, \
         PostgresSaver.from_conn_string(DB_URI) as checkpointer:
        # Use CompositeBackend for hybrid persistence
        backend = CompositeBackend(
            routes={
                "/memories/": StoreBackend(store=store),  # Persistent
                "/": StateBackend(),                      # Ephemeral working files
            }
        )
        agent = DeepAgent(
            model="anthropic:claude-sonnet-4-20250514",
            backend=backend,
            middlewares=[
                TodoListMiddleware(),
                MemoryMiddleware(store=store),
            ],
            checkpointer=checkpointer,
        )
        return agent.compile()

# Export for langgraph.json
graph = create_wellness_agent()
```

The matching `langgraph.json`:

```json
{
  "dependencies": ["."],
  "graphs": {
    "wellness_coach": "./src/wellness_agent.py:graph"
  },
  "env": ".env",
  "checkpointer": {
    "ttl": { "strategy": "delete", "sweep_interval_minutes": 60 }
  },
  "store": {
    "ttl": { "default_ttl_minutes": 10080 }
  }
}
```

```bash
# Test locally first
langgraph dev

# Deploy to LangSmith
langgraph deploy
```

LangSmith provides built-in observability:
```python
# Enable tracing (already on by default in LangSmith deployments)
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "wellness-coach-prod"
```

Key metrics to monitor (queryable via the SDK - see the sketch after this list):

- Latency: P50, P95, P99 response times
- Token usage: Input/output tokens per request
- Error rates: Failed invocations, timeouts
- Memory usage: Store operations per session
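These can also be pulled programmatically - a sketch using the LangSmith SDK (assumes `LANGCHAIN_API_KEY` is set; the project name matches the tracing config above):

```python
from langsmith import Client

client = Client()

# Recent root runs for the project; inspect latency and errors per run.
runs = list(client.list_runs(project_name="wellness-coach-prod", is_root=True))
errors = [r for r in runs if r.error]
print(f"{len(errors)}/{len(runs)} runs errored")
```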
**Q: When are workspace files cleaned up in the cloud?**

A: Files are cleaned up at multiple points:
- Thread ends - Agent filesystem in state is garbage collected immediately
- Container restarts - All local filesystem data lost (scaling, deploys, crashes)
- TTL expiration - Configurable via `langgraph.json` for StoreBackend items
**Q: What is the scope of the agent's filesystem?**

A:
- Default scope: Per-thread (single conversation)
- Container scope: Ephemeral, lost on container lifecycle events
- For persistence: Route paths to `StoreBackend` via `CompositeBackend`
In summary:

| Storage Type | Duration |
|---|---|
| Agent filesystem (default) | Until thread ends |
| Container local FS | Until container restart |
| StoreBackend items | Until TTL expires or manual deletion |
| PostgreSQL checkpoints | Configurable TTL |
- Deep Agents Long-term Memory - File transience, CompositeBackend pattern
- Deep Agents Backends - Backend types, routing configuration
- LangGraph Data Plane - PostgreSQL/Redis architecture
- TTL Configuration - Cleanup policies
- LangSmith Deployment Docs
- Scalability & Resilience - Container lifecycle, sweeper task
- Data Purging for Compliance
- deepagents GitHub
```python
# 1. Use PostgreSQL
from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.store.postgres import PostgresStore

# 2. Use CompositeBackend (not FilesystemBackend alone)
from deepagents.backends import CompositeBackend, StateBackend, StoreBackend

# 3. Enable SSL
DB_URI = "postgresql://...?sslmode=require"

# 4. Configure hybrid persistence
backend = CompositeBackend(
    routes={
        "/memories/": StoreBackend(store=store),
        "/": StateBackend(),
    }
)

# 5. Deploy
# langgraph deploy --config langgraph.json
```

```bash
# .env.example (commit this, not .env!)
DATABASE_URL=postgresql://user:pass@host:5432/db?sslmode=require
ANTHROPIC_API_KEY=your-key-here
OPENAI_API_KEY=your-key-here
LANGCHAIN_API_KEY=your-key-here
LANGCHAIN_TRACING_V2=true
LANGCHAIN_PROJECT=your-project-name
```

- Check the official docs: https://docs.langchain.com/oss/python/deepagents/long-term-memory
- Test locally with `langgraph dev` - observe files lost between threads
- Deploy a test agent - verify files don't persist across container restarts
- Configure TTL in `langgraph.json` - verify cleanup via the LangSmith UI