OpenAI's Agents SDK is the production-ready framework for building multi-agent systems. It replaced the experimental Swarm framework and now includes sandbox execution, model-native harness, persistent sessions, and integrations with Cloudflare, Modal, E2B, Vercel, and Temporal.
The April 2026 update added the biggest feature: native sandbox execution. Agents can now interact with files and run commands in isolated environments, with credentials kept outside the sandbox. Here's how to set it up from scratch.
Install
```shell
mkdir my-agent-project && cd my-agent-project
python -m venv .venv
source .venv/bin/activate
pip install openai-agents
# or: uv add openai-agents
export OPENAI_API_KEY=sk-...
```
The SDK requires Python 3.10+. It works with OpenAI models by default but supports 100+ LLMs through LiteLLM and AnyLLM adapters.
Your first agent
```python
import asyncio

from agents import Agent, Runner

agent = Agent(
    name="Code Reviewer",
    instructions="Review code for bugs, security issues, and style problems. Be specific and actionable.",
)

async def main():
    result = await Runner.run(
        agent,
        "Review this: def login(user, pwd): return db.query(f'SELECT * FROM users WHERE name={user} AND pass={pwd}')",
    )
    print(result.final_output)

asyncio.run(main())
```
The Runner handles execution, tool calls, and handoffs. It returns a RunResult with the final output and metadata about which agent answered.
Adding tools
Tools let agents interact with the outside world. Decorate any function with @function_tool:
```python
import subprocess

from agents import Agent, Runner, function_tool

@function_tool
def run_tests(directory: str) -> str:
    """Run pytest in the given directory and return results."""
    result = subprocess.run(
        ["pytest", directory, "--tb=short", "-q"],
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout + result.stderr

agent = Agent(
    name="Test Runner",
    instructions="Run tests and explain any failures clearly.",
    tools=[run_tests],
)
```
The SDK automatically generates the JSON schema from your function signature and docstring. No manual schema definition needed.
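To see how signature-to-schema generation works in principle, here is a stdlib-only sketch. This is not the SDK's implementation (which also parses docstring argument sections, defaults, and nested types); it just shows the mapping from a Python signature to a tool schema:

```python
# Illustrative sketch of schema generation from a function signature.
# Stdlib only; the SDK's own generator is more complete.
import inspect
from typing import get_type_hints

PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_schema(fn):
    """Build a minimal JSON-schema-style tool description from a function."""
    hints = get_type_hints(fn)
    params = inspect.signature(fn).parameters
    props = {
        name: {"type": PY_TO_JSON.get(hints.get(name, str), "string")}
        for name in params
    }
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {
            "type": "object",
            "properties": props,
            "required": list(params),
        },
    }

def run_tests(directory: str) -> str:
    """Run pytest in the given directory and return results."""
    ...

schema = tool_schema(run_tests)
# schema["parameters"]["properties"] is {"directory": {"type": "string"}}
```

The model sees only this schema, not your Python code, which is why clear parameter names and docstrings directly improve tool-calling accuracy.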
Multi-agent handoffs
The core pattern: a triage agent routes to specialists. Each specialist handles its domain and can hand back to the triage agent or to another specialist.
```python
from agents import Agent

backend_agent = Agent(
    name="Backend Specialist",
    handoff_description="Handles API design, database queries, and server-side logic",
    instructions="You are a backend expert. Focus on API design, database optimization, and server architecture.",
)

frontend_agent = Agent(
    name="Frontend Specialist",
    handoff_description="Handles UI components, CSS, and client-side JavaScript",
    instructions="You are a frontend expert. Focus on React, CSS, and browser APIs.",
)

security_agent = Agent(
    name="Security Reviewer",
    handoff_description="Reviews code for security vulnerabilities and suggests fixes",
    instructions="You are a security expert. Find SQL injection, XSS, CSRF, and auth issues.",
)

triage = Agent(
    name="Triage",
    instructions="Route each question to the right specialist. If unsure, ask the user to clarify.",
    handoffs=[backend_agent, frontend_agent, security_agent],
)
```
Run it the same way: `Runner.run(triage, "Is this login endpoint secure?")`. The triage agent decides which specialist handles it.
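In the SDK the routing decision is made by the model, which sees each specialist's `handoff_description`. For intuition only, here is a deterministic keyword-based sketch of the same decision shape; this is not how the SDK routes, and the keyword lists are illustrative:

```python
# Keyword routing sketch: the real triage agent lets the model decide.
SPECIALISTS = {
    "Backend Specialist": ["api", "database", "query", "server"],
    "Frontend Specialist": ["react", "css", "component", "browser"],
    "Security Reviewer": ["secure", "injection", "xss", "auth"],
}

def route(question: str) -> str:
    """Pick the specialist whose keywords best match the question."""
    q = question.lower()
    scores = {name: sum(kw in q for kw in kws) for name, kws in SPECIALISTS.items()}
    best = max(scores, key=scores.get)
    # No keyword matched: stay with triage and ask the user to clarify
    return best if scores[best] > 0 else "Triage"

route("Is this login endpoint secure?")  # -> "Security Reviewer"
```

The LLM version handles paraphrases and ambiguity that keyword matching cannot, which is the point of making triage an agent rather than a rule table.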
Sandbox execution (new in April 2026)
This is the big update. Sandbox agents run in isolated environments where they can read/write files and execute commands, without access to your host system.
The architecture separates two layers:
- Harness: orchestration, credentials, and agent logic (runs on your machine)
- Sandbox: compute, files, commands, and packages (runs in isolation)
```python
from agents.sandbox import SandboxAgent, Manifest

manifest = Manifest(
    instructions="Build a REST API with FastAPI. Write tests. Deploy when tests pass.",
    workspace={
        "requirements.txt": "fastapi\nuvicorn\npytest\nhttpx",
    },
    permissions={"network": True, "filesystem": "workspace"},
)

agent = SandboxAgent(
    name="API Builder",
    manifest=manifest,
    sandbox="docker",  # or "unix_local" for development
)
```
Sandbox options:
- Docker: full isolation, production-ready
- Unix local: faster for development, less isolation
- Cloudflare: edge deployment with global distribution
- Modal: serverless GPU compute for ML workloads
- E2B: cloud sandboxes with snapshot/restore
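The property all of these backends share is that credentials stay in the harness process. A minimal local sketch of that idea, using only the stdlib (a subprocess in a throwaway directory with a scrubbed environment; real sandboxes add filesystem and network isolation on top):

```python
import os
import subprocess
import tempfile

def run_in_scratch_workspace(cmd: list[str]) -> str:
    """Run a command in a throwaway directory with a minimal environment.

    Secrets such as OPENAI_API_KEY are deliberately NOT passed through;
    they stay in the harness process, mirroring the harness/sandbox split.
    """
    clean_env = {"PATH": os.environ.get("PATH", "/usr/bin:/bin")}
    with tempfile.TemporaryDirectory() as workspace:
        result = subprocess.run(
            cmd, cwd=workspace, env=clean_env,
            capture_output=True, text=True, timeout=30,
        )
        return result.stdout

# The child process cannot see harness secrets:
os.environ["OPENAI_API_KEY"] = "sk-demo"
out = run_in_scratch_workspace(["env"])
```

An environment allowlist like this is the simplest version of the pattern; Docker and the cloud backends enforce the same boundary at the kernel or VM level.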
Sessions and memory
For multi-turn conversations, use sessions to persist state:
```python
import asyncio

from agents import Agent, Runner
from agents.extensions.memory import SQLAlchemySession

session = SQLAlchemySession.from_url(
    "user-123",  # session ID: runs with the same ID share history
    url="sqlite+aiosqlite:///agent_memory.db",
    create_tables=True,
)

agent = Agent(
    name="Project Assistant",
    instructions="Help with the user's project. Remember context from previous conversations.",
)

async def main():
    # First conversation
    result = await Runner.run(
        agent, "I'm building a SaaS app with Stripe billing", session=session
    )
    # Later conversation: the agent remembers the context
    result = await Runner.run(
        agent, "How should I handle failed payments?", session=session
    )
    print(result.final_output)

asyncio.run(main())
```
Session backends: SQLAlchemy (Postgres, MySQL, SQLite), Redis, Dapr, or encrypted sessions for sensitive data.
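To demystify what a session backend does, here is a bare-bones stdlib equivalent: a sqlite3 table of conversation turns keyed by session ID. This is an illustration of the concept, not the SDK's API:

```python
import sqlite3

class TinySession:
    """Minimal illustration of session persistence (not the SDK's API)."""

    def __init__(self, path: str, session_id: str):
        self.conn = sqlite3.connect(path)
        self.session_id = session_id
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS turns"
            " (session_id TEXT, role TEXT, content TEXT)"
        )

    def add(self, role: str, content: str) -> None:
        self.conn.execute(
            "INSERT INTO turns VALUES (?, ?, ?)",
            (self.session_id, role, content),
        )
        self.conn.commit()

    def history(self) -> list[tuple[str, str]]:
        rows = self.conn.execute(
            "SELECT role, content FROM turns WHERE session_id = ?",
            (self.session_id,),
        )
        return rows.fetchall()

session = TinySession(":memory:", "user-123")
session.add("user", "I'm building a SaaS app with Stripe billing")
session.add("assistant", "Noted: Stripe billing for a SaaS app.")
# On the next turn, the prior context is loaded and prepended to the prompt:
context = session.history()
```

The real backends add the pieces this sketch skips: async drivers, schema migrations, trimming long histories, and encryption for sensitive data.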
Guardrails
Guardrails validate inputs and outputs before they reach the user:
```python
from agents import Agent, GuardrailFunctionOutput, InputGuardrail

async def no_pii_guardrail(ctx, agent, input) -> GuardrailFunctionOutput:
    """Block requests containing personal information."""
    pii_patterns = ["social security", "credit card", "ssn", "passport number"]
    input_lower = str(input).lower()
    if any(p in input_lower for p in pii_patterns):
        return GuardrailFunctionOutput(
            output_info={"reason": "PII detected"},
            tripwire_triggered=True,
        )
    return GuardrailFunctionOutput(output_info={"reason": "clean"}, tripwire_triggered=False)

agent = Agent(
    name="Support Agent",
    instructions="Help customers with their accounts.",
    input_guardrails=[InputGuardrail(guardrail_function=no_pii_guardrail)],
)
```
Tracing
Every agent run is automatically traced. View traces in the OpenAI Dashboard at platform.openai.com/traces to debug handoff decisions, tool calls, and token usage.
For custom tracing, the SDK integrates with OpenTelemetry:
```python
from agents import Agent, Runner, trace

async def my_workflow():
    # trace() is a context manager: runs inside it are grouped
    # under one named trace in the dashboard
    with trace("Codebase analysis"):
        result = await Runner.run(agent, "Analyze this codebase")
        return result.final_output
```
This connects directly to your existing observability stack: Helicone, Langfuse, or any OpenTelemetry-compatible backend.
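The mechanics behind a span exporter are simple to picture. A stdlib sketch (not the SDK's or OpenTelemetry's implementation): record a name and duration around a block of work and hand it to a collector:

```python
import contextlib
import time

SPANS: list[dict] = []  # stand-in for an OpenTelemetry span exporter

@contextlib.contextmanager
def span(name: str):
    """Record a named span with wall-clock duration around a block of work."""
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append({"name": name, "duration_s": time.perf_counter() - start})

with span("analyze-codebase"):
    time.sleep(0.01)  # stand-in for an agent run
```

Real exporters also attach trace IDs, parent-child relationships between spans, and attributes such as model name and token counts, which is what makes handoff chains debuggable.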
When to use the Agents SDK vs alternatives
| Need | Use |
|---|---|
| Multi-agent orchestration with OpenAI | Agents SDK |
| Model-agnostic agents | LangChain or CrewAI |
| No-code agent automation | Zapier Agent SDK |
| Terminal-based coding agent | Claude Code or Codex CLI |
| Local/private agents | Ollama + custom harness |
| Agent observability | Helicone or Langfuse |
The Agents SDK is the right choice when you're building on OpenAI models and need production features: sandboxing, sessions, guardrails, and tracing. If you need model flexibility, LangChain or CrewAI give you more options at the cost of more complexity.
Production checklist
Before deploying:
- Use Docker sandbox (not unix_local) for isolation
- Set up guardrails for PII, prompt injection, and output validation
- Configure session persistence (not in-memory)
- Enable tracing and connect to your observability platform
- Set token budgets per agent to control costs
- Test handoff logic with edge cases
- Add human-in-the-loop approval for high-stakes actions
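For the token-budget item, the idea is to accumulate usage per agent across turns and refuse further runs past a ceiling. A stdlib sketch (the class and its fields are illustrative, not part of the SDK; wire it to whatever usage numbers your run results report):

```python
class TokenBudget:
    """Track cumulative token usage per agent against a hard ceiling."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used: dict[str, int] = {}

    def record(self, agent: str, tokens: int) -> None:
        self.used[agent] = self.used.get(agent, 0) + tokens

    def allow(self, agent: str) -> bool:
        """Check before each run whether the agent may spend more tokens."""
        return self.used.get(agent, 0) < self.limit

budget = TokenBudget(limit=10_000)
budget.record("Triage", 1_200)
budget.record("Security Reviewer", 10_500)
budget.allow("Triage")             # True: well under the ceiling
budget.allow("Security Reviewer")  # False: budget exhausted
```

Checking the budget before each run (rather than after) is the difference between capping cost and merely reporting it.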
Related: How to Build an AI Agent · Best AI Agent Frameworks · Agent Orchestration Patterns · How to Debug AI Agents · AI Agent Security · LLM Observability · Claude Code vs Codex CLI vs Gemini CLI