🤖 AI Tools
· 5 min read

OpenAI Agents SDK: Complete Setup Guide (2026)


OpenAI’s Agents SDK is the production-ready framework for building multi-agent systems. It replaced the experimental Swarm framework and now includes sandbox execution, model-native harness, persistent sessions, and integrations with Cloudflare, Modal, E2B, Vercel, and Temporal.

The April 2026 update added its biggest feature yet: native sandbox execution. Agents can now work with files and run commands in isolated environments, while credentials stay outside the sandbox. Here’s how to set it up from scratch.

Install

mkdir my-agent-project && cd my-agent-project
python -m venv .venv
source .venv/bin/activate

pip install openai-agents
# or: uv add openai-agents

export OPENAI_API_KEY=sk-...

The SDK requires Python 3.10+. It works with OpenAI models by default but supports 100+ LLMs through LiteLLM and AnyLLM adapters.
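Before running anything, a quick preflight check can save you from a confusing first error. The script below is a minimal sketch that uses only the standard library: it checks the interpreter against the 3.10 floor stated above and confirms that OPENAI_API_KEY is set. The `preflight` function is our own illustration, not part of the SDK.

```python
import os
import sys


def preflight(min_version=(3, 10), env=None):
    """Return a list of problems; an empty list means the setup looks ready."""
    env = os.environ if env is None else env
    problems = []
    # The guide states the SDK requires Python 3.10+
    if sys.version_info[:2] < tuple(min_version):
        problems.append(
            f"Python {min_version[0]}.{min_version[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    # The OpenAI client reads the key from this environment variable by default
    if not env.get("OPENAI_API_KEY", ""):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("Environment looks ready")
```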

Your first agent

import asyncio
from agents import Agent, Runner

agent = Agent(
    name="Code Reviewer",
    instructions="Review code for bugs, security issues, and style problems. Be specific and actionable.",
)

async def main():
    result = await Runner.run(agent, "Review this: def login(user, pwd): return db.query(f'SELECT * FROM users WHERE name={user} AND pass={pwd}')")
    print(result.final_output)

asyncio.run(main())

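The Runner handles execution, tool calls, and handoffs. Runner.run returns a RunResult: result.final_output holds the answer, and result.last_agent tells you which agent produced it. For quick scripts without asyncio, Runner.run_sync(agent, ...) does the same thing synchronously.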
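To make "handles execution, tool calls, and handoffs" concrete, here is a toy version of the loop a runner performs: call the model, run any tool it requests, feed the result back, and stop once the model returns a final answer. This is a simplified illustration with a scripted fake model standing in for the LLM. The names `FakeModel`, `ModelReply`, and `run_loop` are hypothetical, and this is not the SDK's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ModelReply:
    """One model turn: either a tool request or a final answer."""
    tool_name: Optional[str] = None
    tool_args: dict = field(default_factory=dict)
    final_output: Optional[str] = None


class FakeModel:
    """Scripted stand-in for an LLM so the loop runs offline."""

    def __init__(self, replies: List[ModelReply]):
        self.replies = list(replies)
        self.calls = 0

    def respond(self, history: List[str]) -> ModelReply:
        # A real model would read `history`; this fake just replays its script
        self.calls += 1
        return self.replies.pop(0)


def run_loop(model: FakeModel, tools: Dict[str, Callable[..., str]],
             user_input: str, max_turns: int = 10) -> str:
    """Call the model until it produces a final answer or the turn limit is hit."""
    history = [f"user: {user_input}"]
    for _ in range(max_turns):
        reply = model.respond(history)
        if reply.final_output is not None:
            return reply.final_output  # the model is done, so the loop ends
        # The model requested a tool: run it and add the result to the history
        result = tools[reply.tool_name](**reply.tool_args)
        history.append(f"tool {reply.tool_name}: {result}")
    raise RuntimeError(f"No final output after {max_turns} turns")
```

The turn cap matters: without it, a model that keeps requesting tools would loop forever. The real Runner.run also accepts a max_turns argument for the same reason.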

Adding tools

Tools let agents interact with the outside world. Decorate any function with @function_tool:

from agents import Agent, Runner, function_tool
import subprocess

@function_tool
def run_tests(directory: str) -> str:
    """Run pytest in the given directory and return results."""
    result = subprocess.run(
        ["pytest", directory, "--tb=short", "-q"],
        capture_output=True, text=True, timeout=30
    )
    return result.stdout + result.stderr

agent = Agent(
    name="Test Runner",
    instructions="Run tests and explain any failures clearly.",
    tools=[run_tests],
)

The SDK automatically generates the JSON schema from your function signature and docstring. No manual schema definition needed.
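The idea is to read parameter names and type hints through introspection and use the docstring as the tool description. The sketch below shows that idea with the standard library's inspect module. It handles only a few basic types and is an illustration, not the SDK's own schema generator, which is more thorough; `build_tool_schema` is a hypothetical name.

```python
import inspect
import typing

# Map a few Python types to JSON Schema type names (illustrative subset)
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def build_tool_schema(func) -> dict:
    """Derive a function-calling style schema from a signature and docstring."""
    hints = typing.get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string"
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        # Parameters without defaults must be supplied by the model
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def run_tests(directory: str, verbose: bool = False) -> str:
    """Run pytest in the given directory and return results."""
    return ""  # body omitted; only the signature and docstring matter here
```

Because the docstring becomes the description the model reads, a precise docstring directly improves how reliably the agent picks and calls the tool.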

Multi-agent handoffs

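The core pattern: a triage agent routes to specialists. Each specialist handles its own domain. To let a specialist hand back to the triage agent or pass work to another specialist, give it its own handoffs list; the specialists below have none, so routing only flows outward from triage.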

from agents import Agent

backend_agent = Agent(
    name="Backend Specialist",
    handoff_description="Handles API design, database queries, and server-side logic",
    instructions="You are a backend expert. Focus on API design, database optimization, and server architecture.",
)

frontend_agent = Agent(
    name="Frontend Specialist",
    handoff_description="Handles UI components, CSS, and client-side JavaScript",
    instructions="You are a frontend expert. Focus on React, CSS, and browser APIs.",
)

security_agent = Agent(
    name="Security Reviewer",
    handoff_description="Reviews code for security vulnerabilities and suggests fixes",
    instructions="You are a security expert. Find SQL injection, XSS, CSRF, and auth issues.",
)

triage = Agent(
    name="Triage",
    instructions="Route each question to the right specialist. If unsure, ask the user to clarify.",
    handoffs=[backend_agent, frontend_agent, security_agent],
)

Run it the same way, with await Runner.run(triage, "Is this login endpoint secure?"). The triage agent reads each specialist's handoff_description and decides which one takes the question.
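As we understand the SDK, each handoff is presented to the model as a tool (named along the lines of transfer_to_<agent name>), and the triage agent "decides" by calling one of them. The sketch below models that idea offline in plain Python. The tool-name normalization is our assumption about the format, and `pick_specialist` is a hypothetical keyword-based stand-in for the model's routing choice, not SDK code.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SimpleAgent:
    name: str
    handoff_description: str = ""
    handoffs: list = field(default_factory=list)


def handoff_tool_name(agent: SimpleAgent) -> str:
    """Turn 'Backend Specialist' into 'transfer_to_backend_specialist' (assumed format)."""
    slug = re.sub(r"[^a-z0-9]+", "_", agent.name.lower()).strip("_")
    return f"transfer_to_{slug}"


def handoff_tools(agent: SimpleAgent) -> dict:
    """What the model would see: one tool per handoff target, with its description."""
    return {handoff_tool_name(a): a.handoff_description for a in agent.handoffs}


def pick_specialist(agent: SimpleAgent, question: str, keywords: dict) -> SimpleAgent:
    """Hypothetical stand-in for the LLM's routing decision: first keyword match wins."""
    q = question.lower()
    for target in agent.handoffs:
        if any(k in q for k in keywords.get(target.name, [])):
            return target
    return agent  # no match: triage keeps the conversation, e.g. to ask for clarification
```

In the real SDK the model chooses among these tools by reading their descriptions, which is why a clear, non-overlapping handoff_description for each specialist matters.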

Sandbox execution (new in April 2026)

This is the big update. Sandbox agents run in isolated environments where they can read and write files and execute commands, without access to your host system.

The architecture separates two layers:

  • Harness: orchestration, credentials, and agent logic (runs on your machine)
  • Sandbox: compute, files, commands, and packages (runs in isolation)

from agents.sandbox import SandboxAgent, Manifest

manifest = Manifest(
    instructions="Build a REST API with FastAPI. Write tests. Deploy when tests pass.",
    workspace={
        "requirements.txt": "fastapi\nuvicorn\npytest\nhttpx",
    },
    permissions={"network": True, "filesystem": "workspace"},
)

agent = SandboxAgent(
    name="API Builder",
    manifest=manifest,
    sandbox="docker",  # or "unix_local" for development
)

Sandbox options:

  • Docker: full isolation, production-ready
  • Unix local: faster for development, less isolation
  • Cloudflare: edge deployment with global distribution
  • Modal: serverless GPU compute for ML workloads
  • E2B: cloud sandboxes with snapshot/restore
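The sandbox's value comes from its boundary: the agent can touch files inside its workspace and nothing else. The sketch below illustrates one way to enforce that boundary for file paths, by resolving each requested path and rejecting anything that escapes the workspace root. It is a hypothetical illustration of the concept (`resolve_in_workspace` and `WorkspaceEscapeError` are our own names), not how any of the providers above implement isolation; real sandboxes such as Docker also rely on OS-level mechanisms.

```python
from pathlib import Path


class WorkspaceEscapeError(PermissionError):
    """Raised when a requested path points outside the workspace."""


def resolve_in_workspace(workspace: Path, requested: str) -> Path:
    """Resolve `requested` relative to `workspace`, refusing anything outside it."""
    root = workspace.resolve()
    # resolve() collapses '..' segments and follows symlinks before the check;
    # an absolute `requested` path replaces `root` entirely and is then rejected
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise WorkspaceEscapeError(f"{requested!r} is outside the workspace")
    return candidate
```

Checking the resolved path, rather than the raw string, is the key design choice: a naive prefix check on "src/../../x" would pass even though the path leaves the workspace.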

Sessions and memory

For multi-turn conversations, use sessions to persist state:

import asyncio
from agents import Agent, Runner
from agents.extensions.memory import SQLAlchemySession

# Needs SQLAlchemy plus an async driver (aiosqlite here) installed
session = SQLAlchemySession.from_url(
    "user-123",  # session ID: each ID keeps its own conversation history
    url="sqlite+aiosqlite:///agent_memory.db",
    create_tables=True,
)

agent = Agent(
    name="Project Assistant",
    instructions="Help with the user's project. Remember context from previous conversations.",
)

async def main():
    # First conversation
    await Runner.run(agent, "I'm building a SaaS app with Stripe billing", session=session)

    # Later conversation: the session supplies the earlier turns as context
    result = await Runner.run(agent, "How should I handle failed payments?", session=session)
    print(result.final_output)

asyncio.run(main())

Session backends include the built-in SQLiteSession, SQLAlchemy (Postgres, MySQL, SQLite), Redis, Dapr, and encrypted sessions for sensitive data.
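Conceptually, a session backend is a per-conversation list of items that the runner reads before each turn and appends to afterwards. The sketch below shows that idea with the standard library's sqlite3. The class and method names (`SimpleSessionStore`, `get_items`, `add_items`, `clear`) are our own for illustration; the SDK's session classes expose async methods and store richer item structures.

```python
import json
import sqlite3


class SimpleSessionStore:
    """Toy session backend: one ordered list of message dicts per session ID."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS items ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " session_id TEXT NOT NULL,"
            " item TEXT NOT NULL)"
        )

    def get_items(self, session_id: str) -> list:
        # Ordering by the autoincrement id preserves conversation order
        rows = self.conn.execute(
            "SELECT item FROM items WHERE session_id = ? ORDER BY id",
            (session_id,),
        ).fetchall()
        return [json.loads(row[0]) for row in rows]

    def add_items(self, session_id: str, items: list) -> None:
        with self.conn:  # commits the transaction on success
            self.conn.executemany(
                "INSERT INTO items (session_id, item) VALUES (?, ?)",
                [(session_id, json.dumps(item)) for item in items],
            )

    def clear(self, session_id: str) -> None:
        with self.conn:
            self.conn.execute("DELETE FROM items WHERE session_id = ?", (session_id,))
```

Before the second question about failed payments, a backend like this would hand back the earlier Stripe turn, which is how the agent "remembers" context across runs.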

Guardrails

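Guardrails run checks alongside an agent: input guardrails screen the user's request, and output guardrails check the agent's final answer before it reaches the user. When a guardrail returns tripwire_triggered=True, the run stops and raises an exception (InputGuardrailTripwireTriggered for input guardrails), which your code should catch and handle. Here is an input guardrail that blocks requests mentioning personal information: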

from agents import Agent, InputGuardrail, GuardrailFunctionOutput

async def no_pii_guardrail(ctx, agent, input) -> GuardrailFunctionOutput:
    """Block requests containing personal information."""
    pii_patterns = ["social security", "credit card", "ssn", "passport number"]
    input_lower = str(input).lower()
    if any(p in input_lower for p in pii_patterns):
        return GuardrailFunctionOutput(
            output_info={"reason": "PII detected"},
            tripwire_triggered=True,
        )
    return GuardrailFunctionOutput(output_info={"reason": "clean"}, tripwire_triggered=False)

agent = Agent(
    name="Support Agent",
    instructions="Help customers with their accounts.",
    input_guardrails=[InputGuardrail(guardrail_function=no_pii_guardrail)],
)
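The keyword list above only fires when someone types phrases like "credit card" or "SSN"; it will not catch an actual number pasted on its own. Below is a sketch of a stricter check, assuming US-style SSNs (123-45-6789) and 13 to 19 digit card numbers. It pairs regular expressions with the Luhn checksum to cut false positives on random digit runs. `contains_pii` is a hypothetical helper you could call from inside no_pii_guardrail; the patterns are illustrative, not a complete PII detector.

```python
import re

# US SSN written as 123-45-6789 (illustrative pattern only)
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def contains_pii(text: str) -> bool:
    """True if the text contains an SSN-shaped string or a Luhn-valid card number."""
    if SSN_RE.search(text):
        return True
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            return True
    return False
```

Inside the guardrail, you could combine both checks, for example triggering the tripwire when either the keyword test or contains_pii(str(input)) is true.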

Tracing

Every agent run is automatically traced. View traces in the OpenAI Dashboard at platform.openai.com/traces to debug handoff decisions, tool calls, and token usage.

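To group several runs under one named trace, wrap them in the trace() context manager:

from agents import Agent, Runner, trace

agent = Agent(name="Analyst", instructions="Analyze code and summarize the main risks.")

async def my_workflow():
    # Both runs are recorded under a single "Codebase analysis" trace
    with trace("Codebase analysis"):
        first = await Runner.run(agent, "Analyze this codebase")
        second = await Runner.run(agent, f"List the top three risks in: {first.final_output}")
    return second.final_output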

Traces can also be exported to your existing observability stack, such as Helicone, Langfuse, or another OpenTelemetry-compatible backend, by registering additional trace processors with the SDK.
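A trace is essentially a tree of timed spans: one for the whole workflow, with child spans for each agent run, tool call, and handoff. The sketch below records such spans with a standard-library context manager to show the kind of data a tracing backend receives. `span` and `SPANS` are hypothetical names, not the SDK's tracing API.

```python
import time
from contextlib import contextmanager

SPANS: list = []   # finished spans, in completion order
_stack: list = []  # names of the currently open spans


@contextmanager
def span(name: str):
    """Record a span's name, its parent span, and its duration in seconds."""
    parent = _stack[-1] if _stack else None
    _stack.append(name)
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the body raises, so failed steps are still recorded
        _stack.pop()
        SPANS.append({
            "name": name,
            "parent": parent,
            "duration_s": time.perf_counter() - start,
        })
```

Recording the parent of each span is what lets a dashboard show, for example, that a tool call happened inside a particular agent's turn.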

When to use the Agents SDK vs alternatives

| Need | Use |
| --- | --- |
| Multi-agent orchestration with OpenAI | Agents SDK |
| Model-agnostic agents | LangChain or CrewAI |
| No-code agent automation | Zapier Agent SDK |
| Terminal-based coding agent | Claude Code or Codex CLI |
| Local/private agents | Ollama + custom harness |
| Agent observability | Helicone or Langfuse |

The Agents SDK is the right choice when you’re building on OpenAI models and need production features: sandboxing, sessions, guardrails, and tracing. If you need model flexibility, LangChain or CrewAI give you more options at the cost of more complexity.

Production checklist

Before deploying:

  • Use Docker sandbox (not unix_local) for isolation
  • Set up guardrails for PII, prompt injection, and output validation
  • Configure session persistence (not in-memory)
  • Enable tracing and connect to your observability platform
  • Set token budgets per agent to control costs
  • Test handoff logic with edge cases
  • Add human-in-the-loop approval for high-stakes actions
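For the token-budget item, one simple approach is to keep a running total per agent and refuse further work once it crosses a limit. The sketch below is a hypothetical helper (`TokenBudget` and `BudgetExceeded` are our own names). In a real app you would feed it the token counts reported for each run rather than hard-coded values.

```python
from collections import defaultdict


class BudgetExceeded(RuntimeError):
    """Raised when an agent's cumulative token usage passes its limit."""


class TokenBudget:
    """Per-agent token ceilings with a running total."""

    def __init__(self, limits: dict):
        self.limits = dict(limits)
        self.used = defaultdict(int)

    def charge(self, agent_name: str, input_tokens: int, output_tokens: int) -> int:
        """Add one run's usage; raise once the agent's limit is exceeded."""
        self.used[agent_name] += input_tokens + output_tokens
        limit = self.limits.get(agent_name)
        if limit is not None and self.used[agent_name] > limit:
            raise BudgetExceeded(
                f"{agent_name} used {self.used[agent_name]} tokens (limit {limit})"
            )
        return self.used[agent_name]

    def remaining(self, agent_name: str):
        """Tokens left for this agent, or None if it has no limit."""
        limit = self.limits.get(agent_name)
        return None if limit is None else max(limit - self.used[agent_name], 0)
```

Charging after a run means a single expensive call can overshoot the limit; checking remaining() before starting each run keeps the overshoot small.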

FAQ

Is the OpenAI Agents SDK free?

The SDK itself is free and open-source. However, you pay for the underlying OpenAI API calls your agents make: token usage is billed at standard OpenAI API rates for the model you choose.

How does it compare to LangChain?

The Agents SDK is more opinionated and tightly integrated with OpenAI models, offering built-in sandboxing, tracing, and guardrails with less configuration. LangChain is model-agnostic and more flexible but requires more setup and has a steeper learning curve for production deployments.

Can I use it with non-OpenAI models?

Yes. Through LiteLLM and AnyLLM adapters, the SDK supports 100+ LLMs, including Claude, Gemini, and local models via Ollama. However, some features remain tied to OpenAI: for example, uploading traces to the OpenAI Dashboard still requires an OpenAI API key.

Is it production-ready?

Yes. The Agents SDK replaced the experimental Swarm framework and is designed for production use with features like Docker sandboxing, persistent sessions, guardrails, and OpenTelemetry tracing. Many companies are running it in production today.
