
OpenAI Agents SDK: Complete Setup Guide (2026)


OpenAI’s Agents SDK is a production-ready framework for building multi-agent systems. It replaced the experimental Swarm framework and now includes sandbox execution, a model-native harness, persistent sessions, and integrations with Cloudflare, Modal, E2B, Vercel, and Temporal.

The April 2026 update added the biggest feature: native sandbox execution. Agents can now interact with files and run commands in isolated environments, with credentials kept outside the sandbox. Here’s how to set it up from scratch.

Install

mkdir my-agent-project && cd my-agent-project
python -m venv .venv
source .venv/bin/activate

pip install openai-agents
# or: uv add openai-agents

export OPENAI_API_KEY=sk-...

The SDK requires Python 3.10+. It works with OpenAI models by default but supports 100+ LLMs through LiteLLM and AnyLLM adapters.

Your first agent

import asyncio
from agents import Agent, Runner

agent = Agent(
    name="Code Reviewer",
    instructions="Review code for bugs, security issues, and style problems. Be specific and actionable.",
)

async def main():
    result = await Runner.run(
        agent,
        "Review this: def login(user, pwd): return db.query("
        "f'SELECT * FROM users WHERE name={user} AND pass={pwd}')",
    )
    print(result.final_output)

asyncio.run(main())

The Runner handles execution, tool calls, and handoffs. It returns a RunResult with the final output and metadata about which agent answered.

Adding tools

Tools let agents interact with the outside world. Decorate any function with @function_tool:

from agents import Agent, Runner, function_tool
import subprocess

@function_tool
def run_tests(directory: str) -> str:
    """Run pytest in the given directory and return results."""
    try:
        result = subprocess.run(
            ["pytest", directory, "--tb=short", "-q"],
            capture_output=True, text=True, timeout=30,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run raises on timeout; return a message the agent can act on
        return "Test run timed out after 30 seconds."
    return result.stdout + result.stderr

agent = Agent(
    name="Test Runner",
    instructions="Run tests and explain any failures clearly.",
    tools=[run_tests],
)

The SDK automatically generates the JSON schema from your function signature and docstring. No manual schema definition needed.
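The idea is straightforward to see with the standard library. This is a rough sketch of mapping a signature and docstring to a JSON-schema-style tool definition, not the SDK's actual implementation:

```python
import inspect
from typing import get_type_hints

# Map a few common Python types to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def schema_for(func):
    """Build a minimal JSON-schema-style dict from a function's
    signature and docstring, similar in spirit to @function_tool."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    params = {
        name: {"type": _JSON_TYPES.get(tp, "string")}
        for name, tp in hints.items()
    }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": params,
            "required": list(params),
        },
    }

def run_tests(directory: str) -> str:
    """Run pytest in the given directory and return results."""
    ...

print(schema_for(run_tests)["parameters"]["properties"])
# β†’ {'directory': {'type': 'string'}}
```

The real generator handles defaults, optional parameters, and Pydantic models, but the shape the model sees is the same.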

Multi-agent handoffs

The core pattern: a triage agent routes to specialists. Each specialist handles its domain and can hand back to the triage agent or to another specialist.

from agents import Agent

backend_agent = Agent(
    name="Backend Specialist",
    handoff_description="Handles API design, database queries, and server-side logic",
    instructions="You are a backend expert. Focus on API design, database optimization, and server architecture.",
)

frontend_agent = Agent(
    name="Frontend Specialist",
    handoff_description="Handles UI components, CSS, and client-side JavaScript",
    instructions="You are a frontend expert. Focus on React, CSS, and browser APIs.",
)

security_agent = Agent(
    name="Security Reviewer",
    handoff_description="Reviews code for security vulnerabilities and suggests fixes",
    instructions="You are a security expert. Find SQL injection, XSS, CSRF, and auth issues.",
)

triage = Agent(
    name="Triage",
    instructions="Route each question to the right specialist. If unsure, ask the user to clarify.",
    handoffs=[backend_agent, frontend_agent, security_agent],
)

Run it the same way: await Runner.run(triage, "Is this login endpoint secure?"). The triage agent decides which specialist handles it.
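Each handoff is surfaced to the triage model as a transfer tool described by its handoff_description. A stdlib sketch of what that tool surface looks like (illustrative; the SDK builds these for you, and the exact tool names are an assumption):

```python
# (agent name, handoff_description) pairs from the example above.
specialists = [
    ("Backend Specialist", "Handles API design, database queries, and server-side logic"),
    ("Frontend Specialist", "Handles UI components, CSS, and client-side JavaScript"),
    ("Security Reviewer", "Reviews code for security vulnerabilities and suggests fixes"),
]

def handoff_tools(agents):
    """Render (name, handoff_description) pairs as the tool list
    the triage model picks from when routing a request."""
    return [
        {"name": "transfer_to_" + name.lower().replace(" ", "_"),
         "description": desc}
        for name, desc in agents
    ]

tools = handoff_tools(specialists)
print(tools[0]["name"])  # β†’ transfer_to_backend_specialist
```

This is why a precise handoff_description matters more than the specialist's instructions for routing quality: it is the only thing the triage model sees before choosing.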

Sandbox execution (new in April 2026)

This is the big update. Sandbox agents run in isolated environments where they can read/write files and execute commands β€” without access to your host system.

The architecture separates two layers:

  • Harness: orchestration, credentials, and agent logic (runs on your machine)
  • Sandbox: compute, files, commands, and packages (runs in isolation)

from agents.sandbox import SandboxAgent, Manifest

manifest = Manifest(
    instructions="Build a REST API with FastAPI. Write tests. Deploy when tests pass.",
    workspace={
        "requirements.txt": "fastapi\nuvicorn\npytest\nhttpx",
    },
    permissions={"network": True, "filesystem": "workspace"},
)

agent = SandboxAgent(
    name="API Builder",
    manifest=manifest,
    sandbox="docker",  # or "unix_local" for development
)

Sandbox options:

  • Docker: full isolation, production-ready
  • Unix local: faster for development, less isolation
  • Cloudflare: edge deployment with global distribution
  • Modal: serverless GPU compute for ML workloads
  • E2B: cloud sandboxes with snapshot/restore

Sessions and memory

For multi-turn conversations, use sessions to persist state:

from agents import Agent, Runner
from agents.extensions.memory import SQLAlchemySession

session = SQLAlchemySession.from_url(
    "user-123",  # session ID; conversation history is keyed by it
    url="sqlite+aiosqlite:///agent_memory.db",
    create_tables=True,
)

agent = Agent(
    name="Project Assistant",
    instructions="Help with the user's project. Remember context from previous conversations.",
)

# First conversation (inside an async function)
result = await Runner.run(agent, "I'm building a SaaS app with Stripe billing", session=session)

# Later conversation β€” agent remembers the context
result = await Runner.run(agent, "How should I handle failed payments?", session=session)

Session backends: SQLAlchemy (Postgres, MySQL, SQLite), Redis, Dapr, or encrypted sessions for sensitive data.
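Whatever the backend, a session is essentially a persisted message list keyed by a session ID. A minimal stdlib sketch of the storage pattern (illustrative only, not the SDK's actual schema):

```python
import json
import sqlite3

class TinySession:
    """Append-only conversation store, one row per message.
    Shows the pattern session backends implement, nothing more."""

    def __init__(self, db_path: str, session_id: str):
        self.session_id = session_id
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages "
            "(session_id TEXT, payload TEXT)"
        )

    def add(self, role: str, content: str):
        # Store each turn as JSON so any message shape fits
        self.conn.execute(
            "INSERT INTO messages VALUES (?, ?)",
            (self.session_id, json.dumps({"role": role, "content": content})),
        )
        self.conn.commit()

    def history(self):
        rows = self.conn.execute(
            "SELECT payload FROM messages WHERE session_id = ?",
            (self.session_id,),
        )
        return [json.loads(r[0]) for r in rows]

session = TinySession(":memory:", "user-42")
session.add("user", "I'm building a SaaS app with Stripe billing")
session.add("assistant", "Noted: SaaS app, Stripe billing.")
print(len(session.history()))  # β†’ 2
```

On each run, the Runner prepends the stored history to the new input, which is how the agent "remembers" earlier turns.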

Guardrails

Guardrails validate inputs and outputs before they reach the user:

from agents import Agent, InputGuardrail, GuardrailFunctionOutput

async def no_pii_guardrail(ctx, agent, input) -> GuardrailFunctionOutput:
    """Block requests containing personal information."""
    pii_patterns = ["social security", "credit card", "ssn", "passport number"]
    input_lower = str(input).lower()
    if any(p in input_lower for p in pii_patterns):
        return GuardrailFunctionOutput(
            output_info={"reason": "PII detected"},
            tripwire_triggered=True,
        )
    return GuardrailFunctionOutput(output_info={"reason": "clean"}, tripwire_triggered=False)

agent = Agent(
    name="Support Agent",
    instructions="Help customers with their accounts.",
    input_guardrails=[InputGuardrail(guardrail_function=no_pii_guardrail)],
)
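Because the tripwire logic is plain Python, you can unit-test it without invoking a model. A standalone version of the same check:

```python
# Same pattern list as the guardrail above.
PII_PATTERNS = ["social security", "credit card", "ssn", "passport number"]

def trips_pii(text: str) -> bool:
    """Return True when the input should be blocked,
    mirroring the guardrail's tripwire condition."""
    lowered = text.lower()
    return any(p in lowered for p in PII_PATTERNS)

assert trips_pii("My SSN is 123-45-6789")
assert not trips_pii("Why was my invoice declined?")
print("guardrail checks pass")
```

Substring matching like this is deliberately crude (it would also trip on "assn"); production guardrails usually combine patterns with an LLM-based classifier, which is why the guardrail function is async in the first place.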

Tracing

Every agent run is automatically traced. View traces in the OpenAI Dashboard at platform.openai.com/traces to debug handoff decisions, tool calls, and token usage.

For custom tracing, the SDK integrates with OpenTelemetry:

from agents.tracing import trace

async def my_workflow():
    # Group the whole run under one named trace in the dashboard
    with trace("Codebase analysis"):
        result = await Runner.run(agent, "Analyze this codebase")
    return result.final_output

This connects directly to your existing observability stack β€” Helicone, Langfuse, or any OpenTelemetry-compatible backend.
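Under the hood, a tracing span is just a named timer whose results get exported. A stdlib-only sketch of what any backend ultimately receives (illustrative, not the SDK's tracer):

```python
import contextlib
import time

SPANS = []  # collected (name, seconds) pairs; a real tracer exports these

@contextlib.contextmanager
def trace(name):
    """Record a named span's wall-clock duration, the core of
    what a tracing context manager provides."""
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append((name, time.perf_counter() - start))

with trace("Codebase analysis"):
    total = sum(range(1000))  # stand-in for the agent run

print(SPANS[0][0])  # β†’ Codebase analysis
```

Real spans also carry IDs, parent links, and attributes (model, tokens, tool names), which is what makes handoff chains reconstructable in the dashboard.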

When to use the Agents SDK vs alternatives

  • Multi-agent orchestration with OpenAI: Agents SDK
  • Model-agnostic agents: LangChain or CrewAI
  • No-code agent automation: Zapier Agent SDK
  • Terminal-based coding agent: Claude Code or Codex CLI
  • Local/private agents: Ollama + custom harness
  • Agent observability: Helicone or Langfuse

The Agents SDK is the right choice when you’re building on OpenAI models and need production features: sandboxing, sessions, guardrails, and tracing. If you need model flexibility, LangChain or CrewAI give you more options at the cost of more complexity.

Production checklist

Before deploying:

  • Use Docker sandbox (not unix_local) for isolation
  • Set up guardrails for PII, prompt injection, and output validation
  • Configure session persistence (not in-memory)
  • Enable tracing and connect to your observability platform
  • Set token budgets per agent to control costs
  • Test handoff logic with edge cases
  • Add human-in-the-loop approval for high-stakes actions
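The SDK reports usage but does not enforce spending limits for you, so the token-budget item typically means wrapping runs in a small guard. A hypothetical helper (not part of the SDK):

```python
class TokenBudget:
    """Track cumulative token usage for one agent and trip when a
    limit is exceeded. Hypothetical helper, not an SDK feature."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0

    def charge(self, tokens: int) -> None:
        # Call after each run with the tokens that run consumed
        self.used += tokens
        if self.used > self.limit:
            raise RuntimeError(
                f"budget exceeded: {self.used}/{self.limit} tokens"
            )

budget = TokenBudget(limit=10_000)
budget.charge(4_000)  # e.g. usage reported by the run result
budget.charge(5_000)
print(budget.limit - budget.used)  # β†’ 1000
```

Raising from the guard stops the loop before a runaway agent burns through your quota; a gentler variant could downgrade to a cheaper model instead.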

Related: How to Build an AI Agent Β· Best AI Agent Frameworks Β· Agent Orchestration Patterns Β· How to Debug AI Agents Β· AI Agent Security Β· LLM Observability Β· Claude Code vs Codex CLI vs Gemini CLI