When an AI agent gives a wrong answer, you need to trace back through its reasoning: what did it read, what tools did it call, what context did it have, and where did it go wrong? Without structured logging, debugging agents is guesswork.
Traditional application logging (logger.info("request processed")) isn't enough. Agent interactions are multi-step, non-deterministic, and involve external API calls that cost money. You need traces, not just logs.
This guide complements our LLM observability overview with agent-specific tracing patterns.
What to capture
Every agent interaction should log:
trace = {
    # Identity
    "trace_id": "abc-123",
    "session_id": "user_456_session_789",
    "user_id": "user_456",
    "agent_name": "Code Reviewer",

    # Input
    "user_message": "Review the auth middleware",
    "system_prompt_tokens": 450,

    # Reasoning steps
    "steps": [
        {
            "type": "tool_call",
            "tool": "read_file",
            "input": {"path": "src/auth/middleware.ts"},
            "output_tokens": 1200,
            "duration_ms": 45,
        },
        {
            "type": "tool_call",
            "tool": "search_code",
            "input": {"query": "jwt.verify", "path": "src/"},
            "output_tokens": 800,
            "duration_ms": 120,
        },
        {
            "type": "llm_call",
            "model": "claude-sonnet-4",
            "input_tokens": 3200,
            "output_tokens": 650,
            "duration_ms": 2400,
        },
    ],

    # Output
    "final_output": "Found 2 security issues...",
    "total_tokens": 6300,
    "total_cost_usd": 0.032,
    "total_duration_ms": 3100,
}
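A record like this is easy to accumulate with a small helper. The sketch below is illustrative, not from any SDK: the `TraceRecorder` class name and the blended per-token rate are assumptions you would replace with your own pricing.

```python
# Minimal sketch of a trace recorder that accumulates the fields above.
import time
import uuid

COST_PER_1K_TOKENS = 0.003  # assumed blended rate, for illustration only

class TraceRecorder:
    def __init__(self, agent_name: str, user_id: str, session_id: str):
        self.trace = {
            "trace_id": str(uuid.uuid4()),
            "session_id": session_id,
            "user_id": user_id,
            "agent_name": agent_name,
            "steps": [],
        }
        self._start = time.monotonic()

    def add_step(self, step_type, name, step_input, output_tokens, duration_ms):
        self.trace["steps"].append({
            "type": step_type,
            "name": name,
            "input": step_input,
            "output_tokens": output_tokens,
            "duration_ms": duration_ms,
        })

    def finish(self, final_output: str) -> dict:
        tokens = sum(s["output_tokens"] for s in self.trace["steps"])
        self.trace.update({
            "final_output": final_output,
            "total_tokens": tokens,
            "total_cost_usd": round(tokens / 1000 * COST_PER_1K_TOKENS, 6),
            "total_duration_ms": int((time.monotonic() - self._start) * 1000),
        })
        return self.trace
```

Totals are derived from the steps at `finish()`, so they can never drift out of sync with what was actually recorded.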
OpenTelemetry integration
The OpenAI Agents SDK has built-in tracing; by default it reports to the OpenAI dashboard, but you can register your own processors to route spans to OpenTelemetry-compatible backends:
from agents import Runner, trace
from agents.tracing import set_trace_processors

# Export traces to your observability platform. The Agents SDK defines its own
# TracingProcessor interface, so OTLP export goes through a bridge package
# (e.g. OpenInference or Logfire instrumentation) rather than a raw
# OpenTelemetry span processor.
set_trace_processors([otel_bridge_processor])  # processor supplied by your bridge

async def review_code(file_path: str):
    with trace("code-review"):  # trace() is a context manager, not a decorator
        result = await Runner.run(review_agent, f"Review {file_path}")
        return result.final_output
This sends structured traces to any OpenTelemetry-compatible backend: Jaeger, Grafana Tempo, Datadog, or Langfuse.
Platform-specific tracing
Helicone (proxy-based)
Helicone sits between your agent and the LLM API, capturing everything automatically:
import openai
client = openai.OpenAI(
    base_url="https://oai.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {HELICONE_API_KEY}",
        "Helicone-Session-Id": session_id,
        "Helicone-User-Id": user_id,
    },
)
Zero code changes to your agent. Helicone captures every LLM call with tokens, cost, latency, and the full request/response.
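It helps to build those headers in one place. A minimal sketch: the `Helicone-Auth`, `Helicone-Session-Id`, and `Helicone-User-Id` headers are documented Helicone fields, while the `Helicone-Property-Agent` entry is an example of Helicone's custom property-header convention, with an assumed property name.

```python
def helicone_headers(api_key: str, user_id: str, session_id: str) -> dict:
    """Build the per-client Helicone headers in one place."""
    return {
        "Helicone-Auth": f"Bearer {api_key}",
        "Helicone-Session-Id": session_id,
        "Helicone-User-Id": user_id,
        # Custom Helicone-Property-* headers become filterable fields in the
        # Helicone dashboard; "Agent" is an example property name.
        "Helicone-Property-Agent": "code-reviewer",
    }
```

Pass the result as `default_headers=` when constructing the `openai.OpenAI` client shown above, so every request is attributed to the right user, session, and agent.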
Langfuse (SDK-based)
Langfuse gives you more control with explicit trace creation:
from langfuse import Langfuse

langfuse = Langfuse()
trace = langfuse.trace(name="code-review", user_id=user_id)
span = trace.span(name="read-file", input={"path": file_path})
# ... do the work ...
span.end(output={"content": file_content, "tokens": 1200})
langfuse.flush()  # events are batched; flush before short-lived processes exit
Custom dashboard
For simple setups, log to PostgreSQL and build a dashboard:
CREATE TABLE agent_traces (
    id BIGSERIAL PRIMARY KEY,
    trace_id UUID NOT NULL,
    session_id TEXT,
    user_id TEXT,
    agent_name TEXT,
    step_type TEXT, -- 'tool_call', 'llm_call', 'error'
    step_name TEXT,
    input JSONB,
    output TEXT,
    tokens_used INTEGER,
    cost_usd NUMERIC(10, 6),
    duration_ms INTEGER,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Quick queries
-- Cost per user today
SELECT user_id, SUM(cost_usd) FROM agent_traces
WHERE created_at > NOW() - INTERVAL '1 day' GROUP BY user_id;
-- Slowest tool calls
SELECT step_name, AVG(duration_ms), COUNT(*) FROM agent_traces
WHERE step_type = 'tool_call' GROUP BY step_name ORDER BY AVG(duration_ms) DESC;
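For local development you can prototype the same shape with the standard library's sqlite3 before standing up PostgreSQL. This is a sketch, not a drop-in replacement: the JSONB column becomes serialized TEXT, and the query mirrors the slowest-tool-calls query above.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE agent_traces (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        trace_id TEXT NOT NULL,
        step_type TEXT,
        step_name TEXT,
        input TEXT,          -- JSON serialized (no JSONB in SQLite)
        tokens_used INTEGER,
        cost_usd REAL,
        duration_ms INTEGER
    )
""")

def log_step(trace_id, step_type, step_name, step_input, tokens, cost, ms):
    conn.execute(
        "INSERT INTO agent_traces (trace_id, step_type, step_name, input,"
        " tokens_used, cost_usd, duration_ms) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (trace_id, step_type, step_name, json.dumps(step_input), tokens, cost, ms),
    )

log_step("abc-123", "tool_call", "read_file", {"path": "a.ts"}, 1200, 0.004, 45)
log_step("abc-123", "tool_call", "read_file", {"path": "b.ts"}, 800, 0.002, 120)

# Slowest tool calls, mirroring the PostgreSQL query above
rows = conn.execute(
    "SELECT step_name, AVG(duration_ms) FROM agent_traces"
    " WHERE step_type = 'tool_call' GROUP BY step_name"
).fetchall()
```

One row per step (rather than one per trace) is what makes these aggregate queries a simple GROUP BY instead of a JSON unnesting exercise.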
Alerting on trace data
Set up alerts for anomalies:
| Alert | Threshold | Action |
|---|---|---|
| High error rate | >5% of traces have errors | Page on-call |
| Cost spike | Daily cost >2x average | Notify team |
| Slow responses | p95 latency >30s | Investigate |
| Loop detection | >3 identical tool calls in trace | Auto-interrupt agent |
| Token budget | User at 80% of daily limit | Warn user |
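The loop-detection alert reduces to counting identical tool calls within a single trace. A minimal sketch, assuming steps shaped like the trace schema above:

```python
import json
from collections import Counter

def detect_loops(steps: list, threshold: int = 3) -> list:
    """Return tool names called with identical input more than `threshold` times."""
    calls = Counter(
        # Serialize the input dict so identical calls compare equal
        (step["tool"], json.dumps(step["input"], sort_keys=True))
        for step in steps
        if step.get("type") == "tool_call"
    )
    return [tool for (tool, _inp), count in calls.items() if count > threshold]
```

Run it on each trace as steps arrive; a non-empty result is the signal to interrupt the agent before it burns through its token budget retrying the same call.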
What NOT to log
- Full user messages in plain text if they contain PII: hash or redact
- API keys or tokens: never log credentials
- Full file contents from tool calls: log file paths and sizes instead
- Every intermediate token: log summaries, not streams
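These rules can be applied in one place before anything reaches the log sink. A sketch, with an illustrative (and deliberately not exhaustive) secret pattern:

```python
import hashlib
import re

# Illustrative pattern only; a real deployment needs a proper secrets scanner
SECRET_PATTERN = re.compile(r"(sk-[A-Za-z0-9]{8,}|Bearer\s+\S+)")

def redact_for_logging(user_message: str, file_content: str, path: str) -> dict:
    """Produce a log-safe record: hashes and metadata instead of raw content."""
    return {
        # Hash lets you correlate repeat messages without storing the text
        "user_message_sha256": hashlib.sha256(user_message.encode()).hexdigest(),
        "user_message_len": len(user_message),
        # File path and size instead of full contents
        "file": {"path": path, "bytes": len(file_content.encode())},
        # Short secrets-scrubbed preview rather than the full message
        "preview": SECRET_PATTERN.sub("[REDACTED]", user_message)[:80],
    }
```

Funneling every log write through a function like this is easier to audit than redacting at each call site.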
Balance observability with privacy. See our GDPR guide for compliance requirements.
Related: LLM Observability for Developers · Helicone vs LangSmith vs Langfuse · How to Debug AI Agents · AI Agent Cost Management · AI Agent Error Handling · Deploy AI Agents to Production