Apr 21, 2026 · 2 min read

How to Monitor and Control AI API Spending — Stop the Surprise Bills

AI API costs can spiral from $50 to $5,000 in a weekend if an agent loops or traffic spikes. Here’s how to prevent surprise bills.

1. Set hard spending limits

Most providers offer either a spending cap or a prepaid balance. Set one up BEFORE you start:

Provider	How to set limit
OpenAI	Settings → Billing → Usage limits → Set hard cap
Anthropic	Console → Plans → Set monthly spend limit
OpenRouter	Settings → Credit limit (auto-stops at $0)
Google AI	Cloud Console → Budgets & alerts
DeepSeek	Top up fixed amount, no auto-recharge

Rule: Set your hard cap at 2x your expected monthly spend. If you expect $50/month, cap at $100.
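As a worked example of the 2x rule, here is a minimal sketch that derives a hard cap from an expected monthly spend. The function name `budget_plan` and the alert fractions (50%, 75%, 90% of expected spend) are illustrative assumptions, not part of any provider's API.

```python
def budget_plan(expected_monthly_spend, cap_multiplier=2.0,
                alert_fractions=(0.50, 0.75, 0.90)):
    """Return the hard cap and alert levels (in dollars) for a monthly budget."""
    hard_cap = round(expected_monthly_spend * cap_multiplier, 2)
    # Assumption: alert levels are fractions of the expected spend, not of the cap
    alerts = [round(expected_monthly_spend * f, 2) for f in alert_fractions]
    return {"hard_cap": hard_cap, "alerts": alerts}


plan = budget_plan(50)
print(plan)
```

With an expected spend of $50, the cap comes out at $100, matching the rule above. Enter the resulting numbers into your provider's billing settings by hand.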

2. Set up alerts

Don’t wait for the bill. Get notified at 50%, 75%, and 90% of your budget:

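# Simple spending tracker
import json
from datetime import datetime
from pathlib import Path

DAILY_BUDGET = 10.00  # dollars
LEDGER_PATH = Path("spend_ledger.json")
ALERT_THRESHOLDS = (0.50, 0.75, 0.90)  # fractions of the daily budget

def send_alert(message):
    # Placeholder: replace with your Discord/Slack notification call
    print(message)

def track_spend(tokens_used, model, cost_per_1m):
    cost = (tokens_used / 1_000_000) * cost_per_1m

    today = datetime.now().strftime("%Y-%m-%d")
    # Start with an empty ledger if the file does not exist yet
    ledger = json.loads(LEDGER_PATH.read_text()) if LEDGER_PATH.exists() else {}
    before = ledger.get(today, 0.0)
    ledger[today] = before + cost
    LEDGER_PATH.write_text(json.dumps(ledger))

    # Alert only when this request crosses a threshold, not on every call after it
    for fraction in ALERT_THRESHOLDS:
        limit = DAILY_BUDGET * fraction
        if before < limit <= ledger[today]:
            send_alert(f"⚠️ AI spend at ${ledger[today]:.2f} today ({fraction:.0%} of budget, last model: {model})")
    if ledger[today] > DAILY_BUDGET:
        send_alert(f"🚨 AI spend OVER BUDGET: ${ledger[today]:.2f}")
        raise RuntimeError("Daily budget exceeded")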

For our AI race, we added OpenRouter budget detection to the orchestrator — it sends a Discord alert when agents run out of credits.
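The orchestrator's code is not shown here, so the following is only a sketch of how such a detection could look. It assumes the provider signals exhausted credits with HTTP 402 (check your provider's error documentation), and that alerts go to a Discord webhook, which accepts a JSON body with a `content` field. The names `is_out_of_credits`, `build_discord_alert`, and `alert_if_out_of_credits` are hypothetical.

```python
import json
import urllib.request

# Assumption: exhausted credits surface as HTTP 402 (Payment Required)
OUT_OF_CREDITS_STATUS = 402


def is_out_of_credits(status_code: int) -> bool:
    return status_code == OUT_OF_CREDITS_STATUS


def build_discord_alert(webhook_url: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Discord webhook."""
    body = json.dumps({"content": message}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Set an explicit User-Agent; urllib's default may be rejected
        "User-Agent": "spend-alert/0.1",
    }
    return urllib.request.Request(webhook_url, data=body, headers=headers, method="POST")


def _post(request: urllib.request.Request) -> None:
    # Send the request and close the response
    with urllib.request.urlopen(request, timeout=10):
        pass


def alert_if_out_of_credits(status_code: int, agent: str, webhook_url: str, send=_post) -> bool:
    """Send one alert if the API response indicates exhausted credits."""
    if not is_out_of_credits(status_code):
        return False
    send(build_discord_alert(webhook_url, f"Agent {agent} ran out of credits"))
    return True
```

Call `alert_if_out_of_credits(response_status, agent_name, webhook_url)` after each failed API call. The `send` parameter exists so the function can be exercised without network access.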

3. Use prepaid credits (not auto-billing)

The safest approach: buy a fixed amount of credits and don’t enable auto-recharge.

OpenRouter: Buy $25 credits. When they’re gone, API stops. No surprise bills.
DeepSeek: Same — top up a fixed amount.
OpenAI/Anthropic: These can recharge your card automatically when the balance runs low. Make sure auto-recharge is off and set hard caps.
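If a provider does not stop at zero on its own, the same behavior can be approximated on the client side. The sketch below is a hypothetical in-process balance (the class name `PrepaidBalance` is made up for illustration); it only guards calls that go through it and is no substitute for the provider's own limits.

```python
class PrepaidBalance:
    """Local stand-in for a prepaid credit balance: spending stops at zero."""

    def __init__(self, credits: float):
        self.remaining = credits

    def charge(self, cost: float) -> None:
        # Refuse the charge instead of letting the balance go negative
        if cost > self.remaining:
            raise RuntimeError(
                f"Out of credits: need ${cost:.4f}, have ${self.remaining:.4f}"
            )
        self.remaining -= cost


wallet = PrepaidBalance(25.00)
wallet.charge(0.12)  # call with the estimated cost before each API request
print(f"${wallet.remaining:.2f} left")
```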

4. Log everything

Track per-request costs so you know where money goes:

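import logging

# Without this, INFO records are dropped by the default WARNING threshold
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
logger = logging.getLogger("ai_costs")

def log_request(model, input_tokens, output_tokens, cost):
    logger.info(f"model={model} in={input_tokens} out={output_tokens} cost=${cost:.4f}")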

After a week, you’ll know which models consume the most tokens. Add a feature or endpoint field to the log line if you want the same breakdown per feature.
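To turn a week of log lines into a summary, a short script can parse them and total the cost per model. This sketch assumes the `model=... in=... out=... cost=$...` line format used by `log_request` above; the function name `summarize_costs` and the sample model names are placeholders.

```python
import re
from collections import defaultdict

LINE_PATTERN = re.compile(
    r"model=(?P<model>\S+) in=(?P<inp>\d+) out=(?P<out>\d+) cost=\$(?P<cost>[\d.]+)"
)


def summarize_costs(lines):
    """Total requests, tokens, and cost per model, most expensive first."""
    totals = defaultdict(lambda: {"requests": 0, "tokens": 0, "cost": 0.0})
    for line in lines:
        match = LINE_PATTERN.search(line)
        if match is None:
            continue  # skip lines that are not cost records
        entry = totals[match["model"]]
        entry["requests"] += 1
        entry["tokens"] += int(match["inp"]) + int(match["out"])
        entry["cost"] += float(match["cost"])
    return sorted(totals.items(), key=lambda item: item[1]["cost"], reverse=True)


sample = [
    "model=model-a in=1200 out=300 cost=$0.0150",
    "model=model-b in=500 out=100 cost=$0.0020",
    "model=model-a in=800 out=200 cost=$0.0100",
]
for model, stats in summarize_costs(sample):
    print(f"{model}: {stats['requests']} requests, {stats['tokens']} tokens, ${stats['cost']:.4f}")
```

In practice, pass the open log file instead of `sample`; any iterable of lines works.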

5. Circuit breakers for agents

AI agents can loop — retrying the same failed task hundreds of times. Add circuit breakers:

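import logging

MAX_RETRIES = 3
MAX_TOKENS_PER_SESSION = 500_000

def run_agent(attempt):
    # attempt() is your own step function; it returns (succeeded, tokens_used)
    retry_count = session_tokens = 0
    while True:
        succeeded, tokens_used = attempt()
        session_tokens += tokens_used
        if succeeded:
            return True
        retry_count += 1
        if retry_count > MAX_RETRIES:
            logging.warning("Agent stuck in loop, stopping")
            return False
        if session_tokens > MAX_TOKENS_PER_SESSION:
            logging.warning("Token budget exceeded for this session")
            return False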

Our race orchestrator has a 3-consecutive-failure guard that stops agents from burning budget on repeated failures.
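The orchestrator's implementation is not shown, so here is one possible shape for a consecutive-failure guard. The class and function names are hypothetical. The key difference from a plain retry counter is that a success resets the streak, so only back-to-back failures trip the guard.

```python
class ConsecutiveFailureGuard:
    """Trips after N failures in a row; any success resets the count."""

    def __init__(self, max_consecutive_failures: int = 3):
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0

    def record(self, succeeded: bool) -> None:
        self.consecutive_failures = 0 if succeeded else self.consecutive_failures + 1

    @property
    def tripped(self) -> bool:
        return self.consecutive_failures >= self.max_consecutive_failures


def run_with_guard(step, max_steps: int = 100) -> int:
    """Call step() until the guard trips or max_steps is reached; return steps run."""
    guard = ConsecutiveFailureGuard()
    steps_run = 0
    while steps_run < max_steps and not guard.tripped:
        guard.record(step())  # step() returns True on success, False on failure
        steps_run += 1
    return steps_run
```

The `max_steps` limit is a second safety net: even an agent that alternates between success and failure cannot run forever.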

The monitoring stack

Need	Free option	Paid option
Spending alerts	Custom script + Discord/Slack	Helicone, LangSmith
Usage dashboard	Provider dashboards	Helicone ($20/mo)
Per-request logging	Custom middleware	LangSmith, Portkey
Budget enforcement	Hard caps + prepaid	Portkey, custom proxy

For most teams, provider dashboards + hard caps + a simple logging script is enough. You don’t need a $200/month observability platform to track $50/month in API costs.
