AI API costs can spiral from $50 to $5,000 in a weekend if an agent loops or traffic spikes. Here's how to prevent surprise bills.
1. Set hard spending limits
Most providers offer a spending cap or a prepaid balance. Set one BEFORE you start:
| Provider | How to set limit |
|---|---|
| OpenAI | Settings → Billing → Usage limits → Set hard cap |
| Anthropic | Console → Plans → Set monthly spend limit |
| OpenRouter | Settings → Credit limit (auto-stops at $0) |
| Google AI | Cloud Console → Budgets & alerts |
| DeepSeek | Top up fixed amount, no auto-recharge |
Rule: Set your hard cap at 2x your expected monthly spend. If you expect $50/month, cap at $100.
2. Set up alerts
Don't wait for the bill. Get notified before you hit your budget, for example at 50%, 75%, and 90%. This simple tracker warns at 75% of a daily budget and stops at 100%:

    # Simple spending tracker
    import json
    from datetime import datetime
    from pathlib import Path

    DAILY_BUDGET = 10.00  # dollars
    LEDGER_PATH = Path("spend_ledger.json")

    def send_alert(message):
        # Placeholder: swap in your Discord/Slack webhook call
        print(message)

    def track_spend(tokens_used, model, cost_per_1m):
        cost = (tokens_used / 1_000_000) * cost_per_1m
        today = datetime.now().strftime("%Y-%m-%d")
        # Start with an empty ledger on the first run
        ledger = json.loads(LEDGER_PATH.read_text()) if LEDGER_PATH.exists() else {}
        ledger[today] = ledger.get(today, 0) + cost
        LEDGER_PATH.write_text(json.dumps(ledger))

        if ledger[today] > DAILY_BUDGET:
            send_alert(f"🚨 AI spend OVER BUDGET: ${ledger[today]:.2f} (last model: {model})")
            raise RuntimeError("Daily budget exceeded")
        if ledger[today] > DAILY_BUDGET * 0.75:
            send_alert(f"⚠️ AI spend at ${ledger[today]:.2f} today (over 75% of budget)")

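The 50%, 75%, and 90% thresholds mentioned above can be checked so that each one fires only once, when spend first crosses it. The following is a minimal sketch, not a definitive implementation: the names `ALERT_THRESHOLDS`, `crossed_thresholds`, and `check_alerts` are hypothetical, and `notify` defaults to `print` as a stand-in for a real Discord or Slack call.

```python
ALERT_THRESHOLDS = (0.50, 0.75, 0.90)  # fractions of the budget (illustrative)


def crossed_thresholds(previous_spend, current_spend, budget, thresholds=ALERT_THRESHOLDS):
    """Return the thresholds crossed when spend moved from previous to current."""
    return [
        t for t in thresholds
        if previous_spend < budget * t <= current_spend
    ]


def check_alerts(previous_spend, current_spend, budget, notify=print):
    """Call notify once for each newly crossed threshold."""
    for t in crossed_thresholds(previous_spend, current_spend, budget):
        notify(f"AI spend at ${current_spend:.2f} ({t:.0%} of ${budget:.2f} budget reached)")


if __name__ == "__main__":
    # Spend jumped from $4 to $8 on a $10 budget: the 50% and 75% alerts fire
    check_alerts(previous_spend=4.00, current_spend=8.00, budget=10.00)
```

Comparing the spend before and after each request avoids re-sending the same alert on every call once a threshold has been passed.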
For our AI race, we added OpenRouter budget detection to the orchestrator: it sends a Discord alert when agents run out of credits.
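One way such credit detection might look is sketched below. This is not the orchestrator's actual code. It assumes the provider signals exhausted credits with HTTP 402 (verify this against your provider's documentation), and `handle_response`, `build_alert_payload`, and the `send` callable are hypothetical names. The payload uses the `content` field that Discord webhooks accept for message text.

```python
INSUFFICIENT_CREDITS_STATUS = 402  # assumption: provider returns 402 when credits run out


def is_out_of_credits(status_code):
    return status_code == INSUFFICIENT_CREDITS_STATUS


def build_alert_payload(agent_name, status_code):
    # JSON body for a chat webhook; Discord reads the message text from "content"
    return {"content": f"Agent '{agent_name}' stopped: out of API credits (HTTP {status_code})"}


def handle_response(agent_name, status_code, send=None):
    """Return True if the agent should stop because credits are exhausted."""
    if not is_out_of_credits(status_code):
        return False
    if send is not None:
        # send is whatever function posts the payload to your webhook
        send(build_alert_payload(agent_name, status_code))
    return True
```

Passing the sender in as a callable keeps the detection logic testable without network access.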
3. Use prepaid credits (not auto-billing)
The safest approach: buy a fixed amount of credits and don't enable auto-recharge.
- OpenRouter: Buy $25 credits. When they're gone, API stops. No surprise bills.
- DeepSeek: Same: top up a fixed amount.
- OpenAI/Anthropic: These can auto-charge or auto-recharge depending on your account settings. Disable auto-recharge and set hard caps.
4. Log everything
Track per-request costs so you know where money goes:

    import logging

    logging.basicConfig(level=logging.INFO)  # without a handler, INFO records are dropped
    logger = logging.getLogger("ai_costs")

    def log_request(model, input_tokens, output_tokens, cost):
        logger.info(f"model={model} in={input_tokens} out={output_tokens} cost=${cost:.4f}")

After a week, you'll know exactly which features/endpoints consume the most tokens.
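To turn a week of logged requests into a per-feature breakdown, something like the following sketch could work. It assumes each record also carries a feature name, which the logger above does not capture. The model names and prices in `PRICES_PER_1M` are hypothetical placeholders, not real provider rates.

```python
from collections import defaultdict

# Hypothetical prices in dollars per 1M tokens; replace with your provider's current rates
PRICES_PER_1M = {
    "example-small": {"input": 0.50, "output": 1.50},
    "example-large": {"input": 5.00, "output": 15.00},
}


def request_cost(model, input_tokens, output_tokens, prices=PRICES_PER_1M):
    """Cost in dollars of one request, pricing input and output tokens separately."""
    price = prices[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def cost_by_feature(requests, prices=PRICES_PER_1M):
    """Sum cost per feature from (feature, model, input_tokens, output_tokens) records.

    Returns (feature, total_cost) pairs, most expensive first.
    """
    totals = defaultdict(float)
    for feature, model, input_tokens, output_tokens in requests:
        totals[feature] += request_cost(model, input_tokens, output_tokens, prices)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Sorting by total cost puts the most expensive feature first, which is usually the one worth optimizing.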
5. Circuit breakers for agents
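AI agents can loop, retrying the same failed task hundreds of times. Add circuit breakers:

    MAX_RETRIES = 3
    MAX_TOKENS_PER_SESSION = 500_000

    def run_agent(attempt_task):  # attempt_task() returns (succeeded, tokens_used)
        session_tokens = 0
        for retry_count in range(MAX_RETRIES + 1):  # first attempt plus retries
            succeeded, tokens_used = attempt_task()
            session_tokens += tokens_used
            if succeeded:
                return True
            if session_tokens > MAX_TOKENS_PER_SESSION:
                print("Token budget exceeded for this session")
                return False
        print("Agent stuck in loop, stopping")
        return False
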
Our race orchestrator has a 3-consecutive-failure guard that stops agents from burning budget on repeated failures.
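A consecutive-failure guard of this kind can be sketched in a few lines. This is an illustration of the idea, not the orchestrator's actual code, and the class name `ConsecutiveFailureGuard` and its `record` method are hypothetical.

```python
class ConsecutiveFailureGuard:
    """Stops an agent after N failures in a row; any success resets the count."""

    def __init__(self, max_consecutive_failures=3):
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0

    def record(self, succeeded):
        """Record one attempt; return True if the agent may continue."""
        if succeeded:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures < self.max_consecutive_failures
```

Unlike a plain retry counter, this resets on success, so an agent that fails occasionally keeps running while one that fails three times in a row is stopped.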
The monitoring stack
| Need | Free option | Paid option |
|---|---|---|
| Spending alerts | Custom script + Discord/Slack | Helicone, LangSmith |
| Usage dashboard | Provider dashboards | Helicone ($20/mo) |
| Per-request logging | Custom middleware | LangSmith, Portkey |
| Budget enforcement | Hard caps + prepaid | Portkey, custom proxy |
For most teams, provider dashboards + hard caps + a simple logging script is enough. You don't need a $200/month observability platform to track $50/month in API costs.
Related: How to Reduce LLM API Costs · LLM Cost Calculator Guide · Prompt Caching Explained · OpenRouter Complete Guide