Some links in this article are affiliate links. We earn a commission at no extra cost to you when you purchase through them. Full disclosure.
Your AI app works on localhost. Now you need to ship it. AI deployments have more failure modes than traditional software: costs scale with usage, model behavior changes unpredictably, and quality degrades silently.
This checklist catches the issues before your users do.
Pre-deployment
API & Model Configuration
- API keys stored in environment variables, not in code
- API keys scoped to minimum required permissions (guide to securing AI API keys)
- Model version pinned (not "latest") to prevent surprise behavior changes
- Fallback model configured for when primary is down (e.g., DeepSeek as fallback for Claude)
- max_tokens set on every request to prevent runaway generation
- Temperature set explicitly (don't rely on defaults)
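Several of the items above fit in a few lines of code. Here is a minimal sketch, assuming a hypothetical `send` callable that wraps whatever provider SDK you use. The model identifiers and the `LLM_API_KEY` variable name are placeholders, not real values.

```python
import os
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ModelConfig:
    model: str          # a pinned, dated version string, never "latest"
    max_tokens: int     # hard cap on generated tokens per request
    temperature: float  # set explicitly instead of relying on defaults


# Hypothetical identifiers: substitute the exact versions your providers publish.
PRIMARY = ModelConfig(model="primary-model-2025-01-01", max_tokens=1024, temperature=0.2)
FALLBACK = ModelConfig(model="fallback-model-2025-01-01", max_tokens=1024, temperature=0.2)


def load_api_key(var_name: str = "LLM_API_KEY") -> str:
    """Read the key from the environment and fail fast if it is missing."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set")
    return key


def complete_with_fallback(prompt: str, send: Callable[[ModelConfig, str], str]) -> str:
    """Try the primary model; on any failure, retry once on the fallback."""
    try:
        return send(PRIMARY, prompt)
    except Exception:
        return send(FALLBACK, prompt)
```

Catching every exception is deliberately blunt. In a real app you would probably fall back only on outages, timeouts, and rate-limit errors, and let bugs in your own code surface.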
Cost Protection
- Monthly spending limit set with provider (monitor guide)
- Per-request cost estimate logged (what to log)
- Alert at 50%, 75%, 90% of budget (FinOps guide)
- Rate limiting on user-facing endpoints (prevent abuse)
- Token budget per user/session
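The cost checks above are mostly arithmetic. The sketch below estimates per-request cost, reports which budget alerts a new charge crosses, and enforces a per-user token budget. The prices are made-up placeholders and the in-memory counter is an assumption for illustration; a production app would use its provider's real rates and a shared store.

```python
from collections import defaultdict
from typing import List

# Hypothetical prices in USD per 1M tokens; replace with your provider's rates.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}
ALERT_THRESHOLDS = (0.50, 0.75, 0.90)  # fractions of the monthly budget


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request in USD."""
    total = (input_tokens * PRICE_PER_MTOK["input"]
             + output_tokens * PRICE_PER_MTOK["output"])
    return total / 1_000_000


def crossed_thresholds(spend_before: float, spend_after: float, budget: float) -> List[float]:
    """Return the alert thresholds that this change in spend just crossed."""
    return [t for t in ALERT_THRESHOLDS if spend_before < t * budget <= spend_after]


class TokenBudget:
    """In-memory per-user token budget (resets when the process restarts)."""

    def __init__(self, limit_per_user: int):
        self.limit = limit_per_user
        self.used = defaultdict(int)

    def try_spend(self, user_id: str, tokens: int) -> bool:
        """Record the usage and return True, or return False if over budget."""
        if self.used[user_id] + tokens > self.limit:
            return False
        self.used[user_id] += tokens
        return True
```

Comparing spend before and after each charge means every alert fires once, when the threshold is crossed, instead of on every request past it.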
Security
- Prompt injection defenses in place
- System prompt not leakable via user input
- User input sanitized before passing to model
- MCP tools have least-privilege access
- No PII in logs (or redacted before logging)
- Red team testing completed
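For the "No PII in logs" item, here is a rough sketch of redaction before logging. The two patterns are deliberately naive assumptions (email addresses and US-style phone numbers only), so treat this as a starting point and not as complete PII detection.

```python
import re

# Naive illustrative patterns; real PII detection needs much broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact(text: str) -> str:
    """Replace email addresses and phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)
```

Call `redact()` on prompts and responses at the point where you write the log line, so the raw text never reaches storage.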
Quality
- Eval dataset of 20+ test cases created
- Baseline scores recorded for current prompt version
- Structured output validation on model responses
- Error handling for malformed model responses
- Timeout handling (what happens when the API is slow?)
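Structured output validation can start very small. This sketch assumes your prompt asks the model for a JSON object with `answer` and `confidence` fields. That schema is hypothetical, so swap in your own.

```python
import json

# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"answer": str, "confidence": (int, float)}


class MalformedResponse(ValueError):
    """Raised when the model's output does not match the expected shape."""


def parse_response(raw: str) -> dict:
    """Parse model output as JSON and check the required fields and types."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise MalformedResponse(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise MalformedResponse("expected a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected):
            raise MalformedResponse(f"missing or mistyped field: {field}")
    return data
```

Raising one exception type for every failure gives the caller a single place to retry, fall back, or show an error message.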
Deployment
Infrastructure
- Hosting configured (Railway, Vercel, or self-hosted)
- Environment variables set in production
- HTTPS enabled
- CORS configured correctly
- Health check endpoint responding
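If your framework doesn't already give you a health check, the endpoint can be this small. The sketch uses only Python's standard library. The `/health` path and port 8000 are assumptions, and most apps would add a route in their existing web framework instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_payload() -> dict:
    """Body returned by the health check; extend with dependency checks."""
    return {"status": "ok"}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            body = json.dumps(health_payload()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)


def make_server(port: int = 8000) -> HTTPServer:
    """Build the server; call .serve_forever() on the result to start it."""
    return HTTPServer(("0.0.0.0", port), HealthHandler)
```

Point your uptime monitor at this path. It answers 200 with a small JSON body while the process is up and 404 for anything else.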
Monitoring
- Logging for every LLM call (model, tokens, latency, cost)
- Observability dashboard set up (Helicone or similar)
- Error alerting configured (Slack, email, or Discord)
- Uptime monitoring on health endpoint
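Logging every LLM call is easiest when all calls go through one wrapper. In this sketch, `send` is a hypothetical stand-in for your provider client, assumed to return the response text plus input and output token counts. The flat per-token price is also a simplification, since most providers price input and output differently.

```python
import json
import logging
import time
from typing import Callable, Tuple

logger = logging.getLogger("llm_calls")

# Assumed shape of the provider wrapper: (model, prompt) -> (text, in_tokens, out_tokens)
SendFn = Callable[[str, str], Tuple[str, int, int]]


def logged_call(model: str, prompt: str, send: SendFn, usd_per_token: float) -> str:
    """Call the model and emit one JSON log line, even when the call fails."""
    start = time.perf_counter()
    record = {"model": model, "status": "error",
              "input_tokens": 0, "output_tokens": 0, "cost_usd": 0.0}
    try:
        text, input_tokens, output_tokens = send(model, prompt)
        record.update(
            status="ok",
            input_tokens=input_tokens,
            output_tokens=output_tokens,
            cost_usd=(input_tokens + output_tokens) * usd_per_token,
        )
        return text
    finally:
        # Runs on success and on exception, so failed calls are logged too.
        record["latency_ms"] = (time.perf_counter() - start) * 1000
        logger.info(json.dumps(record))
```

One JSON object per line is easy to ship to whichever observability tool you choose and easy to grep when you don't have one yet.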
Rollback Plan
- Previous version tagged in git
- One-command rollback procedure documented
- Database migrations are reversible (if applicable)
- Feature flags for AI features (can disable without redeploy)
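A feature flag doesn't need a vendor on day one. Here is a minimal sketch that reads an environment variable on every call. The variable name `AI_FEATURES_ENABLED` and the truncation fallback are assumptions for illustration. On many hosts, changing an environment variable still restarts the process, so a config store or flag service is the sturdier option if you need a true no-restart switch.

```python
import os
from typing import Callable


def ai_enabled(flag: str = "AI_FEATURES_ENABLED") -> bool:
    """The flag defaults to on; set it to false/0/no/off to disable AI features."""
    value = os.environ.get(flag, "true").strip().lower()
    return value in {"1", "true", "yes", "on"}


def summarize(text: str, generate: Callable[[str], str]) -> str:
    """Use the model when the flag is on, otherwise a non-AI fallback."""
    if not ai_enabled():
        return text[:200]  # degraded but working: plain truncation
    return generate(text)
```

The important part is the fallback branch: decide ahead of time what the feature does when the model is switched off.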
Post-deployment
First 24 hours
- Monitor error rates (should be <1%)
- Check cost dashboard (is spend within expected range?)
- Review sample of real user interactions
- Verify logging is capturing all fields
- Check latency (P50, P95, P99)
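If your dashboard doesn't show percentiles yet, you can compute them from your logs. This sketch uses the nearest-rank method, which is one of several common percentile definitions, so your monitoring tool may report slightly different numbers.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def error_rate(total_requests: int, failed_requests: int) -> float:
    """Fraction of requests that failed; compare against your target (e.g. 0.01)."""
    return failed_requests / total_requests if total_requests else 0.0
```

For example, with logged latencies in milliseconds, `percentile(latencies, 95)` gives the P95 figure to watch during the first 24 hours.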
First week
- Run regression tests against production
- Review user feedback
- Check for prompt injection attempts in logs
- Verify cost projections match actual spend
- Document any issues for next deployment
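A regression run boils down to comparing fresh eval scores against the baseline you recorded before launch. A minimal sketch, assuming each test case has an ID and a score between 0 and 1; the 0.05 tolerance is an arbitrary placeholder.

```python
from typing import Dict, List


def find_regressions(baseline: Dict[str, float],
                     current: Dict[str, float],
                     tolerance: float = 0.05) -> List[str]:
    """Return IDs of cases that are missing or score below baseline minus tolerance."""
    regressions = []
    for case_id, base_score in baseline.items():
        score = current.get(case_id)
        if score is None or score < base_score - tolerance:
            regressions.append(case_id)
    return regressions
```

Treating a missing case as a regression is a deliberate choice here: a test that silently stopped running should fail loudly.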
The minimum viable checklist
If you're moving fast and can only do 5 things:
- Pin your model version: prevents surprise behavior changes
- Set a spending limit: prevents bill shock
- Log every LLM call: you'll need this for debugging
- Add rate limiting: prevents abuse
- Have a rollback plan: one command to go back
Everything else can be added incrementally. See our governance guide for the full production framework.
Related: LLM Observability · What to Log in AI Systems · AI Security Checklist · How to Reduce LLM API Costs · Evaluate AI Vendors Enterprise
💡 Don't skip monitoring: UptimeRobot monitors your endpoints every 5 minutes and alerts you via Slack, email, or webhook when something goes down. Free tier covers 50 monitors, enough for most projects. Set up UptimeRobot →