Your AI app works on localhost. Now you need to ship it. AI deployments have more failure modes than traditional software: costs scale with usage, model behavior changes unpredictably, and quality degrades silently.
This checklist catches the issues before your users do.
## Pre-deployment
### API & Model Configuration
- API keys stored in environment variables, not in code
- API keys scoped to minimum required permissions
- Model version pinned (not “latest”) to prevent surprise behavior changes
- Fallback model configured for when primary is down (e.g., DeepSeek as fallback for Claude)
- `max_tokens` set on every request to prevent runaway generation
- Temperature set explicitly (don’t rely on defaults)
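As a rough sketch, the items above might look like this in code. The model identifiers, parameter values, and the `call_model` helper are all illustrative stand-ins for your actual provider SDK calls:

```python
# Pinned model versions (never "latest"), explicit max_tokens and temperature,
# and a fallback provider for when the primary is down.
PRIMARY = {"model": "claude-sonnet-4-20250514", "max_tokens": 1024, "temperature": 0.2}
FALLBACK = {"model": "deepseek-chat", "max_tokens": 1024, "temperature": 0.2}

def call_model(config: dict, prompt: str) -> str:
    # Placeholder for the real SDK call (e.g. client.messages.create(...)).
    raise NotImplementedError

def generate(prompt: str, call=call_model) -> str:
    """Try the primary model; fall back to the secondary on any failure."""
    try:
        return call(PRIMARY, prompt)
    except Exception:
        # In production you'd log the failure before falling back.
        return call(FALLBACK, prompt)
```

Passing the `call` function in makes the fallback logic testable without hitting a real API.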
### Cost Protection
- Monthly spending limit set with provider (monitor guide)
- Per-request cost estimate logged (what to log)
- Alert at 50%, 75%, 90% of budget (FinOps guide)
- Rate limiting on user-facing endpoints (prevent abuse)
- Token budget per user/session
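The per-request cost estimate and budget alerts can be a small helper like the sketch below. The prices and model name are illustrative placeholders, not current rates:

```python
# Illustrative USD prices per million tokens -- check your provider's pricing page.
PRICE_PER_1M = {"example-model": {"input": 3.00, "output": 15.00}}
THRESHOLDS = (0.50, 0.75, 0.90)  # alert at 50%, 75%, 90% of budget

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    p = PRICE_PER_1M[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def crossed_thresholds(prev_spend: float, new_spend: float, budget: float) -> list:
    """Return the budget fractions this request crossed (fire one alert each)."""
    return [t for t in THRESHOLDS if prev_spend < t * budget <= new_spend]
```

Tracking which thresholds a request *crossed* (rather than which it exceeds) means each alert fires exactly once.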
### Security
- Prompt injection defenses in place
- System prompt not leakable via user input
- User input sanitized before passing to model
- MCP tools have least-privilege access
- No PII in logs (or redacted before logging)
- Red team testing completed
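For the input-sanitization item, a heuristic pre-filter for obvious injection phrases is a reasonable first layer. The patterns below are illustrative; this will not stop a determined attacker and should sit alongside the other defenses above, not replace them:

```python
import re

# Heuristic patterns for common prompt-injection phrasings (illustrative only).
SUSPICIOUS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"reveal (your )?system prompt",
    r"you are now",
]

def looks_like_injection(user_input: str) -> bool:
    """Flag input matching known injection phrasings for rejection or review."""
    text = user_input.lower()
    return any(re.search(p, text) for p in SUSPICIOUS)
```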
### Quality
- Eval dataset of 20+ test cases created
- Baseline scores recorded for current prompt version
- Structured output validation on model responses
- Error handling for malformed model responses
- Timeout handling (what happens when the API is slow?)
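Structured output validation and malformed-response handling can be as simple as the sketch below. The `answer`/`confidence` field names are hypothetical; substitute your own schema:

```python
import json

# Fields your application expects in every model response (illustrative).
REQUIRED_FIELDS = {"answer", "confidence"}

def parse_response(raw: str):
    """Return the parsed response dict, or None if malformed (fail closed)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # caller retries, falls back, or returns a safe default
    if not isinstance(data, dict) or not REQUIRED_FIELDS <= data.keys():
        return None
    return data
```

Returning `None` rather than raising keeps the "model emitted garbage" path explicit in the caller.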
## Deployment
### Infrastructure
- Hosting configured (Railway, Vercel, or self-hosted)
- Environment variables set in production
- HTTPS enabled
- CORS configured correctly
- Health check endpoint responding
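A minimal health-check payload might look like this; the `api_key_present` check and field names are illustrative, and you'd wire this into whatever framework serves your endpoint:

```python
import time

START_TIME = time.time()  # recorded at process start

def health(api_key_present: bool) -> dict:
    """Payload for a /health endpoint: status plus process uptime."""
    return {
        "status": "ok" if api_key_present else "degraded",
        "uptime_seconds": round(time.time() - START_TIME, 1),
    }
```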
### Monitoring
- Logging for every LLM call (model, tokens, latency, cost)
- Observability dashboard set up (Helicone or similar)
- Error alerting configured (Slack, email, or Discord)
- Uptime monitoring on health endpoint
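One structured log line per LLM call, covering the fields listed above, is enough to start. The field names and stdout transport are assumptions; adapt them to your logging stack:

```python
import json
import time

def log_llm_call(model: str, input_tokens: int, output_tokens: int,
                 latency_ms: float, cost_usd: float) -> dict:
    """Emit one JSON log record per LLM call: model, tokens, latency, cost."""
    record = {
        "ts": time.time(),
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "latency_ms": latency_ms,
        "cost_usd": round(cost_usd, 6),
    }
    print(json.dumps(record))  # stdout; your log collector picks it up
    return record
```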
### Rollback Plan
- Previous version tagged in git
- One-command rollback procedure documented
- Database migrations are reversible (if applicable)
- Feature flags for AI features (can disable without redeploy)
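The feature-flag item can be as small as an environment check, so support can kill the AI path without a redeploy. The variable name `AI_FEATURES_ENABLED` is hypothetical:

```python
import os

def ai_enabled(env=os.environ) -> bool:
    """Kill switch: AI features are on unless the env var says 'false'."""
    return env.get("AI_FEATURES_ENABLED", "true").lower() != "false"
```

Defaulting to enabled means a missing variable doesn't silently turn the feature off.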
## Post-deployment
### First 24 hours
- Monitor error rates (should be <1%)
- Check cost dashboard (is spend within expected range?)
- Review sample of real user interactions
- Verify logging is capturing all fields
- Check latency (P50, P95, P99)
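To check P50/P95/P99 without a metrics backend, a nearest-rank percentile over recorded latencies is enough as a sketch:

```python
def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

def latency_summary(samples: list) -> dict:
    """P50/P95/P99 over request latencies (e.g. in milliseconds)."""
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}
```

P95 and P99 matter more than the average here: LLM latency is long-tailed, and the tail is what users complain about.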
### First week
- Run regression tests against production
- Review user feedback
- Check for prompt injection attempts in logs
- Verify cost projections match actual spend
- Document any issues for next deployment
## The minimum viable checklist
If you’re moving fast and can only do 5 things:
- Pin your model version — prevents surprise behavior changes
- Set a spending limit — prevents bill shock
- Log every LLM call — you’ll need this for debugging
- Add rate limiting — prevents abuse
- Have a rollback plan — one command to go back
Everything else can be added incrementally. See our governance guide for the full production framework.
Related: LLM Observability · What to Log in AI Systems · AI Security Checklist · How to Reduce LLM API Costs · Evaluate AI Vendors Enterprise