Before you commit to an AI provider, estimate what you’ll actually spend. Here’s how to calculate it.
The formula
Monthly cost = (input_tokens × input_price + output_tokens × output_price) × requests_per_day × 30

Input and output tokens are priced separately — on the providers below, output tokens cost 3–5× more than input — so keep the two terms apart rather than folding them into a single per-token price.
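The formula translates directly into a small helper. The workload numbers and rates in the example call are illustrative, not from any specific provider:

```python
def monthly_cost(input_tokens_per_req: int, output_tokens_per_req: int,
                 requests_per_day: int, input_rate_per_m: float,
                 output_rate_per_m: float, days: int = 30) -> float:
    """Estimate monthly API spend. Rates are USD per 1M tokens."""
    per_request = (input_tokens_per_req * input_rate_per_m
                   + output_tokens_per_req * output_rate_per_m) / 1_000_000
    return per_request * requests_per_day * days

# Illustrative workload: 2,000 input / 500 output tokens per request,
# 30 requests/day, at assumed rates of $10/1M input and $30/1M output.
print(f"${monthly_cost(2_000, 500, 30, 10.0, 30.0):.2f}/month")  # $31.50/month
```

Note that the output term dominates quickly even at lower volume, which is why the input/output split matters.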
Token estimation rules of thumb
| Content | Approximate tokens |
|---|---|
| 1 word | ~1.3 tokens |
| 1 line of code | ~10 tokens |
| 1 page of text | ~500 tokens |
| Average chat message | ~50-100 tokens |
| System prompt | ~500-2,000 tokens |
| Full source file | ~500-5,000 tokens |
| LLM response (coding) | ~200-1,000 tokens |
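These rules of thumb can be wrapped into a quick estimator. The multipliers below come from the table; they are heuristics, not tokenizer output — for exact counts, use your provider's tokenizer:

```python
# Heuristic token estimates based on the rules of thumb above.
TOKENS_PER_WORD = 1.3       # prose
TOKENS_PER_CODE_LINE = 10   # source code

def estimate_tokens(text: str, is_code: bool = False) -> int:
    """Back-of-envelope token count; not a substitute for a real tokenizer."""
    if is_code:
        return len(text.splitlines()) * TOKENS_PER_CODE_LINE
    return round(len(text.split()) * TOKENS_PER_WORD)

print(estimate_tokens("word " * 1000))  # ~1000 words -> 1300 tokens
```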
Example: AI coding assistant
A developer using an AI coding tool for 4 hours/day:
| Activity | Requests/day | Input tokens/req | Output tokens/req |
|---|---|---|---|
| Autocomplete | 200 | 500 | 100 |
| Chat questions | 30 | 2,000 | 500 |
| Code generation | 10 | 3,000 | 1,000 |
| Refactoring | 5 | 5,000 | 2,000 |
Summing the table gives ~215K input and ~55K output tokens per day; the comparison below rounds this to ~250K input and ~50K output to leave headroom for retries and growing context.
| Provider | Input rate | Output rate | Daily cost | Monthly cost |
|---|---|---|---|---|
| Claude Opus 4.6 | $15/1M | $75/1M | $7.50 | $225 |
| GPT-5.4 | $10/1M | $30/1M | $4.00 | $120 |
| DeepSeek Chat | $0.27/1M | $1.10/1M | $0.12 | $3.60 |
| Qwen 3.5 Flash | $0.065/1M | $0.26/1M | $0.03 | $0.90 |
| Local (Ollama) | Free | Free | $0 | $0 |
The difference between Claude Opus and DeepSeek for the same workload: $225 vs $3.60/month. That’s 62x.
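The provider comparison above can be reproduced in a few lines, using the listed rates and the ~250K/~50K daily token estimate:

```python
# Daily and monthly cost at ~250K input / ~50K output tokens per day,
# using the per-1M-token rates from the table above.
providers = {
    "Claude Opus 4.6": (15.0, 75.0),
    "GPT-5.4": (10.0, 30.0),
    "DeepSeek Chat": (0.27, 1.10),
    "Qwen 3.5 Flash": (0.065, 0.26),
}
daily_in, daily_out = 250_000, 50_000

for name, (in_rate, out_rate) in providers.items():
    daily = (daily_in * in_rate + daily_out * out_rate) / 1_000_000
    print(f"{name}: ${daily:.2f}/day, ${daily * 30:.2f}/month")
```

Swapping in your own daily token totals makes this a quick sanity check before committing to a provider.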
Example: RAG application
A search app handling 10,000 queries/day:
| Component | Cost per query | Daily | Monthly |
|---|---|---|---|
| Embeddings | $0.0001 | $1 | $30 |
| Vector DB (Pinecone) | $0.0003 | $3 | $90 |
| LLM generation (DeepSeek) | $0.001 | $10 | $300 |
| Web search API | $0.025 | $250 | $7,500 |
| Total | $0.026 | $264 | $7,920 |
The web search API dominates cost. If you can use a local knowledge base instead, costs drop to ~$420/month.
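The same breakdown as code makes the web-search dominance obvious — dropping that one component cuts the bill by ~95%:

```python
# Per-query costs (USD) from the RAG example above.
per_query = {
    "embeddings": 0.0001,
    "vector_db": 0.0003,
    "llm_generation": 0.001,
    "web_search": 0.025,
}
queries_per_day = 10_000

def monthly(components: dict) -> float:
    return sum(components.values()) * queries_per_day * 30

print(f"with web search:  ${monthly(per_query):,.0f}/mo")   # $7,920/mo
no_search = {k: v for k, v in per_query.items() if k != "web_search"}
print(f"local KB instead: ${monthly(no_search):,.0f}/mo")   # $420/mo
```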
Cost reduction strategies
Apply these in order of effort:
- Model routing — cheap model for simple tasks (40-60% savings)
- Prompt caching — reuse system prompts (10-30% savings)
- Token optimization — shorter prompts, structured output (15-25% savings)
- Self-hosting — local models for predictable workloads (50-90% savings)
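Model routing, the highest-impact strategy, can be as simple as a heuristic gate in front of your API calls. A minimal sketch — the model names and the 500-word threshold are illustrative assumptions, not recommendations:

```python
# Minimal model-routing sketch: cheap model by default, premium only
# for long or explicitly complex requests. Names/thresholds are
# placeholders — tune them against your own workload.
CHEAP_MODEL = "deepseek-chat"    # assumed cheap tier
PREMIUM_MODEL = "claude-opus"    # assumed premium tier

def route(prompt: str, needs_reasoning: bool = False) -> str:
    if needs_reasoning or len(prompt.split()) > 500:
        return PREMIUM_MODEL
    return CHEAP_MODEL

print(route("Rename this variable"))                         # deepseek-chat
print(route("Design the schema migration", needs_reasoning=True))  # claude-opus
```

In production you would replace the word-count heuristic with a classifier or task-type tag, but even this crude split captures much of the 40–60% savings if most traffic is simple.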
Budget planning template
| Category | Low usage | Medium usage | Heavy usage |
|---|---|---|---|
| AI coding tool | $0-20/mo | $20-50/mo | $50-200/mo |
| RAG/search | $30-100/mo | $100-500/mo | $500-5,000/mo |
| Content generation | $5-20/mo | $20-100/mo | $100-1,000/mo |
| Embeddings | $5-30/mo | $30-100/mo | $100-500/mo |
FAQ
How do I calculate LLM costs?
Price input and output tokens separately: multiply input tokens per request by the input rate and output tokens by the output rate, then scale by daily request volume and 30 days. The formula is: (input_tokens × input_price + output_tokens × output_price) × requests_per_day × 30.
Which AI model is cheapest?
Self-hosted local models (via Ollama) are free after hardware and electricity costs. For API usage, Qwen 3.5 Flash at $0.065/1M input tokens and DeepSeek Chat at $0.27/1M input tokens are the cheapest paid options — roughly 55–230x cheaper on input rates than premium models like Claude Opus.
How many tokens is 1000 words?
Approximately 1,300 tokens. The general rule is 1 word ≈ 1.3 tokens, though this varies by language and content type (code is typically more token-dense than prose).
Can I reduce LLM costs?
Yes — the most effective strategies in order are: model routing (use cheap models for simple tasks, 40–60% savings), prompt caching (reuse system prompts, 10–30% savings), token optimization (shorter prompts, 15–25% savings), and self-hosting for predictable workloads (50–90% savings).
Related: How to Reduce LLM API Costs · AI Coding Tools Pricing 2026 · Best Free AI APIs 2026 · Prompt Caching Explained