Railway is one of the simplest platforms for deploying AI applications. Push your code, set environment variables, get a URL. No Dockerfiles, no Kubernetes, no infrastructure management.
Here’s how to deploy an AI-powered FastAPI app from zero to production.
## What you’ll deploy
A FastAPI app that calls an LLM API (Claude, GPT, or DeepSeek) and returns responses. This pattern covers chatbots, summarizers, code reviewers, and most AI features.
## Step 1: Create the app
```python
# main.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import httpx
import os

app = FastAPI()

class Query(BaseModel):
    prompt: str
    max_tokens: int = 500

@app.post("/chat")
async def chat(query: Query):
    api_key = os.environ.get("ANTHROPIC_API_KEY")
    if not api_key:
        raise HTTPException(500, "API key not configured")

    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://api.anthropic.com/v1/messages",
            headers={
                "x-api-key": api_key,
                "anthropic-version": "2023-06-01",
                "content-type": "application/json",
            },
            json={
                "model": "claude-sonnet-4-5-20250514",
                "max_tokens": query.max_tokens,
                "messages": [{"role": "user", "content": query.prompt}],
            },
            timeout=30.0,
        )

    if response.status_code != 200:
        raise HTTPException(response.status_code, "LLM API error")

    data = response.json()
    return {"response": data["content"][0]["text"]}

@app.get("/health")
async def health():
    return {"status": "ok"}
```
```
# requirements.txt
fastapi
uvicorn[standard]
httpx
pydantic
```
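Before deploying, it’s worth verifying the app runs locally. Assuming the two files above sit in the same directory, a quick smoke test looks like this (use your real key in place of the placeholder):

```shell
pip install -r requirements.txt
export ANTHROPIC_API_KEY=sk-ant-...   # placeholder, use your real key
uvicorn main:app --reload --port 8000
# then in another terminal: curl http://localhost:8000/health
```

If `/health` returns `{"status": "ok"}`, you’re ready to deploy.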
## Step 2: Deploy on Railway
- Push your code to GitHub
- Go to railway.app and sign in with GitHub
- Click “New Project” > “Deploy from GitHub repo”
- Select your repository
- Railway auto-detects Python and deploys
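If you prefer the terminal, the Railway CLI covers the same flow. This sketch assumes the CLI is installed (e.g. via `npm i -g @railway/cli`):

```shell
railway login     # authenticate in the browser
railway init      # link this directory to a new Railway project
railway up        # build and deploy from the current directory
```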
## Step 3: Set environment variables
In the Railway dashboard, go to your service > Variables:
```
ANTHROPIC_API_KEY=sk-ant-...
PORT=8000
```

Railway sets `PORT` automatically, so you only need to add it if you want to override the default. Your LLM API key goes here, never in code.
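One refinement worth making: fail fast at startup when a required variable is missing, instead of returning 500s at request time. A minimal sketch (the `require_env` helper name is ours, not part of FastAPI or Railway):

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail loudly at startup."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# In main.py, at import time, so a misconfigured deploy crashes immediately:
# ANTHROPIC_API_KEY = require_env("ANTHROPIC_API_KEY")
```

A crash at boot shows up in Railway’s deploy logs right away, which is far easier to diagnose than intermittent 500s.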
## Step 4: Configure the start command
Railway usually auto-detects this, but if needed, set it in a Procfile:
```
web: uvicorn main:app --host 0.0.0.0 --port $PORT
```
## Step 5: Add a custom domain
In Railway dashboard > Settings > Networking > Custom Domain. Point your DNS CNAME to the Railway-provided domain.
## Step 6: Test it
```shell
curl -X POST https://your-app.railway.app/chat \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain Docker in one sentence"}'
```
## Cost
Railway’s Pro plan is $5/month + usage. A typical AI app serving 1,000 requests/day costs $5-15/month on Railway (excluding LLM API costs). See our cost optimization guide for managing the LLM spend.
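The LLM side of the bill is easy to estimate up front. The helper below is ours, and the per-token rates in the example are purely illustrative — substitute your provider’s current pricing:

```python
def monthly_llm_cost(requests_per_day: int,
                     input_tokens: int, output_tokens: int,
                     price_in_per_m: float, price_out_per_m: float) -> float:
    """Estimate monthly LLM API spend in dollars.

    Prices are per million tokens; plug in your provider's current rates.
    """
    per_request = (input_tokens * price_in_per_m
                   + output_tokens * price_out_per_m) / 1_000_000
    return per_request * requests_per_day * 30

# Illustrative rates of $3/M input and $15/M output tokens,
# 1,000 requests/day at 200 input + 500 output tokens each:
# monthly_llm_cost(1000, 200, 500, 3.0, 15.0) → 243.0
```

At that volume the LLM spend dwarfs the hosting bill, which is why caching and prompt trimming usually pay off first.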
## Common issues and fixes
### “No start command found”

Railway can’t detect how to start your app. Add a Procfile:

```
web: uvicorn main:app --host 0.0.0.0 --port $PORT
```
### “Port already in use”

Always use `$PORT` from the environment, never hardcode:

```python
import os

port = int(os.environ.get("PORT", 8000))
```
### “Build failed: pip install error”

Pin your Python version with a runtime.txt:

```
python-3.11.9
```
### “Request timeout”

LLM API calls can be slow. Increase your timeout and stream the response instead of waiting for the full completion:

```python
from fastapi.responses import StreamingResponse

@app.post("/chat/stream")
async def chat_stream(query: Query):
    # Same headers and payload as the /chat endpoint above
    headers = {
        "x-api-key": os.environ.get("ANTHROPIC_API_KEY"),
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    payload = {
        "model": "claude-sonnet-4-5-20250514",
        "max_tokens": query.max_tokens,
        "messages": [{"role": "user", "content": query.prompt}],
    }

    async def generate():
        async with httpx.AsyncClient() as client:
            async with client.stream(
                "POST",
                "https://api.anthropic.com/v1/messages",
                headers=headers,
                json={**payload, "stream": True},
                timeout=60.0,
            ) as response:
                async for chunk in response.aiter_text():
                    yield chunk

    return StreamingResponse(generate(), media_type="text/event-stream")
```
## Adding a database
Railway makes adding Postgres trivial:
- In your project, click “New” > “Database” > “PostgreSQL”
- Railway auto-creates a `DATABASE_URL` environment variable
- Use it in your app:

```python
import os

DATABASE_URL = os.environ.get("DATABASE_URL")
```
This is useful for storing conversation history, user preferences, or caching LLM responses to reduce API costs.
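Response caching is the simplest of those to sketch. The example below uses sqlite3 as a local stand-in so it stays self-contained; on Railway you’d point a Postgres driver such as `asyncpg` at `DATABASE_URL` instead. The table and function names here are our own:

```python
import hashlib
import sqlite3

# In-memory sqlite as a stand-in for Railway's Postgres
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS llm_cache "
    "(prompt_hash TEXT PRIMARY KEY, response TEXT)"
)

def cache_key(prompt: str) -> str:
    """Hash the prompt so arbitrary-length text keys a fixed-size column."""
    return hashlib.sha256(prompt.encode()).hexdigest()

def get_cached(prompt: str):
    row = conn.execute(
        "SELECT response FROM llm_cache WHERE prompt_hash = ?",
        (cache_key(prompt),),
    ).fetchone()
    return row[0] if row else None

def put_cached(prompt: str, response: str) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO llm_cache VALUES (?, ?)",
        (cache_key(prompt), response),
    )
    conn.commit()
```

In the `/chat` handler you’d check `get_cached(query.prompt)` before calling the LLM API and `put_cached` after, so identical prompts cost one API call instead of many.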
## Scaling
Railway auto-scales based on traffic. For AI apps, the bottleneck is usually the LLM API, not your server. But if you need more control:
- Horizontal scaling — Railway supports multiple replicas
- Region selection — deploy closer to your users or your LLM API provider
- Resource limits — set memory and CPU limits to control costs
## Adding production features
Once deployed, add these incrementally:
- Rate limiting — use `slowapi` middleware to prevent abuse
- Logging — log every LLM call with tokens, latency, and cost
- Caching — cache identical prompts to reduce API calls
- Authentication — add API key auth for your endpoints
- Monitoring — connect Helicone as a proxy for automatic LLM observability
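Rate limiting is worth a closer look. `slowapi` handles it as middleware, but the underlying idea is a token bucket per client. A dependency-free sketch (class name and parameters are ours) looks like this:

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilling at `rate` tokens/sec."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.buckets = {}  # key -> (tokens_remaining, last_timestamp)

    def allow(self, key: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(key, (self.capacity, now))
        # Refill proportionally to elapsed time, capped at capacity
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[key] = (tokens - 1.0, now)
            return True
        self.buckets[key] = (tokens, now)
        return False
```

In FastAPI you’d call `allow()` with the client IP inside a dependency and raise `HTTPException(429)` when it returns `False`; in production, `slowapi` gives you the same behavior without maintaining this code yourself.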
## Alternatives
| Platform | Best for | Pricing |
|---|---|---|
| Railway | Simplest deploy, good for AI apps | $5/mo + usage |
| Vercel | Frontend + serverless functions | Free tier available |
| Render | Similar to Railway, free tier | Free tier available |
| Cloudways | Managed cloud hosting (AWS/GCP/DO) | From $14/mo |
| Self-hosted | Full control, cheapest at scale | VPS cost only |
For the full deployment checklist, see our AI app deployment checklist.
Related: AI App Deployment Checklist · How to Reduce LLM API Costs · Self-Hosted AI for Enterprise · LLM Observability