
Ollama Cheat Sheet: Every Command You Need (2026)


Some links in this article are affiliate links. We earn a commission at no extra cost to you when you purchase through them. Full disclosure.

Quick reference for Ollama β€” the easiest way to run AI models locally.

Install

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Docker
docker run -d -v ollama:/root/.ollama -p 11434:11434 ollama/ollama
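After installing, confirm the server is reachable. The macOS and Linux installers usually start it as a background service; otherwise launch it with `ollama serve`. A quick sanity check (the version number shown is just an example):

```shell
# The server listens on port 11434 by default; /api/version returns
# a small JSON object like {"version":"0.5.7"} when it is up
curl -s http://localhost:11434/api/version || echo "server not running"
```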

Model management

ollama pull qwen3:8b          # Download a model
ollama run qwen3:8b           # Run interactively (auto-pulls if needed)
ollama list                   # List downloaded models
ollama show qwen3:8b          # Show model details
ollama rm qwen3:8b            # Delete a model
ollama cp qwen3:8b my-model   # Copy/rename a model
ollama ps                     # Show running models
ollama stop qwen3:8b          # Stop a running model

Run options

ollama run qwen3:8b                          # Interactive chat
ollama run qwen3:8b "one-shot prompt"        # Single response, then exit
ollama run qwen3:8b --verbose                # Show token stats
ollama run qwen3:8b --keepalive 10m          # Keep model loaded after exit
ollama run qwen3:8b --nowordwrap             # Disable word wrap
# num_ctx and num_gpu are not run flags: set them as Modelfile PARAMETERs,
# in the API "options" field, or interactively with /set parameter num_ctx 2048
ollama run qwen3:8b --format json            # Force JSON output

Create custom models

# Create a Modelfile
cat > Modelfile << 'EOF'
FROM qwen3:8b
PARAMETER temperature 0.2
PARAMETER num_ctx 4096
SYSTEM You are a senior Python developer. Write clean, typed, tested code.
EOF

# Build the model
ollama create python-coder -f Modelfile

# Run it
ollama run python-coder
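The same Modelfile pattern works for any persona: only the FROM line is required, and every PARAMETER and SYSTEM line is optional. A second (hypothetical) variant as a sketch:

```shell
# Write a second Modelfile variant (name and persona are examples)
cat > Modelfile.review << 'EOF'
FROM qwen3:8b
PARAMETER temperature 0.0
SYSTEM You are a strict code reviewer. Point out bugs and style issues.
EOF

# Build and run it (requires the Ollama server to be running):
# ollama create code-reviewer -f Modelfile.review
# ollama run code-reviewer
```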

Import GGUF files

cat > Modelfile << 'EOF'
FROM ./model-file.gguf
EOF
ollama create my-model -f Modelfile

API endpoints

# Generate (streaming)
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:8b",
  "prompt": "Write hello world in Python"
}'

# Chat (multi-turn)
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3:8b",
  "messages": [{"role": "user", "content": "Explain REST APIs"}]
}'

# Embeddings
curl http://localhost:11434/api/embed -d '{
  "model": "nomic-embed-text",
  "input": "Your text here"
}'

# List models
curl http://localhost:11434/api/tags

# Model info
curl http://localhost:11434/api/show -d '{"model": "qwen3:8b"}'
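By default these endpoints stream one JSON object per line; add `"stream": false` to the request body to get a single object back. A minimal sketch of extracting the text from a non-streaming reply (the JSON below is a canned sample, not real model output):

```shell
# Canned sample of a non-streaming /api/generate reply (hypothetical)
reply='{"model":"qwen3:8b","response":"print(\"hello world\")","done":true}'

# Pull out the "response" field with python3 (jq -r .response works too)
echo "$reply" | python3 -c 'import json,sys; print(json.load(sys.stdin)["response"])'
```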

Environment variables

OLLAMA_HOST=0.0.0.0:11434      # Listen address (default: 127.0.0.1:11434)
OLLAMA_MODELS=/path/to/models  # Model storage location
OLLAMA_NUM_GPU=999             # GPU layers (999 = all)
OLLAMA_NUM_PARALLEL=4          # Concurrent requests
OLLAMA_MAX_LOADED_MODELS=2     # Models in memory simultaneously
OLLAMA_FLASH_ATTENTION=1       # Enable flash attention
OLLAMA_KEEP_ALIVE=5m           # Keep model loaded after last request
OLLAMA_DEBUG=1                 # Debug logging
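These variables are read by the server process, so they must be set wherever `ollama serve` runs (for a Linux systemd install, via `systemctl edit ollama`). When OLLAMA_MODELS is unset, models land in the default store; a sketch of computing it for a per-user install (the default paths below are an assumption, and the systemd service on Linux uses its own home under /usr/share/ollama):

```shell
# Per-user default model store is ~/.ollama/models on macOS and Linux;
# OLLAMA_MODELS overrides it when set
echo "${OLLAMA_MODELS:-$HOME/.ollama/models}"
```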

Best models for coding

| Model | Size | Command | Best for |
|---|---|---|---|
| Qwen3 8B | 5 GB | `ollama run qwen3:8b` | General coding |
| DeepSeek R1 14B | 9 GB | `ollama run deepseek-r1:14b` | Reasoning |
| Qwen 3.5 27B | 16 GB | `ollama run qwen3.5:27b` | Best quality |
| CodeStral | 13 GB | `ollama run codestral` | Code-specific |
| Phi-4 3.8B | 2.5 GB | `ollama run phi4` | Low RAM |

See our best Ollama models for coding for the full list.

Use with AI coding tools

# With Aider
aider --model ollama/qwen3:8b

# With Continue.dev (VS Code)
# Add to ~/.continue/config.json:
# {"models": [{"provider": "ollama", "model": "qwen3:8b"}]}

# With OpenCode
opencode --model ollama/qwen3:8b

Troubleshooting

| Error | Fix |
|---|---|
| Out of memory | Use a smaller model or a lower quantization |
| Model not found | Check the name format (`model:tag`, e.g. `qwen3:8b`) |
| Connection refused | Start the server with `ollama serve` |
| Slow responses | Enable GPU acceleration, reduce context size |

Full troubleshooting: Ollama Troubleshooting Guide

Speed up your workflow β€” Raycast lets you trigger Ollama commands from a keyboard shortcut.

Related: Ollama Complete Guide Β· Best Ollama Models for Coding Β· Ollama vs LM Studio vs vLLM Β· How Much VRAM for AI Models Β· Aider with Ollama Setup