Some links in this article are affiliate links. We earn a commission at no extra cost to you when you purchase through them. Full disclosure.
Quick reference for Ollama, the easiest way to run AI models locally.
Install
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# Docker
docker run -d -v ollama:/root/.ollama -p 11434:11434 ollama/ollama
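After installing, a quick sanity check is to probe the server's default port (11434). This helper is a sketch of my own, not part of the Ollama CLI:

```python
import socket

def server_listening(host: str = "127.0.0.1", port: int = 11434) -> bool:
    """Return True if something accepts TCP connections on Ollama's default port."""
    try:
        with socket.create_connection((host, port), timeout=1):
            return True
    except OSError:
        return False

print(server_listening())  # True once `ollama serve` (or the desktop app) is running
```

Equivalently, `curl http://localhost:11434/api/version` should return the server's version as JSON.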
Model management
ollama pull qwen3:8b # Download a model
ollama run qwen3:8b # Run interactively (auto-pulls if needed)
ollama list # List downloaded models
ollama show qwen3:8b # Show model details
ollama rm qwen3:8b # Delete a model
ollama cp qwen3:8b my-model # Copy/rename a model
ollama ps # Show running models
ollama stop qwen3:8b # Stop a running model
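These commands script well. For example, you can pull model names out of `ollama list` by taking the first column and skipping the header; the helper below is illustrative, and the sample text mirrors a typical run:

```python
def installed_models(listing: str) -> list[str]:
    """Extract model names from `ollama list` output (first column, header skipped)."""
    lines = listing.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]

# Typically fed from: subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout
sample = """NAME              ID              SIZE      MODIFIED
qwen3:8b          a1b2c3d4e5f6    5.2 GB    2 days ago
nomic-embed-text  f6e5d4c3b2a1    274 MB    3 weeks ago"""
print(installed_models(sample))  # ['qwen3:8b', 'nomic-embed-text']
```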
Run options
ollama run qwen3:8b # Interactive chat
ollama run qwen3:8b "one-shot prompt" # Single response, then exit
ollama run qwen3:8b --verbose # Show token stats
ollama run qwen3:8b --format json # Force JSON output
# num_ctx and num_gpu are model parameters, not run flags.
# Set them inside an interactive session:
/set parameter num_ctx 2048 # Context window size
/set parameter num_gpu 999 # Offload all layers to GPU
Create custom models
# Create a Modelfile
cat > Modelfile << 'EOF'
FROM qwen3:8b
PARAMETER temperature 0.2
PARAMETER num_ctx 4096
SYSTEM You are a senior Python developer. Write clean, typed, tested code.
EOF
# Build the model
ollama create python-coder -f Modelfile
# Run it
ollama run python-coder
Import GGUF files
cat > Modelfile << 'EOF'
FROM ./model-file.gguf
EOF
ollama create my-model -f Modelfile
API endpoints
# Generate (streaming)
curl http://localhost:11434/api/generate -d '{
"model": "qwen3:8b",
"prompt": "Write hello world in Python"
}'
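By default `/api/generate` streams newline-delimited JSON, one object per line, each carrying a `response` fragment and a `done` flag. A minimal sketch of reassembling the text in Python (the function name is mine, and the sample lines are illustrative, not real model output):

```python
import json

def collect_stream(lines):
    """Concatenate the "response" fields from Ollama's NDJSON stream until done."""
    out = []
    for raw in lines:
        chunk = json.loads(raw)
        out.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(out)

# Illustrative stream fragments in the documented shape:
sample = [
    '{"model":"qwen3:8b","response":"Hello","done":false}',
    '{"model":"qwen3:8b","response":" world","done":true}',
]
print(collect_stream(sample))  # Hello world
```

Pass `"stream": false` in the request body if you'd rather get a single JSON object back.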
# Chat (multi-turn)
curl http://localhost:11434/api/chat -d '{
"model": "qwen3:8b",
"messages": [{"role": "user", "content": "Explain REST APIs"}]
}'
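The chat endpoint is stateless: each request must resend the full conversation, so clients keep a growing `messages` list. A minimal sketch of building that history and the request body (the helper name is mine; the assistant reply is a placeholder):

```python
import json

def chat_body(model: str, messages: list, stream: bool = False) -> str:
    """JSON body for POST /api/chat, matching the fields in the curl example."""
    return json.dumps({"model": model, "messages": messages, "stream": stream})

history = [{"role": "user", "content": "Explain REST APIs"}]
# After the model replies, append its message and the next user turn:
history.append({"role": "assistant", "content": "REST is an architectural style..."})
history.append({"role": "user", "content": "Show an example request"})
body = chat_body("qwen3:8b", history)
print(json.loads(body)["messages"][-1]["content"])  # Show an example request
```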
# Embeddings
curl http://localhost:11434/api/embed -d '{
"model": "nomic-embed-text",
"input": "Your text here"
}'
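Embedding vectors from `/api/embed` are usually compared with cosine similarity. A stdlib-only sketch; the toy 3-d vectors stand in for real embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy vectors; real nomic-embed-text vectors have 768 dimensions.
print(round(cosine([1.0, 2.0, 0.0], [1.0, 2.0, 0.0]), 3))  # 1.0
print(round(cosine([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]), 3))  # 0.0
```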
# List models
curl http://localhost:11434/api/tags
# Model info
curl http://localhost:11434/api/show -d '{"model": "qwen3:8b"}'
Environment variables
OLLAMA_HOST=0.0.0.0:11434 # Listen address (default: 127.0.0.1:11434)
OLLAMA_MODELS=/path/to/models # Model storage location
OLLAMA_NUM_GPU=999 # GPU layers (999 = all)
OLLAMA_NUM_PARALLEL=4 # Concurrent requests
OLLAMA_MAX_LOADED_MODELS=2 # Models in memory simultaneously
OLLAMA_FLASH_ATTENTION=1 # Enable flash attention
OLLAMA_KEEP_ALIVE=5m # Keep model loaded after last request
OLLAMA_DEBUG=1 # Debug logging
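Clients that talk to the API typically honor `OLLAMA_HOST` too, falling back to the default address when it is unset. A sketch of that resolution logic (the helper name is mine):

```python
import os

def ollama_base_url() -> str:
    """Resolve the server URL, honoring OLLAMA_HOST with the documented default."""
    host = os.environ.get("OLLAMA_HOST", "127.0.0.1:11434")
    if not host.startswith(("http://", "https://")):
        host = "http://" + host
    return host

print(ollama_base_url())  # http://127.0.0.1:11434 unless OLLAMA_HOST is set
```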
Best models for coding
| Model | Size | Command | Best for |
|---|---|---|---|
| Qwen3 8B | 5 GB | ollama run qwen3:8b | General coding |
| DeepSeek R1 14B | 9 GB | ollama run deepseek-r1:14b | Reasoning |
| Qwen 3.5 27B | 16 GB | ollama run qwen3.5:27b | Best quality |
| CodeStral | 13 GB | ollama run codestral | Code-specific |
| Phi-4 Mini 3.8B | 2.5 GB | ollama run phi4-mini | Low RAM |
See our best Ollama models for coding for the full list.
Use with AI coding tools
# With Aider
aider --model ollama/qwen3:8b
# With Continue.dev (VS Code)
# Add to ~/.continue/config.json:
# {"models": [{"provider": "ollama", "model": "qwen3:8b"}]}
# With OpenCode
opencode --model ollama/qwen3:8b
Troubleshooting
| Error | Fix |
|---|---|
| Out of memory | Use a smaller model or a more aggressive quantization |
| Model not found | Check the exact name and tag with ollama list (e.g. qwen3:8b, not qwen3) |
| Connection refused | Start the server with ollama serve (or launch the desktop app) |
| Slow responses | Enable GPU offload, reduce the context window |
Full troubleshooting: Ollama Troubleshooting Guide
Speed up your workflow: Raycast lets you trigger Ollama commands from a keyboard shortcut.
Related: Ollama Complete Guide · Best Ollama Models for Coding · Ollama vs LM Studio vs vLLM · How Much VRAM for AI Models · Aider with Ollama Setup