DeepSeek V3 is a 671B-parameter MoE model (37B active per token) that matches GPT-4o on most benchmarks. DeepSeek R1 is a reasoning model comparable to OpenAI’s o1. Both are open-weight under the MIT license and free to self-host. Here’s how to run them.
## Which DeepSeek model to run

| Model | Total params | Active params | VRAM needed (Q4) | Best for |
|---|---|---|---|---|
| DeepSeek Coder V2 Lite | 16B | 2.4B | ~10-12GB | Coding on consumer hardware |
| DeepSeek V3 (Q4) | 671B | 37B | ~400GB | Full-quality general purpose |
| DeepSeek R1 (distilled 7B) | 7B | 7B | ~6GB | Reasoning on any hardware |
| DeepSeek R1 (distilled 32B) | 32B | 32B | ~20-24GB | Strong reasoning |
| DeepSeek R1 (full) | 671B | 37B | ~400GB | Best reasoning, needs serious hardware |
For most developers: start with DeepSeek Coder V2 Lite (fits on a consumer GPU) or the R1 distilled 7B (runs on a laptop).
## Run with Ollama

```sh
# DeepSeek Coder — runs on a 12GB GPU
ollama run deepseek-coder-v2:16b

# DeepSeek R1 distilled — runs on any laptop
ollama run deepseek-r1:7b

# DeepSeek R1 32B — needs a 24GB GPU
ollama run deepseek-r1:32b

# Full DeepSeek V3 — needs ~400GB of memory
ollama run deepseek-v3
```
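Ollama also serves a local REST API on port 11434, which is handy once you want these models in scripts rather than a terminal. A minimal standard-library sketch (the model tag and prompt are just examples):

```python
import json
from urllib.request import Request, urlopen

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(model: str, prompt: str) -> dict:
    # stream=False makes Ollama return one JSON object instead of NDJSON chunks
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    body = json.dumps(build_request(model, prompt)).encode()
    req = Request(OLLAMA_URL, data=body,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With a model pulled, `generate("deepseek-r1:7b", "Why is the sky blue?")` returns the completion as a single string.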
## Run with llama.cpp

```sh
# Download a community Q4_K_M quant of DeepSeek Coder V2 Lite
huggingface-cli download bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF \
  --include "*Q4_K_M*" \
  --local-dir ./models

# Start an OpenAI-compatible server
llama-server \
  --model ./models/DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf \
  --ctx-size 16384 \
  --threads 8 \
  --port 8080
```
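`llama-server` exposes an OpenAI-compatible API on the port you chose, so any OpenAI client works against it. A small standard-library sketch (the URL and token limit are assumptions matching the command above):

```python
import json
from urllib.request import Request, urlopen

API_URL = "http://localhost:8080/v1/chat/completions"  # matches --port above

def chat_payload(prompt: str, max_tokens: int = 256) -> dict:
    # With a single loaded model, only the messages and limits matter here
    return {"messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens}

def chat(prompt: str) -> str:
    req = Request(API_URL, data=json.dumps(chat_payload(prompt)).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```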
## Hardware reality check

The full DeepSeek V3 (671B) is not practical on consumer hardware. Even quantized to Q4, the weights alone are roughly 400GB. That means:
- Mac Studio M3 Ultra with 512GB unified memory — works
- Multi-GPU server (e.g. 8x A100/H100 80GB) — works
- Aggressive ~1.6-bit dynamic quants (~130GB) — fit on a 192GB Mac, at a real quality cost
- Single consumer GPU — doesn’t work
The distilled models are the practical choice:
- R1 7B: any laptop with 8GB RAM
- Coder V2 Lite: any GPU with 12GB VRAM
- R1 32B: RTX 4090 or 32GB Mac
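The VRAM figures above follow from simple arithmetic: a quantized model needs roughly params x bits-per-weight / 8 bytes for its weights, plus 10-20% headroom for KV cache and runtime buffers. A back-of-the-envelope helper (the ~4.85 bits/weight average for Q4_K_M is an approximation):

```python
def est_weight_gb(params_billions: float, bits_per_weight: float = 4.85) -> float:
    """Approximate size of quantized weights in GB.
    Q4_K_M averages ~4.85 bits per weight; KV cache and runtime
    buffers add another 10-20% on top of this."""
    return params_billions * bits_per_weight / 8

print(round(est_weight_gb(16)))   # Coder V2 Lite → ~10GB, as in the table
print(round(est_weight_gb(671)))  # full V3/R1 → ~407GB
```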
## DeepSeek R1: reasoning on a laptop

The R1 distilled models are the hidden gem. The 7B version runs on basically anything and gives you o1-style chain-of-thought reasoning for free:

```sh
ollama run deepseek-r1:7b
```
It’s not as strong as the full R1, but for math problems, logic puzzles, and step-by-step reasoning, it’s remarkably capable for a model that fits in 6GB of RAM.
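One practical detail when scripting against R1: the model wraps its chain-of-thought in `<think>...</think>` tags before the final answer, and you usually want to separate the two. A minimal sketch:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split R1 output into (chain_of_thought, final_answer)."""
    m = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if not m:
        return "", text.strip()
    return m.group(1).strip(), text[m.end():].strip()

raw = "<think>2+2 is 4 because...</think>The answer is 4."
thought, answer = split_reasoning(raw)
print(answer)  # → The answer is 4.
```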
## Connect to your IDE

```sh
# Start DeepSeek Coder locally
ollama run deepseek-coder-v2:16b

# In VS Code with Continue:
#   Provider: Ollama
#   Model: deepseek-coder-v2:16b
```
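In Continue itself, that maps to a model entry like the following (sketched for the JSON config format; newer Continue releases use a YAML config, so check your version’s docs — the `title` is arbitrary):

```json
{
  "models": [
    {
      "title": "DeepSeek Coder (local)",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b"
    }
  ]
}
```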
DeepSeek Coder V2 Lite is a strong free alternative to GitHub Copilot for code completion. It supports 338 programming languages and runs entirely on your machine.
## DeepSeek vs Qwen for self-hosting

| | DeepSeek | Qwen 3.5 |
|---|---|---|
| Best small model | R1 7B (reasoning) | 9B (general, beats 120B models) |
| Best coding model | Coder V2 Lite 16B | Qwen 2.5 Coder 32B |
| Full model VRAM (Q4) | ~400GB | ~214GB (397B) |
| License | MIT | Apache 2.0 |
| Multimodal | No | Yes |
For coding: DeepSeek Coder V2 Lite if you have 12GB VRAM, Qwen 2.5 Coder 32B if you have 24GB. For reasoning: the DeepSeek R1 distills stand apart; few other open models ship distilled chain-of-thought reasoning at this size. For general use: Qwen 3.5-9B is the better all-rounder.