
How to Run DeepSeek Locally — V3 and R1 Setup Guide


DeepSeek V3 is a 671B parameter MoE model that matches GPT-4o on most benchmarks. DeepSeek R1 is a reasoning model comparable to OpenAI’s o1. Both are open-source and free to self-host. Here’s how to run them.

Which DeepSeek model to run

| Model | Total params | Active params | VRAM needed (Q4) | Best for |
| --- | --- | --- | --- | --- |
| DeepSeek Coder V2 Lite | 16B | 2.4B | ~10-12GB | Coding on consumer hardware |
| DeepSeek V3 (Q4) | 671B | 37B | ~350-400GB | Full-quality general purpose |
| DeepSeek R1 (distilled 7B) | 7B | 7B | ~6GB | Reasoning on any hardware |
| DeepSeek R1 (distilled 32B) | 32B | 32B | ~20-24GB | Strong reasoning |
| DeepSeek R1 (full) | 671B | 37B | ~350-400GB | Best reasoning, needs server-class hardware |

For most developers: start with DeepSeek Coder V2 Lite (fits on a consumer GPU) or the R1 distilled 7B (runs on a laptop).
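If you want to script that decision, the consumer tiers collapse into a tiny helper. This is an illustrative sketch, not an official tool — `recommend_deepseek` and its thresholds are just the VRAM column above:

```shell
#!/bin/sh
# Illustrative helper: map available VRAM (GB) to a model tag from the
# table above. Covers the distilled/consumer tiers only.
recommend_deepseek() {
  vram_gb=$1
  if [ "$vram_gb" -ge 24 ]; then
    echo "deepseek-r1:32b"          # strong reasoning, 24GB GPU
  elif [ "$vram_gb" -ge 12 ]; then
    echo "deepseek-coder-v2:16b"    # coding on a consumer GPU
  else
    echo "deepseek-r1:7b"           # runs on any laptop
  fi
}

recommend_deepseek 12   # → deepseek-coder-v2:16b
```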

Run with Ollama

# DeepSeek Coder — runs on 12GB GPU
ollama run deepseek-coder-v2:16b

# DeepSeek R1 distilled — runs on any laptop
ollama run deepseek-r1:7b

# DeepSeek R1 32B — needs 24GB GPU
ollama run deepseek-r1:32b

# Full DeepSeek V3 — needs ~400GB of VRAM/unified memory at Q4
ollama run deepseek-v3
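Anything started with `ollama run` is also reachable over Ollama’s local HTTP API on port 11434, which is handy for scripting. A quick sketch (the prompt is just an example; assumes the R1 7B model from above is pulled and Ollama is running):

```shell
# Query a locally running Ollama model over its HTTP API.
# "stream": false returns one JSON object instead of a token stream.
curl -s http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:7b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```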

Run with llama.cpp

# Download quantized DeepSeek Coder
huggingface-cli download deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct-GGUF \
  --include "*Q4_K_M*" \
  --local-dir ./models

# Start server
llama-server \
  --model ./models/DeepSeek-Coder-V2-Lite-Q4_K_M.gguf \
  --ctx-size 16384 \
  --threads 8 \
  --port 8080
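llama-server exposes an OpenAI-compatible API, so once the server above is up you can talk to it like any hosted endpoint (a sketch; assumes the server is listening on port 8080 as configured):

```shell
# Chat with the local llama-server via its OpenAI-compatible endpoint.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Write a Python hello world."}
    ],
    "temperature": 0.2
  }'
```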

Hardware reality check

The full DeepSeek V3 (671B) is not practical on consumer hardware. Even quantized to Q4 (~0.5 bytes per parameter), the weights alone come to roughly 350-400GB. That means:

  • Mac Studio (M3 Ultra, 512GB unified memory) — works; the 192GB configs need aggressive ~2-bit dynamic quants
  • Multi-GPU server (e.g., 8x A100/H100 80GB) — works
  • Single consumer GPU — doesn’t work

The distilled models are the practical choice:

  • R1 7B: any laptop with 8GB RAM
  • Coder V2 Lite: any GPU with 12GB VRAM
  • R1 32B: RTX 4090 or 32GB Mac
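These figures all follow from the same quantization math: Q4 stores roughly half a byte per parameter, so the weight footprint is about params × 0.5, plus KV cache and runtime overhead on top. A back-of-the-envelope sketch:

```shell
# Rough Q4 weight footprint: ~0.5 bytes per parameter.
# Takes parameter count in billions, prints GB (weights only,
# before KV cache and runtime overhead).
q4_weights_gb() {
  echo $(( $1 / 2 ))
}

q4_weights_gb 32   # → 16 (R1 32B: 16GB of weights + overhead ≈ a 24GB GPU)
q4_weights_gb 7    # → 3  (R1 7B fits comfortably on a laptop)
```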

DeepSeek R1: reasoning on a laptop

The R1 distilled models are the hidden gem. The 7B version runs on basically anything and gives you o1-style chain-of-thought reasoning for free:

ollama run deepseek-r1:7b

It’s not as strong as the full R1, but for math problems, logic puzzles, and step-by-step reasoning, it’s remarkably capable for a model that fits in 6GB of RAM.
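One practical note for scripting: the R1 distills wrap their chain of thought in `<think>…</think>` tags before the final answer, so you may want to strip that block. A sketch using a canned response (real output varies per run):

```shell
# Example R1-style output, canned for illustration:
response='<think>The user wants 2+2. That is 4.</think>The answer is 4.'

# Keep only the text after the reasoning block:
echo "$response" | sed 's|<think>.*</think>||'
# → The answer is 4.
```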

Connect to your IDE

# Start DeepSeek Coder locally
ollama run deepseek-coder-v2:16b

# In VS Code with Continue:
# Provider: Ollama
# Model: deepseek-coder-v2:16b

DeepSeek Coder V2 Lite is a strong free alternative to GitHub Copilot for code completion. It supports 338 programming languages and runs entirely on your machine.
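The provider/model pair noted in the comments above lives in Continue’s config file. A minimal sketch, assuming Continue’s JSON config at `~/.continue/config.json` — the exact schema varies by Continue version, so treat the field names as illustrative and check your version’s docs:

```shell
# Point Continue at the local Ollama model (illustrative config).
mkdir -p ~/.continue
cat > ~/.continue/config.json <<'EOF'
{
  "models": [
    {
      "title": "DeepSeek Coder (local)",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b"
    }
  ]
}
EOF
```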

DeepSeek vs Qwen for self-hosting

| | DeepSeek | Qwen 3.5 |
| --- | --- | --- |
| Best small model | R1 7B (reasoning) | 9B (general, beats 120B models) |
| Best coding model | Coder V2 Lite (16B) | Qwen 2.5 Coder 32B |
| Full model VRAM (Q4) | ~350-400GB (671B) | ~214GB (397B) |
| License | MIT | Apache 2.0 |
| Multimodal | No | Yes |

For coding: DeepSeek Coder V2 Lite if you have 12GB VRAM, Qwen 2.5 Coder 32B if you have 24GB. For reasoning: DeepSeek R1 distilled models are unique — no other open-source model offers dedicated chain-of-thought reasoning at this size. For general use: Qwen 3.5-9B is the better all-rounder.