
How to Run DeepSeek Locally: V3 and R1 Setup Guide


DeepSeek V3 is a 671B parameter MoE model that matches GPT-4o on most benchmarks. DeepSeek R1 is a reasoning model comparable to OpenAI's o1. Both are open-source and free to self-host. Here's how to run them.

Update (April 24, 2026): DeepSeek V4-Flash (284B/13B active) is now the latest DeepSeek model. Open weights available on HuggingFace, MIT licensed. See how to run V4 locally for the updated setup guide.

Which DeepSeek model to run

| Model | Total params | Active params | VRAM needed (Q4) | Best for |
| --- | --- | --- | --- | --- |
| DeepSeek Coder V2 Lite | 16B | 2.4B | ~10-12GB | Coding on consumer hardware |
| DeepSeek V3 | 671B | 37B | ~350-400GB | Full-quality general purpose |
| DeepSeek R1 (distilled 7B) | 7B | 7B | ~6GB | Reasoning on any hardware |
| DeepSeek R1 (distilled 32B) | 32B | 32B | ~20-24GB | Strong reasoning |
| DeepSeek R1 (full) | 671B | 37B | ~350-400GB | Best reasoning, needs serious hardware |

For most developers: start with DeepSeek Coder V2 Lite (fits on a consumer GPU) or the R1 distilled 7B (runs on a laptop).
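The memory figures can be sanity-checked with simple arithmetic: weight memory is roughly total parameters times bits per weight, divided by 8. The sketch below assumes an average of 4.5 bits per weight for a Q4-style quantization (an assumption; real GGUF files vary) and ignores the KV cache and runtime overhead, so treat the results as a lower bound. The function name `estimate_weights_gb` is hypothetical.

```shell
# Rough weight-memory estimate in GB: params (billions) * bits per weight / 8.
# Assumption: 4.5 bits/weight approximates a Q4-style quantization.
estimate_weights_gb() {
  params_b="$1"   # total parameters, in billions
  bits="$2"       # average bits per weight
  awk -v p="$params_b" -v b="$bits" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_weights_gb 7 4.5     # prints 3.9
estimate_weights_gb 32 4.5    # prints 18.0
estimate_weights_gb 671 4.5   # prints 377.4
```

For MoE models the estimate uses total parameters, not active parameters: every expert has to be resident in memory even though only a subset runs per token.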

Run with Ollama

# DeepSeek Coder: runs on 12GB GPU
ollama run deepseek-coder-v2:16b

# DeepSeek R1 distilled: runs on any laptop
ollama run deepseek-r1:7b

# DeepSeek R1 32B: needs 24GB GPU
ollama run deepseek-r1:32b

# Full DeepSeek V3: needs roughly 400GB of combined VRAM or unified memory at Q4
ollama run deepseek-v3
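Besides the interactive prompt, Ollama serves a local HTTP API (by default on port 11434), which is handy for scripting. The sketch below builds a request body for the `/api/generate` endpoint. The helper name `build_generate_payload` is hypothetical, and it assumes the prompt contains no double quotes or backslashes, since it does no JSON escaping.

```shell
# Build the JSON body for Ollama's /api/generate endpoint.
# Assumption: the prompt has no double quotes or backslashes (no escaping is done).
build_generate_payload() {
  model="$1"
  prompt="$2"
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$model" "$prompt"
}

# Usage (requires a running Ollama server with the model already pulled):
# build_generate_payload deepseek-r1:7b "What is 17 * 23?" \
#   | curl -s http://localhost:11434/api/generate -d @-
```

Setting `stream` to `false` returns one JSON object instead of a stream of partial responses, which is easier to handle in a shell pipeline.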

Run with llama.cpp

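# Download quantized DeepSeek Coder
# (confirm the GGUF repository name on Hugging Face first; quantized builds
# are often published under community accounts)
huggingface-cli download deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct-GGUF \
  --include "*Q4_K_M*" \
  --local-dir ./models

# Start server
# (--model must match the exact .gguf filename that was downloaded)
llama-server \
  --model ./models/DeepSeek-Coder-V2-Lite-Q4_K_M.gguf \
  --ctx-size 16384 \
  --threads 8 \
  --port 8080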
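Because GGUF filenames differ between repositories, hard-coding the `--model` path is fragile. A minimal sketch that locates the downloaded file by its quantization label; the helper name `find_gguf` is hypothetical, and it assumes the file sits directly in the target directory.

```shell
# Print the first .gguf file in a directory whose name contains the given
# quantization label; return non-zero if nothing matches.
find_gguf() {
  dir="$1"
  quant="$2"
  for f in "$dir"/*"$quant"*.gguf; do
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  return 1
}

# Usage: start the server only if a matching file was found.
# model_path="$(find_gguf ./models Q4_K_M)" && \
#   llama-server --model "$model_path" --ctx-size 16384 --port 8080
```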

Hardware reality check

The full DeepSeek V3 (671B) is not practical on consumer hardware. Even quantized to Q4, the weights alone take roughly 350-400GB of memory. That means:

  • Mac Studio M3 Ultra with 512GB unified memory: works
  • Multi-GPU server (8x A100 80GB): works
  • Single consumer GPU: doesn't work

If you want to run the full V3 or R1 without buying multi-GPU hardware, cloud GPU providers offer multi-A100 instances on demand at a fraction of the purchase cost.

The smaller models are the practical choice:

  • R1 7B: any laptop with 8GB RAM
  • Coder V2 Lite: any GPU with 12GB VRAM
  • R1 32B: RTX 4090 or 32GB Mac
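The list above can be turned into a small lookup. This is a sketch only: the function name `suggest_models` is hypothetical, and the 8/12/24GB thresholds are taken from the list rather than measured.

```shell
# Print the Ollama tags that should fit in the given amount of memory (whole GB).
# Thresholds follow the hardware list above and are rough guidance only.
suggest_models() {
  mem_gb="$1"
  models=""
  if [ "$mem_gb" -ge 8 ];  then models="deepseek-r1:7b"; fi
  if [ "$mem_gb" -ge 12 ]; then models="$models deepseek-coder-v2:16b"; fi
  if [ "$mem_gb" -ge 24 ]; then models="$models deepseek-r1:32b"; fi
  printf '%s\n' "$models"
}

suggest_models 12   # prints "deepseek-r1:7b deepseek-coder-v2:16b"

# Usage on an NVIDIA system (nvidia-smi reports MiB; first GPU only, rounded to GB):
# vram_mib="$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)"
# suggest_models "$(( (vram_mib + 512) / 1024 ))"
```

The rounding matters because a 24GB card usually reports slightly less than 24576 MiB, which plain integer division would truncate to 23.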

DeepSeek R1: reasoning on a laptop

The R1 distilled models are the hidden gem. The 7B version runs on almost any recent laptop and gives you o1-style chain-of-thought reasoning for free:

ollama run deepseek-r1:7b

It's not as strong as the full R1, but for math problems, logic puzzles, and step-by-step reasoning, it's remarkably capable for a model that fits in 6GB of RAM.
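R1-style models typically emit their reasoning between `<think>` and `</think>` tags before the final answer, although how this appears can depend on the Ollama version. When scripting, you may want only the answer. A minimal sketch, assuming the tags sit on their own lines; the helper name `strip_think` is hypothetical.

```shell
# Drop everything from a line containing <think> through the line containing
# </think>, and print the rest. Assumes the answer does not share a line with the tags.
strip_think() {
  awk '
    /<think>/   { skipping = 1 }
    !skipping   { print }
    /<\/think>/ { skipping = 0 }
  '
}

printf '<think>\n2 plus 2 is 4.\n</think>\n4\n' | strip_think   # prints 4

# Usage with a one-shot prompt:
# ollama run deepseek-r1:7b "What is 17 * 23?" | strip_think
```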

Connect to your IDE

# Start DeepSeek Coder locally
ollama run deepseek-coder-v2:16b

# In VS Code with Continue:
# Provider: Ollama
# Model: deepseek-coder-v2:16b

DeepSeek Coder V2 Lite is a strong free alternative to GitHub Copilot for code completion. It supports 338 programming languages and runs entirely on your machine.
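An IDE extension can only use a model that Ollama has already downloaded, so it helps to check before configuring the editor. The sketch below parses the table printed by `ollama list`; the helper name `has_model` is hypothetical.

```shell
# Succeed if the given tag appears in the NAME column of `ollama list` output
# read from standard input (the first line is the column header).
has_model() {
  awk -v tag="$1" 'NR > 1 && $1 == tag { found = 1 } END { exit !found }'
}

# Usage: pull the model only when it is missing.
# ollama list | has_model deepseek-coder-v2:16b || ollama pull deepseek-coder-v2:16b
```

Using `ollama pull` here rather than `ollama run` downloads the model without opening an interactive session, which is all the editor integration needs.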

DeepSeek vs Qwen for self-hosting

| | DeepSeek | Qwen 3.5 |
| --- | --- | --- |
| Best small model | R1 7B (reasoning) | 9B (general, beats 120B models) |
| Best coding model | Coder V2 Lite 16B | Qwen 2.5 Coder 32B |
| Full model VRAM | ~350-400GB (671B) | ~214GB (397B) |
| License | MIT | Apache 2.0 |
| Multimodal | No | Yes |

For coding: DeepSeek Coder V2 Lite if you have 12GB VRAM, Qwen 2.5 Coder 32B if you have 24GB. For reasoning: DeepSeek R1 distilled models stand out, since few open-weight models at this size are trained specifically for chain-of-thought reasoning. For general use: Qwen 3.5-9B is the better all-rounder.
