DeepSeek V3 is a 671B-parameter MoE model (37B active per token) that matches GPT-4o on most benchmarks. DeepSeek R1 is a reasoning model comparable to OpenAI’s o1. Both are open-weight under the MIT license and free to self-host. Here’s how to run them.
## Which DeepSeek model to run

| Model | Total params | Active params | VRAM needed (Q4) | Best for |
|---|---|---|---|---|
| DeepSeek Coder V2 Lite | 16B | 2.4B | ~10-12GB | Coding on consumer hardware |
| DeepSeek V3 (Q4) | 671B | 37B | ~400GB | Full-quality general purpose |
| DeepSeek R1 (distilled 7B) | 7B | 7B | ~6GB | Reasoning on any hardware |
| DeepSeek R1 (distilled 32B) | 32B | 32B | ~20-24GB | Strong reasoning |
| DeepSeek R1 (full) | 671B | 37B | ~400GB | Best reasoning, needs serious hardware |
For most developers: start with DeepSeek Coder V2 Lite (fits on a consumer GPU) or the R1 distilled 7B (runs on a laptop).
## Run with Ollama

```sh
# DeepSeek Coder — runs on a 12GB GPU
ollama run deepseek-coder-v2:16b

# DeepSeek R1 distilled — runs on any laptop
ollama run deepseek-r1:7b

# DeepSeek R1 32B — needs a 24GB GPU
ollama run deepseek-r1:32b

# Full DeepSeek V3 — needs ~400GB of memory
ollama run deepseek-v3
```
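Ollama also serves a local REST API on port 11434, which is handy once you want these models in scripts rather than a terminal. A minimal standard-library sketch (the model tag and prompt are just examples):

```python
import json
from urllib.request import Request, urlopen

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(model: str, prompt: str) -> dict:
    # stream=False makes Ollama return one JSON object instead of NDJSON chunks
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    body = json.dumps(build_request(model, prompt)).encode()
    req = Request(OLLAMA_URL, data=body,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With a model pulled, `generate("deepseek-r1:7b", "Why is the sky blue?")` returns the completion as a single string.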
## Run with llama.cpp

```sh
# Download a community Q4_K_M quant of DeepSeek Coder V2 Lite
huggingface-cli download bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF \
  --include "*Q4_K_M*" \
  --local-dir ./models

# Start an OpenAI-compatible server
llama-server \
  --model ./models/DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf \
  --ctx-size 16384 \
  --threads 8 \
  --port 8080
```
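`llama-server` exposes an OpenAI-compatible API on the port you chose, so any OpenAI client works against it. A small standard-library sketch (the URL and token limit are assumptions matching the command above):

```python
import json
from urllib.request import Request, urlopen

API_URL = "http://localhost:8080/v1/chat/completions"  # matches --port above

def chat_payload(prompt: str, max_tokens: int = 256) -> dict:
    # With a single loaded model, only the messages and limits matter here
    return {"messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens}

def chat(prompt: str) -> str:
    req = Request(API_URL, data=json.dumps(chat_payload(prompt)).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```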
## Hardware reality check

The full DeepSeek V3 (671B) is not practical on consumer hardware. Even quantized to Q4, the weights alone are roughly 400GB. That means:
- Mac Studio M3 Ultra with 512GB unified memory — works
- Multi-GPU server (e.g. 8x A100/H100 80GB) — works
- Aggressive ~1.6-bit dynamic quants (~130GB) — fit on a 192GB Mac, at a real quality cost
- Single consumer GPU — doesn’t work
The distilled models are the practical choice:
- R1 7B: any laptop with 8GB RAM
- Coder V2 Lite: any GPU with 12GB VRAM
- R1 32B: RTX 4090 or 32GB Mac
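The VRAM figures above follow from simple arithmetic: a quantized model needs roughly params x bits-per-weight / 8 bytes for its weights, plus 10-20% headroom for KV cache and runtime buffers. A back-of-the-envelope helper (the ~4.85 bits/weight average for Q4_K_M is an approximation):

```python
def est_weight_gb(params_billions: float, bits_per_weight: float = 4.85) -> float:
    """Approximate size of quantized weights in GB.
    Q4_K_M averages ~4.85 bits per weight; KV cache and runtime
    buffers add another 10-20% on top of this."""
    return params_billions * bits_per_weight / 8

print(round(est_weight_gb(16)))   # Coder V2 Lite → ~10GB, as in the table
print(round(est_weight_gb(671)))  # full V3/R1 → ~407GB
```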
## DeepSeek R1: reasoning on a laptop

The R1 distilled models are the hidden gem. The 7B version runs on basically anything and gives you o1-style chain-of-thought reasoning for free:

```sh
ollama run deepseek-r1:7b
```
It’s not as strong as the full R1, but for math problems, logic puzzles, and step-by-step reasoning, it’s remarkably capable for a model that fits in 6GB of RAM.
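One practical detail when scripting against R1: the model wraps its chain-of-thought in `<think>...</think>` tags before the final answer, and you usually want to separate the two. A minimal sketch:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split R1 output into (chain_of_thought, final_answer)."""
    m = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if not m:
        return "", text.strip()
    return m.group(1).strip(), text[m.end():].strip()

raw = "<think>2+2 is 4 because...</think>The answer is 4."
thought, answer = split_reasoning(raw)
print(answer)  # → The answer is 4.
```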
## Connect to your IDE

```sh
# Start DeepSeek Coder locally
ollama run deepseek-coder-v2:16b

# In VS Code with Continue:
#   Provider: Ollama
#   Model: deepseek-coder-v2:16b
```
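In Continue itself, that maps to a model entry like the following (sketched for the JSON config format; newer Continue releases use a YAML config, so check your version’s docs — the `title` is arbitrary):

```json
{
  "models": [
    {
      "title": "DeepSeek Coder (local)",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b"
    }
  ]
}
```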
DeepSeek Coder V2 Lite is a strong free alternative to GitHub Copilot for code completion. It supports 338 programming languages and runs entirely on your machine.
## DeepSeek vs Qwen for self-hosting

| | DeepSeek | Qwen 3.5 |
|---|---|---|
| Best small model | R1 7B (reasoning) | 9B (general, beats 120B models) |
| Best coding model | Coder V2 Lite 16B | Qwen 2.5 Coder 32B |
| Full model VRAM (Q4) | ~400GB | ~214GB (397B) |
| License | MIT | Apache 2.0 |
| Multimodal | No | Yes |
For coding: DeepSeek Coder V2 Lite if you have 12GB VRAM, Qwen 2.5 Coder 32B if you have 24GB. For reasoning: the DeepSeek R1 distills stand apart; few other open models ship distilled chain-of-thought reasoning at this size. For general use: Qwen 3.5-9B is the better all-rounder.