πŸ’» Run AI Locally

104 articles on running AI models on your own hardware: Ollama, vLLM, and llama.cpp guides, hardware requirements, and self-hosted tutorials.

πŸ¦™ Ollama (36)

Ollama Docker Setup Guide β€” Run Local LLMs in Containers (2026)

Run Ollama in Docker with GPU passthrough. Perfect for teams, servers, and reproducible AI environments.

Run AI on a Raspberry Pi β€” Yes, It Actually Works (2026)

Run small LLMs on a Raspberry Pi 5 with 8GB RAM. Ollama setup, the best models that fit, and what you can expect.

Build a Local AI Chatbot for Your Docs (RAG With Ollama)

Step-by-step tutorial: build a chatbot that answers questions about your documentation using Ollama and retrieval-augmented generation (RAG).
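The core of the RAG approach in the tutorial above is retrieval: rank your doc chunks by similarity to the question, then feed the top match to the model as context. A minimal, self-contained sketch of that ranking step, using a toy bag-of-words "embedding" (a real pipeline would use a proper embedding model, e.g. via Ollama's embeddings endpoint; the documents below are made up for illustration):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; real RAG setups use a learned embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank docs by similarity to the query; the top-k become LLM context.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Ollama serves local models over an HTTP API on port 11434.",
    "Docker containers can access the GPU via the NVIDIA runtime.",
]
# The Ollama doc ranks first for this query.
print(retrieve("Which port does the Ollama API use?", docs))
```

The retrieved chunk would then be prepended to the prompt ("Answer using this context: ...") before calling the local model.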

πŸ“– How to Run Locally (23)

How to Run InclusionAI Ling Flash Locally β€” The 7.4B Active Coding Model (2026)

Run Ling Flash (104B/7.4B active) locally. Hardware requirements, HuggingFace download, and vLLM setup.

How to Run Poolside Laguna XS.2 Locally β€” Setup Guide (2026)

Run Laguna XS.2 (33B/3B active) locally. Hardware requirements, HuggingFace download, and vLLM setup.

How to Run Mistral Large 2 Locally β€” Setup Guide (2026)

Step-by-step guide to running Mistral Large 2 (123B) locally with vLLM, Ollama, and llama.cpp. Hardware requirements included.

πŸ–₯️ Hardware & VRAM (12)

Best AI Models Under 16GB VRAM β€” What You Can Actually Run (2026)

The best AI models that fit in 16GB of VRAM or less. Covers coding, general chat, and reasoning models.

How Much VRAM Do You Need for AI Models? (2026 Calculator)

Calculate exactly how much GPU VRAM you need for any AI model. Formula and examples for popular models.
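The calculator article above is built around a sizing formula; a minimal sketch of the common rule of thumb (weight bytes = parameters × bits per weight ÷ 8, plus a runtime overhead factor for the KV cache and activations — the 1.2 factor here is an assumption, not necessarily the article's exact number):

```python
def estimate_vram_gb(params_b: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB.

    params_b: model size in billions of parameters.
    bits_per_weight: 16 for fp16, 8 for int8, 4 for 4-bit quants.
    overhead: assumed fudge factor for KV cache and runtime buffers.
    """
    weight_gb = params_b * bits_per_weight / 8  # bytes per weight = bits / 8
    return weight_gb * overhead

# A 7B model at 4-bit quantization fits comfortably under 8 GB of VRAM,
# while the same model at fp16 needs roughly 16-17 GB.
print(round(estimate_vram_gb(7, 4), 1))
print(round(estimate_vram_gb(7, 16), 1))
```

This is why quantization dominates local-LLM practice: dropping from 16-bit to 4-bit cuts the weight footprint by 4x at a modest quality cost.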

When to Use CPU vs GPU for LLM Inference

GPU isn't always the answer. Here's when CPU inference makes sense: small models, low volume, edge devices.

⚑ Inference Engines (2)

🏠 Self-Hosted & Privacy (31)

GGUF vs GPTQ vs AWQ β€” LLM Quantization Formats Explained (2026)

You downloaded a model and see GGUF, GPTQ, AWQ, EXL2. What do they mean? Which one should you pick? A plain-English explainer.

AI Dev Weekly #9: Gemini 3.2 Flash Leaks Before I/O, GPT-5.5 Instant Becomes Default, and Enterprise Agents Go Self-Hosted

This week: Google's unreleased Gemini 3.2 Flash outperforms 3.1 Pro on coding at $0.25/M tokens, OpenAI makes GPT-5.5 Instant the default, and enterprise agents go self-hosted.

Best AI Autocomplete Models in 2026 β€” Tab Completion Ranked

The best models for inline code autocomplete: Codestral, Qwen Coder, DeepSeek Coder, and more. Benchmarks included.