Run AI Locally
104 articles. Run AI models on your own hardware. Ollama, vLLM, llama.cpp guides. Hardware requirements and self-hosted tutorials.
Ollama (36)
Run Ollama in Docker with GPU passthrough. Perfect for teams, servers, and reproducible AI environments.
Run AI on a Raspberry Pi – Yes, It Actually Works (2026)
Run small LLMs on a Raspberry Pi 5 with 8GB RAM. Ollama setup, best models that fit, and what you can…
Build a Local AI Chatbot for Your Docs (RAG With Ollama)
Step-by-step tutorial: build a chatbot that answers questions about your documentation using Ollama…
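The Docker-with-GPU-passthrough setup covered above boils down to two commands. A minimal sketch, assuming the NVIDIA Container Toolkit is already installed on the host (the model name is just an example):

```shell
# Start Ollama in a container with all GPUs passed through;
# the named volume persists downloaded models across restarts.
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull and chat with a model inside the running container.
docker exec -it ollama ollama run llama3.2
```

The API is then reachable on the host at port 11434, same as a native install.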
How to Run Locally (23)
Run Ling Flash (104B/7.4B active) locally. Hardware requirements, HuggingFace download, vLLM setup…
How to Run Poolside Laguna XS.2 Locally – Setup Guide (2026)
Run Laguna XS.2 (33B/3B active) locally. Hardware requirements, HuggingFace download, vLLM setup, and…
How to Run Mistral Large 2 Locally – Setup Guide (2026)
Step-by-step guide to running Mistral Large 2 (123B) locally with vLLM, Ollama, and llama.cpp. Hardware…
Hardware & VRAM (12)
The best AI models that fit in 16GB of VRAM or less. Covers coding, general chat, and reasoning models.
How Much VRAM Do You Need for AI Models? (2026 Calculator)
Calculate exactly how much GPU VRAM you need for any AI model. Formula, examples for popular models…
When to Use CPU vs GPU for LLM Inference
GPU isn't always the answer. Here's when CPU inference makes sense: small models, low volume, edge d…
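The VRAM calculator mentioned above rests on a simple rule of thumb: weights take parameters × bytes-per-parameter, plus headroom for the KV cache and activations. A minimal sketch; the 20% overhead factor is an assumption for illustration, not a figure from the linked article:

```python
def estimate_vram_gb(params_billion: float, bytes_per_param: float = 2.0,
                     overhead: float = 1.2) -> float:
    """Rough VRAM needed to load a model: weight size plus ~20% headroom
    for the KV cache and activations (the overhead factor is an assumption)."""
    return params_billion * bytes_per_param * overhead

# A 7B model in FP16 (2 bytes per parameter):
print(round(estimate_vram_gb(7), 1))                          # -> 16.8 (GB)
# The same model quantized to 4-bit (0.5 bytes per parameter):
print(round(estimate_vram_gb(7, bytes_per_param=0.5), 1))     # -> 4.2 (GB)
```

This is why a 7B model that won't fit on a 12GB card at FP16 runs comfortably once quantized.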
Inference Engines (2)
SGLang beats vLLM by 29% on shared-context workloads. How it works, when to use it, and whether you…
How to Serve LLMs with vLLM – Production Deployment Guide
Step-by-step guide to deploying LLMs with vLLM. OpenAI-compatible API, tensor parallelism, quantization…
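The vLLM deployment guide above centers on its OpenAI-compatible server. A minimal sketch, with the model name and default port as example values only:

```shell
# Serve a model behind an OpenAI-compatible API (listens on port 8000 by default).
vllm serve meta-llama/Llama-3.1-8B-Instruct --tensor-parallel-size 1

# Query it exactly like the OpenAI chat completions endpoint:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct",
       "messages": [{"role": "user", "content": "Hello"}]}'
```

Because the API shape matches OpenAI's, existing client SDKs work by just changing the base URL.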
Self-Hosted & Privacy (31)
You downloaded a model and see GGUF, GPTQ, AWQ, EXL2. What do they mean? Which one to pick? A plain-…
AI Dev Weekly #9: Gemini 3.2 Flash Leaks Before I/O, GPT-5.5 Instant Becomes Default, and Enterprise Agents Go Self-Hosted
This week: Google's unreleased Gemini 3.2 Flash outperforms 3.1 Pro on coding at $0.25/M tokens, OpenAI…
Best AI Autocomplete Models in 2026 – Tab Completion Ranked
The best models for inline code autocomplete: Codestral, Qwen Coder, DeepSeek Coder, and more. Benchmarks…
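As a companion to the quantization-format explainer above (GGUF, GPTQ, AWQ, EXL2): whatever the container, on-disk size is easy to ballpark from bits per weight. A sketch that deliberately ignores the small metadata overhead real files carry:

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate file size of a quantized model: parameters * bits / 8.
    Real GGUF/GPTQ/AWQ files add a little metadata overhead, ignored here."""
    return params_billion * bits_per_weight / 8

# A 7B model at common precision levels:
for bits in (16, 8, 4):
    print(f"{bits}-bit: {quantized_size_gb(7, bits)} GB")
# 16-bit: 14.0 GB, 8-bit: 7.0 GB, 4-bit: 3.5 GB
```

The same arithmetic explains why 4-bit quants are the default for consumer GPUs: they cut the footprint to a quarter of FP16.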