πŸ’» Run AI Locally

104 articles on running AI models on your own hardware: Ollama, vLLM, and llama.cpp guides, hardware requirements, and self-hosted tutorials.

πŸ¦™ Ollama (36)

Ollama Docker Setup Guide β€” Run Local LLMs in Containers (2026)

Run Ollama in Docker with GPU passthrough. Perfect for teams, servers, and reproducible AI environments.

Run AI on a Raspberry Pi β€” Yes, It Actually Works (2026)

Run small LLMs on a Raspberry Pi 5 with 8GB RAM. Ollama setup, the best models that fit, and what you can expect.

Build a Local AI Chatbot for Your Docs (RAG With Ollama)

Step-by-step tutorial: build a chatbot that answers questions about your documentation using Ollama and retrieval-augmented generation (RAG).
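The core of the RAG approach in the tutorial above is retrieval: rank your doc chunks by similarity to the question, then feed the top match to the model as context. A minimal, self-contained sketch of that ranking step, using a toy bag-of-words "embedding" (a real pipeline would use a proper embedding model, e.g. via Ollama's embeddings endpoint; the documents below are made up for illustration):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; real RAG setups use a learned embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank docs by similarity to the query; the top-k become LLM context.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Ollama serves local models over an HTTP API on port 11434.",
    "Docker containers can access the GPU via the NVIDIA runtime.",
]
# The Ollama doc ranks first for this query.
print(retrieve("Which port does the Ollama API use?", docs))
```

The retrieved chunk would then be prepended to the prompt ("Answer using this context: ...") before calling the local model.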

πŸ“– How to Run Locally (23)

How to Run InclusionAI Ling Flash Locally β€” The 7.4B Active Coding Model (2026)

Run Ling Flash (104B/7.4B active) locally. Hardware requirements, HuggingFace download, and vLLM setup.

How to Run Poolside Laguna XS.2 Locally β€” Setup Guide (2026)

Run Laguna XS.2 (33B/3B active) locally. Hardware requirements, HuggingFace download, and vLLM setup.

How to Run Mistral Large 2 Locally β€” Setup Guide (2026)

Step-by-step guide to running Mistral Large 2 (123B) locally with vLLM, Ollama, and llama.cpp. Hardware requirements included.

πŸ–₯️ Hardware & VRAM (12)

Best AI Models Under 16GB VRAM β€” What You Can Actually Run (2026)

The best AI models that fit in 16GB of VRAM or less. Covers coding, general chat, and reasoning models.

How Much VRAM Do You Need for AI Models? (2026 Calculator)

Calculate exactly how much GPU VRAM you need for any AI model. Formula and examples for popular models.
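The calculator article above is built around a sizing formula; a minimal sketch of the common rule of thumb (weight bytes = parameters × bits per weight ÷ 8, plus a runtime overhead factor for the KV cache and activations — the 1.2 factor here is an assumption, not necessarily the article's exact number):

```python
def estimate_vram_gb(params_b: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB.

    params_b: model size in billions of parameters.
    bits_per_weight: 16 for fp16, 8 for int8, 4 for 4-bit quants.
    overhead: assumed fudge factor for KV cache and runtime buffers.
    """
    weight_gb = params_b * bits_per_weight / 8  # bytes per weight = bits / 8
    return weight_gb * overhead

# A 7B model at 4-bit quantization fits comfortably under 8 GB of VRAM,
# while the same model at fp16 needs roughly 16-17 GB.
print(round(estimate_vram_gb(7, 4), 1))
print(round(estimate_vram_gb(7, 16), 1))
```

This is why quantization dominates local-LLM practice: dropping from 16-bit to 4-bit cuts the weight footprint by 4x at a modest quality cost.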

When to Use CPU vs GPU for LLM Inference

GPU isn't always the answer. Here's when CPU inference makes sense: small models, low volume, edge devices.

⚑ Inference Engines (2)

🏠 Self-Hosted & Privacy (31)

GGUF vs GPTQ vs AWQ β€” LLM Quantization Formats Explained (2026)

You downloaded a model and see GGUF, GPTQ, AWQ, EXL2. What do they mean? Which one should you pick? A plain-English explainer.

AI Dev Weekly #9: Gemini 3.2 Flash Leaks Before I/O, GPT-5.5 Instant Becomes Default, and Enterprise Agents Go Self-Hosted

This week: Google's unreleased Gemini 3.2 Flash outperforms 3.1 Pro on coding at $0.25/M tokens, OpenAI makes GPT-5.5 Instant the default, and enterprise agents go self-hosted.

Best AI Autocomplete Models in 2026 β€” Tab Completion Ranked

The best models for inline code autocomplete: Codestral, Qwen Coder, DeepSeek Coder, and more. Benchmarks included.