AI terminology explained in plain English, for developers who build things.
Models & Architecture
LLM (Large Language Model) — An AI model trained on huge amounts of text to predict the next token, which lets it generate fluent text and code. Examples: Claude, GPT-5, Gemma 4.
MoE (Mixture of Experts) — Architecture where only a subset of parameters activate per token. GLM-5.1 has 754B total but only 40B active. Makes large models efficient.
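The routing idea can be sketched with toy numbers — a scalar "activation" and hand-picked gate scores stand in for the real vectors and learned router:

```python
def moe_forward(x, experts, gate_scores, k=2):
    """Toy MoE layer: pick the top-k scoring experts for this token and
    run only them; the remaining experts' parameters stay idle.
    x is a scalar here; real models use vectors and a learned gate."""
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    total = sum(gate_scores[i] for i in top)
    # Weighted combination of only the selected experts' outputs
    return sum(gate_scores[i] / total * experts[i](x) for i in top)

experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 1]
out = moe_forward(3.0, experts, gate_scores=[0.1, 0.6, 0.3])  # runs experts 1 and 2 only
```

Only two of the three experts execute, which is why a model can be huge in total parameters but cheap per token.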
Dense model — All parameters activate for every token. Mistral Large 2 (123B) is dense. Simpler but needs more compute per token.
Parameters — The learned weights inside a model, measured in billions (B). More parameters generally means more capability, but also more memory and compute to run.
Open-weight / Open-source — Model weights are publicly available, so you can download and run them locally. (Strictly, "open-weight" means only the weights are released; "open-source" implies the training code and data are open too — the terms are often used loosely.) Examples: Qwen 3.5, Kimi K2.5, DeepSeek.
Frontier model — The most capable models available. Currently: Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro.
Tokens & Context
Token — The unit models process. Roughly 1 token ≈ 0.75 words or ~4 characters. “Hello world” = 2 tokens.
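That rule of thumb is easy to code. This is a character-count heuristic only — real tokenizers use byte-pair encoding and vary by model, so use your provider's tokenizer for exact counts:

```python
def estimate_tokens(text: str) -> int:
    # ~4 characters per token heuristic; an approximation, not a real
    # BPE tokenizer (which counts "Hello world" as 2 tokens, not 3)
    return max(1, round(len(text) / 4))

estimate_tokens("Hello world")  # 11 chars -> 3 by this heuristic
```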
Context window — Maximum tokens a model can process in one request. Ranges from 4K to 1M+ tokens. Larger = can read more code at once.
Input tokens — What you send to the model (your prompt, code, context).
Output tokens — What the model generates (the response). Usually more expensive than input.
Techniques
RAG (Retrieval-Augmented Generation) — Fetching relevant documents and including them in the prompt so the model can answer questions about your data.
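A minimal sketch of the pattern, with naive word-overlap scoring standing in for the embedding search a real system would use:

```python
def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank docs by words shared with the question (a toy stand-in for
    embedding similarity) and keep the top k."""
    q_words = set(question.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_rag_prompt(question: str, docs: list[str]) -> str:
    # Paste the retrieved documents into the prompt as grounding context
    context = "\n---\n".join(retrieve(question, docs))
    return (f"Using only the context below, answer the question.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")
```

The model never sees your whole corpus — only the few documents most relevant to this question.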
Fine-tuning — Retraining a model on your specific data to change its behavior or knowledge. More expensive than RAG.
Embeddings — Converting text into number arrays that capture meaning. Used for semantic search and RAG.
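A toy example of why embeddings enable semantic search — 3-dimensional hand-made vectors here; real embedding models output hundreds to thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Standard closeness measure for embeddings: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

cat = [0.9, 0.2, 0.0]
dog = [0.8, 0.3, 0.1]
car = [0.1, 0.0, 0.9]
# Related meanings land close together in the vector space
assert cosine_similarity(cat, dog) > cosine_similarity(cat, car)
```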
Prompt engineering — Writing instructions that get the best output from AI models.
Prompt caching — Reusing an identical prompt prefix across requests so the provider can skip reprocessing it. Can cut input costs by up to 90%.
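Caching matches on exact byte-for-byte prefixes, so the main technique is ordering: keep the stable parts first and identical across calls. A generic sketch — the API mechanics differ by provider:

```python
def build_cacheable_prompt(system: str, reference_docs: str, question: str) -> str:
    # Stable prefix: byte-identical on every request, so it can be cached
    stable = f"{system}\n\n{reference_docs}"
    # Varying tail: only this part misses the cache
    return f"{stable}\n\nQuestion: {question}"

p1 = build_cacheable_prompt("You are a code reviewer.", "<docs>", "Review foo()")
p2 = build_cacheable_prompt("You are a code reviewer.", "<docs>", "Review bar()")
# Both requests share the same cacheable prefix
```

Putting the question first and the docs last would break the shared prefix and defeat the cache.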
FIM (Fill-in-the-Middle) — Model predicts code between a prefix and suffix. How Codestral does autocomplete.
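The prompt layout looks roughly like this. Sentinel token names vary by model — the StarCoder-style ones below are for illustration only, not any particular model's vocabulary:

```python
def fim_prompt(prefix: str, suffix: str) -> str:
    # The model sees code before and after the cursor, then generates the middle
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

prompt = fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(2, 3))",
)
# The model's completion fills the gap, e.g. "a + b"
```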
Chain-of-thought — Asking the model to reason step by step before answering. Improves accuracy on complex tasks.
Infrastructure
Vector database — Database that stores embeddings and finds similar ones. Examples: Pinecone, Qdrant, Weaviate, Chroma.
Inference — Running a model to generate output. “Inference cost” = the cost of generating responses.
Quantization — Reducing model precision (e.g., 16-bit → 4-bit) to use less memory. Trades small quality loss for 4x less VRAM. See our local AI guides.
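Back-of-the-envelope VRAM math for the weights — the 20% overhead factor for activations and KV cache is a rough assumption:

```python
def vram_gb(params_billion: float, bits: int, overhead: float = 1.2) -> float:
    """Weights need params * (bits / 8) bytes; the overhead factor is a
    rough allowance for activations and KV cache."""
    return params_billion * (bits / 8) * overhead

# A 70B model: 16-bit needs ~168 GB, 4-bit ~42 GB -- the 4x saving above
full = vram_gb(70, 16)       # ~168 GB: multiple datacenter GPUs
quantized = vram_gb(70, 4)   # ~42 GB: within reach of workstation GPUs
```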
VRAM — GPU memory. Determines what models you can run locally. See how much VRAM you need.
Ollama — Tool for running AI models locally with one command.
Tools & Workflows
AI coding agent — Tool that autonomously writes, edits, and debugs code. Examples: Claude Code, Aider, Kimi CLI.
Agentic coding — AI that plans, executes, tests, and iterates autonomously. Beyond simple code generation.
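The loop described above, sketched with placeholder llm() and run_tests() callables — both hypothetical stand-ins for a real model call and a real test runner:

```python
def agent_loop(task, llm, run_tests, max_iters=5):
    """Plan/act/test/iterate: generate code, run the tests, feed
    failures back to the model, repeat until green or out of budget."""
    code = llm(f"Write code for: {task}")
    for _ in range(max_iters):
        passed, feedback = run_tests(code)
        if passed:
            return code
        # Feed the failure output back so the model can self-correct
        code = llm(f"The tests failed:\n{feedback}\n\nFix this code:\n{code}")
    return code  # best effort after max_iters
```

The test feedback loop is what separates agentic coding from one-shot generation: the model gets to see its own failures.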
Agent Swarm — Multiple AI agents working in parallel on the same task. Kimi K2.5’s signature feature.
Model routing — Sending simple tasks to cheap models and complex tasks to expensive ones. Saves 40-60% on API costs.
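A toy router — a real one might use a classifier or a cheap model's own confidence score, and the model names here are placeholders:

```python
def route(prompt: str) -> str:
    """Send long or complexity-flagged prompts to the expensive model,
    everything else to the cheap one. Heuristics and names are illustrative."""
    complex_markers = ("refactor", "debug", "architecture", "prove")
    if len(prompt) > 2000 or any(m in prompt.lower() for m in complex_markers):
        return "frontier-model"   # expensive, capable
    return "small-model"          # cheap, fast

route("Rename this variable")           # -> "small-model"
route("Debug this race condition")      # -> "frontier-model"
```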
OpenRouter — Unified API gateway for 300+ AI models through one endpoint.
Compliance
GDPR — EU data protection regulation. Affects how you use AI APIs with personal data.
EU AI Act — EU regulation classifying AI systems by risk level. Full enforcement August 2026.
DPA (Data Processing Agreement) — Contract required under GDPR when a third party processes personal data on your behalf.
Related: Best AI Coding Tools 2026 · How to Choose an AI Coding Agent · Best Free AI APIs 2026