AI terminology explained in plain English, for developers who build things.
Models & Architecture
LLM (Large Language Model): An AI model trained on large amounts of text to predict and generate text. Examples: Claude, GPT-5, Gemma 4.
MoE (Mixture of Experts): Architecture where only a subset of parameters activates per token. GLM-5.1 has 754B total parameters but only 40B active per token, so per-token compute is closer to that of a much smaller model.
Dense model: All parameters activate for every token. Mistral Large 2 (123B) is dense. Simpler, but needs more compute per token than an MoE model of the same total size.
Parameters: The learned values (weights) in a model, counted in billions (B). More parameters generally means a more capable model.
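Open-weight / Open-source: Model weights are publicly available, so you can download and run them locally. Strictly, open-weight covers only the weights, while open-source also implies released code and an open license. Examples: Qwen 3.5, Kimi K2.5, DeepSeek.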
Frontier model: The most capable models available. Currently: Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro.
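To make the MoE and dense entries concrete, here is a minimal sketch of top-k expert routing, plus the active-parameter arithmetic for the figures quoted above. The function names, the gate scores, and the choice of k = 2 are illustrative assumptions, not how any particular model implements routing.

```python
import math

def route_token(gate_logits: list[float], k: int = 2) -> dict[int, float]:
    """Pick the k highest-scoring experts for one token.

    Returns {expert_index: weight}, with weights softmax-normalized
    over the selected experts only. All other experts stay idle.
    """
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = {i: math.exp(gate_logits[i]) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of parameters used per token (both arguments in billions)."""
    return active_params_b / total_params_b

# Hypothetical gate scores for a layer with 4 experts.
weights = route_token([0.1, 2.0, -1.0, 2.0], k=2)
print(weights)                           # experts 1 and 3 share the token

# Figures from the glossary entries above.
print(f"{active_fraction(754, 40):.1%}")   # MoE: 40B of 754B active
print(f"{active_fraction(123, 123):.1%}")  # dense: everything active
```

Note that the whole MoE model still has to be loaded into memory; only the per-token compute shrinks.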
Tokens & Context
Token: The unit models process. 1 token ≈ 0.75 English words or about 4 characters. “Hello world” = 2 tokens.
Context window: Maximum number of tokens a model can process in one request. Ranges from 4K to 1M+ tokens. A larger window lets the model read more code at once.
Input tokens: What you send to the model (your prompt, code, context).
Output tokens: What the model generates (the response). Usually priced higher than input tokens.
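The rules of thumb above can be turned into a quick estimate. This is a rough sketch: the 4-characters-per-token heuristic is only an approximation (a real tokenizer will give different counts), and the prices used below are made-up placeholders, not any provider's actual rates.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb from the Token entry; varies by tokenizer

def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def fits_in_context(input_tokens: int, context_window: int) -> bool:
    """True if the input alone fits within the model's context window."""
    return input_tokens <= context_window

def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request, with prices given in USD per million tokens."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

source_file = "x" * 40_000               # stand-in for a 40,000-character file
tokens = estimate_tokens(source_file)    # about 10,000 tokens
print(tokens, fits_in_context(tokens, 128_000))

# Hypothetical prices: $3 per million input tokens, $15 per million output.
print(request_cost(10_000, 2_000, 3.0, 15.0))
```

With these placeholder prices, 2,000 output tokens cost as much as 10,000 input tokens, which is why long responses tend to dominate the bill.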
Techniques
RAG (Retrieval-Augmented Generation): Fetching relevant documents and including them in the prompt so the model can answer questions about your data.
Fine-tuning: Further training a model on your own data to change its behavior or knowledge. More expensive than RAG.
Embeddings: Arrays of numbers that represent text so that similar meanings end up close together. Used for semantic search and RAG.
Prompt engineering: Writing instructions that get the best output from AI models.
Prompt caching: Reusing the beginning of prompts across requests to reduce costs by up to 90%.
FIM (Fill-in-the-Middle): The model predicts the code that belongs between a given prefix and suffix. How Codestral does autocomplete.
Chain-of-thought: Asking the model to reason step by step before answering. Improves accuracy on complex tasks.
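The Embeddings and RAG entries fit together as: embed the documents, find the one closest to the question, and paste it into the prompt. The sketch below shows that flow under a loud assumption: `embed` is a toy word-count vector standing in for a real embedding model, and the documents, function names, and prompt wording are all invented for illustration.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: word counts, not learned vectors."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str], k: int = 1) -> str:
    """Put the retrieved text in front of the question (the 'augmented' part)."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "API keys can be rotated from the dashboard.",
]
print(build_prompt("How do I rotate API keys?", docs))
```

In a real system the word counts would be replaced by vectors from an embedding model, and the linear scan over `docs` by a vector database lookup.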
Infrastructure
Vector database: Database that stores embeddings and finds similar ones. Examples: Pinecone, Qdrant, Weaviate, Chroma.
Inference: Running a model to generate output. “Inference cost” = the cost of generating responses.
Quantization: Reducing model precision (e.g., 16-bit → 4-bit) to use less memory. Trades a small quality loss for roughly 4x less VRAM for the weights. See our local AI guides.
VRAM: GPU memory. Determines what models you can run locally. See how much VRAM you need.
Ollama: Tool for running AI models locally with one command.
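The 4x figure in the Quantization entry follows directly from the bit widths. Here is a back-of-the-envelope sketch of the arithmetic; the 7B model size is just an illustrative choice, and the estimate covers weights only, so real VRAM use will be higher once the KV cache and runtime overhead are included.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9

# Illustrative 7B-parameter model at two precisions.
fp16 = weight_memory_gb(7, 16)
int4 = weight_memory_gb(7, 4)

print(f"16-bit: {fp16:.1f} GB")
print(f" 4-bit: {int4:.1f} GB")
print(f"ratio:  {fp16 / int4:.0f}x")
```

The same function gives a quick first check of whether a model's weights can fit on a given GPU before downloading anything.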
Tools & Workflows
AI coding agent: Tool that autonomously writes, edits, and debugs code. Examples: Claude Code, Aider, Kimi CLI.
Agentic coding: AI that plans, executes, tests, and iterates autonomously. Goes beyond simple code generation.
Agent Swarm: Multiple AI agents working in parallel on the same task. Kimi K2.5’s signature feature.
Model routing: Sending simple tasks to cheap models and complex tasks to expensive ones. Can save 40-60% on API costs.
OpenRouter: Unified API gateway for 300+ AI models through one endpoint.
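Model routing can be as simple as a rule that looks at the prompt before any API call is made. The sketch below is one possible heuristic, not a recommended policy: the model names, prices, keyword list, and length threshold are all hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    model: str                 # hypothetical model identifier
    price_per_m_input: float   # made-up USD price per million input tokens

CHEAP = Route("small-model", 0.25)
EXPENSIVE = Route("large-model", 5.00)

# Words that suggest the task needs the stronger model (illustrative only).
HARD_HINTS = ("refactor", "debug", "architecture", "prove")

def pick_route(prompt: str, max_cheap_chars: int = 2000) -> Route:
    """Send long or hard-looking prompts to the expensive model."""
    text = prompt.lower()
    if len(prompt) > max_cheap_chars or any(h in text for h in HARD_HINTS):
        return EXPENSIVE
    return CHEAP

print(pick_route("Rename this variable to snake_case").model)
print(pick_route("Debug this race condition in the scheduler").model)
```

Keyword rules like this are easy to audit but crude; production routers often use a small classifier model to make the same decision.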
Compliance
GDPR: EU data protection regulation. Affects how you use AI APIs with personal data.
EU AI Act: EU regulation classifying AI systems by risk level. Most obligations apply from August 2026.
DPA (Data Processing Agreement): Contract required under GDPR when a third party processes personal data on your behalf.
Related: Best AI Coding Tools 2026 · How to Choose an AI Coding Agent · Best Free AI APIs 2026