πŸ€– AI Tools
Β· 3 min read

AI Glossary for Developers β€” Every Term You Need to Know (2026)


AI terminology explained in plain English, for developers who build things.

Models & Architecture

LLM (Large Language Model) β€” An AI model trained on text that can generate text. Examples: Claude, GPT-5, Gemma 4.

MoE (Mixture of Experts) β€” Architecture where only a subset of parameters activate per token. GLM-5.1 has 754B total but only 40B active. Makes large models efficient.

Dense model β€” All parameters activate for every token. Mistral Large 2 (123B) is dense. Simpler but needs more compute per token.

Parameters β€” The learned values in a model. More parameters generally = more capable. Measured in billions (B).

Open-weight / Open-source β€” Model weights are publicly available. You can download and run them locally. Examples: Qwen 3.5, Kimi K2.5, DeepSeek.

Frontier model β€” The most capable models available. Currently: Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro.

Tokens & Context

Token β€” The unit models process. Roughly 1 token β‰ˆ 0.75 words or ~4 characters. β€œHello world” = 2 tokens.

Context window β€” Maximum tokens a model can process in one request. Ranges from 4K to 1M+ tokens. Larger = can read more code at once.

Input tokens β€” What you send to the model (your prompt, code, context).

Output tokens β€” What the model generates (the response). Usually more expensive than input.

Techniques

RAG (Retrieval-Augmented Generation) β€” Fetching relevant documents and including them in the prompt so the model can answer questions about your data.

Fine-tuning β€” Retraining a model on your specific data to change its behavior or knowledge. More expensive than RAG.

Embeddings β€” Converting text into number arrays that capture meaning. Used for semantic search and RAG.

Prompt engineering β€” Writing instructions that get the best output from AI models.

Prompt caching β€” Reusing the beginning of prompts across requests to reduce costs by up to 90%.

FIM (Fill-in-the-Middle) β€” Model predicts code between a prefix and suffix. How Codestral does autocomplete.

Chain-of-thought β€” Asking the model to reason step by step before answering. Improves accuracy on complex tasks.

Infrastructure

Vector database β€” Database that stores embeddings and finds similar ones. Examples: Pinecone, Qdrant, Weaviate, Chroma.

Inference β€” Running a model to generate output. β€œInference cost” = the cost of generating responses.

Quantization β€” Reducing model precision (e.g., 16-bit β†’ 4-bit) to use less memory. Trades small quality loss for 4x less VRAM. See our local AI guides.

VRAM β€” GPU memory. Determines what models you can run locally. See how much VRAM you need.

Ollama β€” Tool for running AI models locally with one command.

Tools & Workflows

AI coding agent β€” Tool that autonomously writes, edits, and debugs code. Examples: Claude Code, Aider, Kimi CLI.

Agentic coding β€” AI that plans, executes, tests, and iterates autonomously. Beyond simple code generation.

Agent Swarm β€” Multiple AI agents working in parallel on the same task. Kimi K2.5’s signature feature.

Model routing β€” Sending simple tasks to cheap models and complex tasks to expensive ones. Saves 40-60% on API costs.

OpenRouter β€” Unified API gateway for 300+ AI models through one endpoint.

Compliance

GDPR β€” EU data protection regulation. Affects how you use AI APIs with personal data.

EU AI Act β€” EU regulation classifying AI systems by risk level. Full enforcement August 2026.

DPA (Data Processing Agreement) β€” Contract required under GDPR when a third party processes personal data on your behalf.

Related: Best AI Coding Tools 2026 Β· How to Choose an AI Coding Agent Β· Best Free AI APIs 2026