Every AI search system — from Perplexity to GitHub Copilot’s codebase search — runs on embeddings. They’re the reason AI can find “how to fix a memory leak” when you search for “my app keeps crashing.”
Here’s how they work, explained for developers who build things.
What is an embedding?
An embedding is a fixed-length array of numbers that represents the meaning of a piece of text. Two texts with similar meanings produce vectors that are numerically close.
```python
from openai import OpenAI

client = OpenAI()

# These two sentences mean similar things
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=["The server is running out of memory", "RAM usage is too high"]
)

vec1 = response.data[0].embedding  # [0.023, -0.041, 0.089, ...]
vec2 = response.data[1].embedding  # [0.021, -0.038, 0.092, ...]

# These vectors are CLOSE together (high cosine similarity)
```
The model converts text into a point in high-dimensional space (1536 dimensions for OpenAI’s small model). Similar meanings = nearby points.
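"Close" here means high cosine similarity: the cosine of the angle between the two vectors, ranging from -1 (opposite) to 1 (same direction). A minimal sketch with made-up 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|): 1.0 means identical direction, -1.0 opposite
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" for illustration only
vec1 = [0.023, -0.041, 0.089]   # "server out of memory"
vec2 = [0.021, -0.038, 0.092]   # "RAM usage too high"
vec3 = [-0.090, 0.075, -0.012]  # something unrelated

print(cosine_similarity(vec1, vec2))  # close to 1.0: similar meaning
print(cosine_similarity(vec1, vec3))  # much lower
```

Most vector databases use cosine similarity (or the closely related dot product) as their default distance metric.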
Why this matters for search
Traditional search matches keywords. Embedding search matches meaning.
| Query | Keyword search finds | Embedding search finds |
|---|---|---|
| "fix memory leak" | Pages containing "memory leak" | Pages about RAM issues, garbage collection, heap dumps |
| "deploy to production" | Pages with "deploy" and "production" | Pages about CI/CD, release management, rollbacks |
| "make it faster" | Nothing useful | Pages about performance optimization, caching, indexing |
This is why AI coding tools can understand your intent even when you describe problems vaguely.
How vector search works
- Index time: Convert all your documents into embeddings, store in a vector database
- Query time: Convert the search query into an embedding
- Search: Find the stored embeddings closest to the query embedding
- Return: The closest documents are the most relevant results
```python
import chromadb

# 1. Create a collection
client = chromadb.Client()
collection = client.create_collection("docs")

# 2. Add documents (embeddings generated automatically)
collection.add(
    documents=["Python memory management uses garbage collection",
               "JavaScript uses V8's mark-and-sweep GC",
               "Rust uses ownership instead of GC"],
    ids=["doc1", "doc2", "doc3"]
)

# 3. Search by meaning
results = collection.query(
    query_texts=["how does memory cleanup work"],
    n_results=2
)

# Returns doc1 and doc2: they're about memory management,
# even though the query doesn't contain "garbage collection"
```
Choosing an embedding model
| Model | Dimensions | Cost | Quality | Best for |
|---|---|---|---|---|
| OpenAI text-embedding-3-small | 1536 | $0.02/1M tokens | Good | General use |
| OpenAI text-embedding-3-large | 3072 | $0.13/1M tokens | Better | High accuracy |
| Cohere embed-v4 | 1024 | $0.10/1M tokens | Very good | Multilingual |
| Codestral Embed | 1024 | $0.08/1M tokens | Best for code | Code search |
| Nomic embed | 768 | Free (local) | Good | Privacy/cost |
For code search, Codestral Embed from Mistral is purpose-built. For general text, OpenAI’s small model is the best value.
The distance problem
Not all “close” embeddings are relevant. Common failure modes:
False positives: “Java coffee” and “Java programming” are close because “Java” dominates the embedding. Fix: use longer, more specific text chunks.
Stale embeddings: Your docs changed but embeddings weren’t regenerated. Fix: re-embed on content updates.
Wrong granularity: Embedding entire pages loses detail. Embedding single sentences loses context. Fix: chunk at paragraph level (200-500 tokens).
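One way to chunk at paragraph level is to split on blank lines and merge consecutive paragraphs up to a word budget (a rough proxy for tokens, roughly 1.3 tokens per English word). The function name and the 350-word budget below are illustrative choices, not from any particular library:

```python
def chunk_paragraphs(text, max_words=350):
    # Split on blank lines, then merge paragraphs until a chunk
    # reaches max_words (~350 words lands in the 200-500 token range)
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, count = [], [], 0
    for p in paragraphs:
        words = len(p.split())
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(p)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

This keeps paragraph boundaries intact, so each chunk stays a coherent unit of meaning instead of cutting mid-sentence.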
See our RAG failures guide for more on fixing retrieval quality.
Hybrid search: the production answer
Pure vector search misses exact matches. Pure keyword search misses meaning. Production systems use both:
```python
# Hybrid search: combine BM25 (keyword) + vector similarity
# (illustrative pseudocode; the actual API depends on your vector database)
results = search(
    query="ECONNRESET error in Node.js",
    vector_weight=0.7,   # semantic meaning
    keyword_weight=0.3   # exact term matching
)
# Finds both: pages mentioning "ECONNRESET" AND pages about connection reset errors
```
Weaviate and Pinecone support hybrid search natively. For simpler setups, run keyword search and vector search separately, then merge results.
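If you merge the two result sets yourself, reciprocal rank fusion (RRF) is a common technique: it combines ranked lists using only each document's rank, so you never have to normalize BM25 scores against cosine similarities. The document IDs below are made up for illustration:

```python
def reciprocal_rank_fusion(keyword_ids, vector_ids, k=60):
    # RRF score: sum over result lists of 1 / (k + rank).
    # k=60 is the constant from the original RRF paper; it damps
    # the advantage of the very top ranks.
    scores = {}
    for ranked in (keyword_ids, vector_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_econnreset", "doc_node_http", "doc_tcp"]
vector_hits = ["doc_conn_reset_errors", "doc_econnreset", "doc_retries"]
print(reciprocal_rank_fusion(keyword_hits, vector_hits))
# "doc_econnreset" ranks first because it appears in both lists
```

Documents found by both searches float to the top, which is exactly the behavior you want from hybrid retrieval.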
When to use embeddings
Use embeddings for: Semantic search, recommendation systems, duplicate detection, clustering, RAG systems.
Don’t use embeddings for: Exact lookups (use a database), real-time analytics (use SQL), simple keyword matching (use full-text search).
Related: Vector Databases Compared · How to Build an AI Search Engine · RAG vs Fine-Tuning · Best Free AI APIs 2026