Every AI search system — from Perplexity to GitHub Copilot’s codebase search — runs on embeddings. They’re the reason AI can find “how to fix a memory leak” when you search for “my app keeps crashing.”
Here’s how they work, explained for developers who build things.
What is an embedding?
An embedding is a fixed-length array of numbers that represents the meaning of a piece of text. Two texts with similar meanings produce vectors that are numerically close.
```python
from openai import OpenAI

client = OpenAI()

# These two sentences mean similar things
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=["The server is running out of memory", "RAM usage is too high"]
)

vec1 = response.data[0].embedding  # [0.023, -0.041, 0.089, ...]
vec2 = response.data[1].embedding  # [0.021, -0.038, 0.092, ...]

# These vectors are CLOSE together (high cosine similarity)
```
The model converts text into a point in high-dimensional space (1536 dimensions for OpenAI’s small model). Similar meanings = nearby points.
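"Close" here means high cosine similarity: the cosine of the angle between the two vectors, ranging from -1 (opposite) to 1 (same direction). A minimal sketch with made-up 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|): 1.0 means identical direction, -1.0 opposite
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" for illustration only
vec1 = [0.023, -0.041, 0.089]   # "server out of memory"
vec2 = [0.021, -0.038, 0.092]   # "RAM usage too high"
vec3 = [-0.090, 0.075, -0.012]  # something unrelated

print(cosine_similarity(vec1, vec2))  # close to 1.0: similar meaning
print(cosine_similarity(vec1, vec3))  # much lower
```

Most vector databases use cosine similarity (or the closely related dot product) as their default distance metric.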
Why this matters for search
Traditional search matches keywords. Embedding search matches meaning.
| Query | Keyword search finds | Embedding search finds |
|---|---|---|
| "fix memory leak" | Pages containing "memory leak" | Pages about RAM issues, garbage collection, heap dumps |
| "deploy to production" | Pages with "deploy" and "production" | Pages about CI/CD, release management, rollbacks |
| "make it faster" | Nothing useful | Pages about performance optimization, caching, indexing |
This is why AI coding tools can understand your intent even when you describe problems vaguely.
How vector search works
- Index time: Convert all your documents into embeddings, store in a vector database
- Query time: Convert the search query into an embedding
- Search: Find the stored embeddings closest to the query embedding
- Return: The closest documents are the most relevant results
```python
import chromadb

# 1. Create a collection
client = chromadb.Client()
collection = client.create_collection("docs")

# 2. Add documents (embeddings generated automatically)
collection.add(
    documents=["Python memory management uses garbage collection",
               "JavaScript uses V8's mark-and-sweep GC",
               "Rust uses ownership instead of GC"],
    ids=["doc1", "doc2", "doc3"]
)

# 3. Search by meaning
results = collection.query(
    query_texts=["how does memory cleanup work"],
    n_results=2
)

# Returns doc1 and doc2: they're about memory management,
# even though the query doesn't contain "garbage collection"
```
Choosing an embedding model
| Model | Dimensions | Cost | Quality | Best for |
|---|---|---|---|---|
| OpenAI text-embedding-3-small | 1536 | $0.02/1M tokens | Good | General use |
| OpenAI text-embedding-3-large | 3072 | $0.13/1M tokens | Better | High accuracy |
| Cohere embed-v4 | 1024 | $0.10/1M tokens | Very good | Multilingual |
| Codestral Embed | 1024 | $0.08/1M tokens | Best for code | Code search |
| Nomic embed | 768 | Free (local) | Good | Privacy/cost |
For code search, Codestral Embed from Mistral is purpose-built. For general text, OpenAI’s small model is the best value.
The distance problem
Not all “close” embeddings are relevant. Common failure modes:
False positives: “Java coffee” and “Java programming” are close because “Java” dominates the embedding. Fix: use longer, more specific text chunks.
Stale embeddings: Your docs changed but embeddings weren’t regenerated. Fix: re-embed on content updates.
Wrong granularity: Embedding entire pages loses detail. Embedding single sentences loses context. Fix: chunk at paragraph level (200-500 tokens).
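One way to chunk at paragraph level is to split on blank lines and merge consecutive paragraphs up to a word budget (a rough proxy for tokens, roughly 1.3 tokens per English word). The function name and the 350-word budget below are illustrative choices, not from any particular library:

```python
def chunk_paragraphs(text, max_words=350):
    # Split on blank lines, then merge paragraphs until a chunk
    # reaches max_words (~350 words lands in the 200-500 token range)
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, count = [], [], 0
    for p in paragraphs:
        words = len(p.split())
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(p)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

This keeps paragraph boundaries intact, so each chunk stays a coherent unit of meaning instead of cutting mid-sentence.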
See our RAG failures guide for more on fixing retrieval quality.
Hybrid search: the production answer
Pure vector search misses exact matches. Pure keyword search misses meaning. Production systems use both:
```python
# Hybrid search: combine BM25 (keyword) + vector similarity
# (illustrative pseudocode; the actual API depends on your vector database)
results = search(
    query="ECONNRESET error in Node.js",
    vector_weight=0.7,   # semantic meaning
    keyword_weight=0.3   # exact term matching
)
# Finds both: pages mentioning "ECONNRESET" AND pages about connection reset errors
```
Weaviate and Pinecone support hybrid search natively. For simpler setups, run keyword search and vector search separately, then merge results.
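If you merge the two result sets yourself, reciprocal rank fusion (RRF) is a common technique: it combines ranked lists using only each document's rank, so you never have to normalize BM25 scores against cosine similarities. The document IDs below are made up for illustration:

```python
def reciprocal_rank_fusion(keyword_ids, vector_ids, k=60):
    # RRF score: sum over result lists of 1 / (k + rank).
    # k=60 is the constant from the original RRF paper; it damps
    # the advantage of the very top ranks.
    scores = {}
    for ranked in (keyword_ids, vector_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_econnreset", "doc_node_http", "doc_tcp"]
vector_hits = ["doc_conn_reset_errors", "doc_econnreset", "doc_retries"]
print(reciprocal_rank_fusion(keyword_hits, vector_hits))
# "doc_econnreset" ranks first because it appears in both lists
```

Documents found by both searches float to the top, which is exactly the behavior you want from hybrid retrieval.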
When to use embeddings
Use embeddings for: Semantic search, recommendation systems, duplicate detection, clustering, RAG systems.
Don’t use embeddings for: Exact lookups (use a database), real-time analytics (use SQL), simple keyword matching (use full-text search).
Related: Vector Databases Compared · How to Build an AI Search Engine · RAG vs Fine-Tuning · Best Free AI APIs 2026