An embedding is a list of numbers (a vector) that represents the meaning of a piece of text. Instead of treating words as arbitrary strings, an embedding model converts them into coordinates in a high-dimensional space where similar meanings end up close together. Think of it like plotting cities on a map: Paris and Lyon are near each other, while Tokyo is far away. Embeddings do the same thing, but for concepts. The words “dog” and “puppy” get vectors that sit very close together, while “dog” and “spreadsheet” land in distant regions.
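To make "close together" concrete, here is a minimal sketch that measures closeness with cosine similarity. The three-dimensional vectors are hand-picked for illustration and do not come from a real model; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (assumed values, not model output), chosen so that
# "dog" and "puppy" point in nearly the same direction.
toy = {
    "dog": [0.9, 0.8, 0.1],
    "puppy": [0.85, 0.9, 0.15],
    "spreadsheet": [0.1, 0.05, 0.95],
}

dog_puppy = cosine_similarity(toy["dog"], toy["puppy"])
dog_sheet = cosine_similarity(toy["dog"], toy["spreadsheet"])
print(f"dog vs puppy:       {dog_puppy:.3f}")
print(f"dog vs spreadsheet: {dog_sheet:.3f}")
```

The first score comes out much higher than the second, which is the whole idea: related concepts score high, unrelated ones score low.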
Why Embeddings Matter
Traditional keyword search breaks when users don’t use the exact right words. If your docs say “authentication” but a user searches “login,” a keyword match fails. Embeddings solve this because both words map to nearby vectors — the system understands they mean roughly the same thing.
This is the foundation behind most modern AI features: semantic search, chatbot memory, recommendation engines, and retrieval-augmented generation (RAG). If you’re building anything that needs to understand text rather than just match it, embeddings are the tool you reach for.
Three Common Use Cases
1. Semantic Search
Store embeddings for every document in your collection. When a user searches, embed their query and find the closest vectors. Results are ranked by meaning, not keyword overlap. This is how modern search feels “smart.”
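The search step can be sketched in a few lines. This assumes the document vectors have already been computed and sit in a plain dictionary; the document names and vector values below are made up for illustration, and a real system would use a vector database instead of a linear scan.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query_vector, index, top_k=2):
    """Return the top_k (score, doc_id) pairs, best match first."""
    scored = [(cosine_similarity(query_vector, vec), doc_id)
              for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return scored[:top_k]

# Hypothetical pre-computed document embeddings (toy 3-d values).
index = {
    "authentication-guide": [0.9, 0.1, 0.0],
    "billing-faq": [0.1, 0.9, 0.1],
    "api-rate-limits": [0.2, 0.2, 0.9],
}

# Pretend this is the embedding of the query "login".
query_vector = [0.8, 0.2, 0.1]

results = search(query_vector, index)
for score, doc_id in results:
    print(f"{score:.3f}  {doc_id}")
```

With these toy values, the authentication guide ranks first even though the query never contained the word "authentication."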
2. Retrieval-Augmented Generation (RAG)
RAG combines embeddings with an LLM. You embed your knowledge base, retrieve the most relevant chunks for a user’s question, then pass those chunks to the LLM as context. The model answers from your data, which makes hallucination much less likely. Check out Build a Local RAG Pipeline with Ollama for a hands-on walkthrough.
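Here is a rough sketch of the retrieve-then-prompt step, stopping just before the LLM call. The helper names (`retrieve`, `build_prompt`), the prompt wording, and the toy unit-length vectors are all assumptions for illustration, not part of any particular library.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve(query_vector, chunks, top_k=2):
    """chunks is a list of (text, vector) pairs.

    Vectors are assumed to be unit length, so the dot product
    equals cosine similarity.
    """
    ranked = sorted(chunks, key=lambda c: dot(query_vector, c[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]

def build_prompt(question, context_chunks):
    """Assemble a prompt that asks the LLM to answer from the retrieved context."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Toy knowledge base with made-up 2-d unit vectors.
chunks = [
    ("Password resets expire after 24 hours.", [1.0, 0.0]),
    ("Invoices are sent monthly.", [0.0, 1.0]),
    ("Login requires two-factor authentication.", [0.8, 0.6]),
]
query_vector = [0.96, 0.28]  # pretend embedding of the question below

top_chunks = retrieve(query_vector, chunks)
prompt = build_prompt("How do I log in?", top_chunks)
print(prompt)
```

The resulting string is what you would send to the LLM of your choice.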
3. Recommendations
Embed your product descriptions, articles, or user profiles. To recommend similar items, find the nearest neighbors to a given embedding. Netflix-style “because you watched X” features work on a similar principle.
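One simple way to sketch this: average the embeddings of what a user has already watched to get a "taste" vector, then recommend the closest unseen item. Averaging is an assumption made here for illustration (production recommenders are far more involved), and the catalog vectors are toy values.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_vector(vectors):
    """Element-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def recommend(watched, catalog, top_k=1):
    """Recommend unseen titles closest to the average of the watched ones."""
    profile = mean_vector([catalog[title] for title in watched])
    candidates = [(cosine_similarity(profile, vec), title)
                  for title, vec in catalog.items() if title not in watched]
    candidates.sort(reverse=True)
    return [title for _, title in candidates[:top_k]]

# Hypothetical catalog with made-up 2-d embeddings.
catalog = {
    "space-documentary": [0.9, 0.1],
    "mars-mission-drama": [0.8, 0.3],
    "astronaut-biopic": [0.85, 0.2],
    "baking-show": [0.1, 0.9],
}

picks = recommend(["space-documentary", "mars-mission-drama"], catalog)
print(picks)
```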
How to Generate an Embedding
You send text to an embedding model and get back a vector. Here’s how it looks with two popular options.
With OpenAI (API):
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="What is an embedding?"
)

vector = response.data[0].embedding
print(len(vector))  # 1536 dimensions
```
With Ollama (local, free):
```python
import ollama

response = ollama.embed(
    model="nomic-embed-text",
    input="What is an embedding?"
)

vector = response["embeddings"][0]
print(len(vector))  # 768 dimensions
```
Ollama runs entirely on your machine — no API key, no cost. See the Ollama Complete Guide to get set up.
Both approaches return a plain list of floats. What you do next is store that vector somewhere you can search it — typically a vector database.
Popular Embedding Models (2026)
| Model | Provider | Dimensions | Context | Notes |
|---|---|---|---|---|
| text-embedding-3-small | OpenAI | 1536 | 8,191 tokens | Best balance of cost and quality |
| text-embedding-3-large | OpenAI | 3072 | 8,191 tokens | Higher accuracy, higher cost |
| nomic-embed-text | Nomic / Ollama | 768 | 8,192 tokens | Top open-source option, runs locally |
| mxbai-embed-large | Mixedbread / Ollama | 1024 | 512 tokens | Strong performance, smaller context |
| Cohere embed-v4 | Cohere | 1536 (default) | 128,000 tokens | Huge context window |
For most beginner projects, text-embedding-3-small (cloud) or nomic-embed-text (local) is a good default.
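The dimension column translates directly into storage cost. Here is a back-of-the-envelope estimate, assuming each number is stored as a 32-bit float (4 bytes) and ignoring index overhead and metadata, which a real vector database adds on top.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as 32-bit floats

def storage_bytes(num_vectors, dimensions):
    """Raw size of the vectors alone, without index or metadata overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32

# Dimensions taken from the table above.
models = {
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

for name, dims in models.items():
    gb = storage_bytes(1_000_000, dims) / 1e9
    print(f"{name}: {gb:.2f} GB per million vectors")
```

Doubling the dimensions doubles the raw storage, so the choice of model matters more as your collection grows.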
Key Things to Remember
- Same model, always. You must use the same embedding model for both storing and querying. Vectors from different models aren’t compatible.
- Dimensions matter. More dimensions can capture more nuance, but they also use more storage and compute. Start small.
- Chunking matters. Long documents should be split into smaller chunks before embedding. Most models have a token limit, and shorter chunks tend to produce more focused vectors.
- Cosine similarity is the standard way to compare two embeddings. Scores range from -1 to 1: a score near 1.0 means very similar meaning, and a score near 0 means unrelated.
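As an illustration of the chunking point, here is a minimal word-based splitter with a small overlap between neighboring chunks. Counting words is only a rough stand-in for tokens (real limits depend on the model's tokenizer), and the default sizes are arbitrary assumptions.

```python
def chunk_words(text, chunk_size=200, overlap=20):
    """Split text into chunks of up to chunk_size words.

    Neighboring chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
```

Each chunk is then embedded separately, with the same model you will use at query time.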
Where to Go Next
This post gave you the “what” and “why.” Ready to go deeper?
- How Embeddings Work — a more detailed technical dive into the math and intuition behind embedding spaces.
- What Is a Vector Database? — where you store and search embeddings at scale.
- Build a Local RAG Pipeline with Ollama — put embeddings to work in a real project.
- Ollama Complete Guide (2026) — run embedding models locally on your own hardware.
FAQ
What’s the difference between an embedding and a regular word encoding like one-hot?
One-hot encoding treats every word as equally different — “dog” is as far from “puppy” as it is from “spreadsheet.” Embeddings capture semantic relationships, so similar concepts get similar vectors. This is what enables semantic search, where meaning matters more than exact word matches.
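You can see the "equally different" problem in a few lines. This sketch uses a three-word vocabulary chosen for illustration and measures plain Euclidean distance.

```python
import math

vocab = ["dog", "puppy", "spreadsheet"]

def one_hot(word):
    """A vector with 1.0 in the word's slot and 0.0 everywhere else."""
    return [1.0 if w == word else 0.0 for w in vocab]

dog_to_puppy = math.dist(one_hot("dog"), one_hot("puppy"))
dog_to_sheet = math.dist(one_hot("dog"), one_hot("spreadsheet"))

print(dog_to_puppy, dog_to_sheet)
```

Both distances are identical, because one-hot vectors carry no information about meaning. An embedding model would place "puppy" much closer to "dog" than "spreadsheet."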
Can I use embeddings for languages other than English?
Often, yes, but check the model. Many modern embedding models are multilingual: text-embedding-3-small, for example, handles dozens of languages and can even match queries in one language to documents in another. Others are trained mainly on English (the original nomic-embed-text is one), so read the model card before relying on a model for other languages. Performance is usually best for high-resource languages.
How often do I need to re-embed my documents?
Only when you change embedding models or update the documents themselves. Embeddings are effectively deterministic: the same text with the same model produces the same vector, apart from possible tiny floating-point differences. If your documents change frequently, set up a pipeline that re-embeds only the modified chunks rather than the entire collection.
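One way to find the modified chunks is to store a content hash next to each vector and compare on the next run. The function names and the dictionary layout below are assumptions for illustration; any stable hash and any storage format would work.

```python
import hashlib

def content_hash(text):
    """Stable fingerprint of a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def chunks_to_reembed(chunks, stored_hashes):
    """Return ids of chunks that are new or whose text has changed.

    chunks:        {chunk_id: current text}
    stored_hashes: {chunk_id: hash saved during the last embedding run}
    """
    return [cid for cid, text in chunks.items()
            if stored_hashes.get(cid) != content_hash(text)]

# Hashes saved during a previous (hypothetical) run.
stored_hashes = {"a": content_hash("hello"), "b": content_hash("world")}

# Current state: "a" is unchanged, "b" was edited, "c" is new.
chunks = {"a": "hello", "b": "world!", "c": "new"}

stale = chunks_to_reembed(chunks, stored_hashes)
print(stale)
```

Only the ids returned here need a fresh call to the embedding model; everything else can keep its existing vector.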
Embeddings are one of those concepts that unlock a huge number of AI features once you understand them. The good news: you don’t need to understand the linear algebra — just the idea that text goes in, meaningful numbers come out, and similar things get similar numbers.