Apr 19, 2026 · 2 min read

Why Your RAG System Returns Bad Results (And How to Fix It)

You built a RAG system. It works on your test data. In production, it returns irrelevant garbage 30% of the time. Here are the seven most common causes and how to fix each one.

1. Bad chunking

Symptom: Answers are vague or miss important details.

Cause: Chunks are too large (entire pages) or too small (single sentences). Large chunks dilute the relevant information. Small chunks lose context.

Fix: Start with chunks of 200-500 tokens and about 50 tokens of overlap. Split at structural boundaries (paragraphs, then lines, then sentences) instead of at fixed offsets:

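from langchain_text_splitters import RecursiveCharacterTextSplitter

# chunk_size and chunk_overlap count characters by default;
# from_tiktoken_encoder makes them count tokens instead
splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=500,
    chunk_overlap=50,
    separators=["\n\n", "\n", ". ", " ", ""]  # Paragraph, line, sentence, word, then character
)
chunks = splitter.split_text(document)  # document: the raw text you want to index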
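To see what the splitter is doing, or to avoid the dependency, here is a minimal sketch of the same idea in plain Python. It is not LangChain's algorithm: it keeps paragraphs whole when they fit, packs them into chunks up to a size limit, and repeats the tail of each chunk at the start of the next. `chunk_paragraphs` is a hypothetical name, and counting whitespace-separated words instead of tokens is a simplifying assumption.

```python
def chunk_paragraphs(text, max_words=300, overlap_words=40):
    """Pack blank-line-separated paragraphs into chunks of at most max_words words.

    Whitespace-separated words stand in for tokens (an assumption). Each chunk
    after the first starts with the last overlap_words words of the previous one.
    """
    if not 0 <= overlap_words < max_words:
        raise ValueError("need 0 <= overlap_words < max_words")
    step = max_words - overlap_words  # piece size that still fits next to the overlap
    chunks, current = [], []
    for para in text.split("\n\n"):
        words = para.split()
        # Paragraphs longer than one chunk are cut into word windows
        for i in range(0, len(words), step):
            piece = words[i:i + step]
            if current and len(current) + len(piece) > max_words:
                chunks.append(" ".join(current))
                # Carry the tail of the finished chunk forward as overlap
                current = current[-overlap_words:] if overlap_words else []
            current = current + piece
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Words usually undercount tokens, so leave some headroom below your real limit, or swap in your embedding model's tokenizer for the length check.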

2. Wrong embedding model

Symptom: Search returns topically related but not actually relevant results.

Cause: Using a general-purpose embedding model for specialized content (code, legal, medical).

Fix: Use embeddings suited to your domain. For code, use Codestral Embed. For general text, OpenAI’s text-embedding-3-large is a solid default. Either way, test the model on real queries from your own data before committing to it. See our embeddings guide.
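Public leaderboards won't tell you how a model performs on your content. A cheap check is to label a few dozen real queries with the documents that should come back, run them through each candidate model, and compare recall@k. A minimal sketch; the function name, data layout, and document IDs are hypothetical:

```python
def recall_at_k(retrieved, relevant, k=5):
    """Average fraction of each query's relevant doc IDs found in its top-k results.

    retrieved: {query: [doc_id, ...]} ranked results from one embedding model
    relevant:  {query: {doc_id, ...}} hand-labeled ground truth
               (each query needs at least one relevant ID)
    """
    scores = []
    for query, gold in relevant.items():
        top_k = set(retrieved.get(query, [])[:k])
        scores.append(len(top_k & gold) / len(gold))
    return sum(scores) / len(scores)


# Hypothetical labels and rankings from two candidate models
relevant = {"ECONNRESET on upload": {"doc7"}, "rotate API keys": {"doc2", "doc9"}}
model_a = {"ECONNRESET on upload": ["doc3", "doc7"], "rotate API keys": ["doc2", "doc4"]}
model_b = {"ECONNRESET on upload": ["doc7", "doc1"], "rotate API keys": ["doc9", "doc2"]}
print(recall_at_k(model_a, relevant, k=2), recall_at_k(model_b, relevant, k=2))
```

Keep the labeled set fixed while you compare models, so differences come from the embeddings rather than from the queries.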

3. No hybrid search

Symptom: Exact terms aren’t found. Searching for “ECONNRESET” returns pages about “connection errors” but not the specific error code.

Cause: Pure vector search matches meaning but can miss exact keywords, especially rare tokens such as error codes, IDs, and product names.

Fix: Combine vector search with BM25 keyword search:

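# Sketch of hybrid search. embed, vector_db, bm25_search and
# reciprocal_rank_fusion are placeholders for your own stack;
# each search is assumed to return a ranked list of document IDs.
query_embedding = embed(query_text)
vector_results = vector_db.search(query_embedding, top_k=10)  # meaning-based matches
keyword_results = bm25_search(query_text, top_k=10)           # exact-term matches
final_results = reciprocal_rank_fusion(vector_results, keyword_results)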

Weaviate supports hybrid search natively, and so do several other vector databases. If yours doesn't, run both searches separately and fuse the rankings yourself.
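Reciprocal Rank Fusion is simple enough to write yourself. Each document earns 1 / (k + rank) from every list it appears in, so items ranked well by both searches rise to the top without having to compare BM25 scores against cosine similarities. A minimal sketch, assuming each search returns a ranked list of document IDs; k=60 is the commonly used default:

```python
def reciprocal_rank_fusion(*rankings, k=60):
    """Fuse ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Returns doc IDs sorted by fused score.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


vector_results = ["a", "b", "c"]   # hypothetical IDs from vector search
keyword_results = ["c", "a", "d"]  # hypothetical IDs from BM25
print(reciprocal_rank_fusion(vector_results, keyword_results))
```

Because only ranks matter, the two searches never need their scores normalized. For the keyword side, a library such as rank_bm25 can supply the BM25 ranking if your database lacks one.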

4. Stale embeddings

Symptom: RAG returns outdated information even though docs were updated.

Cause: Documents changed but embeddings weren’t regenerated.

Fix: Re-embed a document whenever its content changes, and replace its old chunks instead of adding new ones alongside them. Use a hash to detect changes:

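import hashlib

def needs_reembedding(doc_id, new_content):
    new_hash = hashlib.md5(new_content.encode("utf-8")).hexdigest()
    # get_stored_hash is your own lookup (e.g. a column stored next to the vectors);
    # it should return None for documents that were never embedded
    old_hash = get_stored_hash(doc_id)
    return new_hash != old_hash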
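Detecting a change is half the job; the other half is making sure the old vectors disappear. A minimal sketch of an incremental sync, where a plain dict stands in for your vector store and `chunk_fn` and `embed_fn` are hypothetical hooks for your own chunker and embedding call:

```python
import hashlib


def sync_document(index, doc_id, content, chunk_fn, embed_fn):
    """Re-embed doc_id only if its content changed.

    index is a dict standing in for a vector store:
    {doc_id: {"hash": str, "chunks": [(text, vector), ...]}}.
    Returns True if the document was (re-)embedded.
    """
    new_hash = hashlib.md5(content.encode("utf-8")).hexdigest()
    entry = index.get(doc_id)
    if entry is not None and entry["hash"] == new_hash:
        return False  # unchanged: keep the existing vectors
    # Overwrite the whole entry so the old chunks go away with the old hash
    index[doc_id] = {
        "hash": new_hash,
        "chunks": [(text, embed_fn(text)) for text in chunk_fn(content)],
    }
    return True
```

Documents deleted at the source need a separate pass that removes their IDs from the index; this function only handles new and edited ones.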

5. No metadata filtering

Symptom: Results from wrong categories, dates, or sources.

Cause: Vector similarity alone can’t distinguish between a 2024 guide and a 2026 guide on the same topic.

Fix: Store metadata with each chunk and filter on it as part of the query, so only matching chunks are ranked by similarity. With Chroma:

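# collection: a Chroma collection whose chunks were added with
# "year" and "category" metadata
results = collection.query(
    query_texts=["how to deploy"],
    # Chroma requires $and to combine more than one condition
    where={"$and": [{"year": {"$gte": 2025}}, {"category": "deployment"}]},
    n_results=5,
)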
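The order matters. If you take the top 5 by similarity and filter afterwards, older guides can crowd out everything and leave you with fewer than 5 results, or none. A minimal in-memory sketch of filter-then-rank; the record layout and the `filtered_search` name are assumptions, and a real database does the same thing with indexes:

```python
import math


def filtered_search(query_vec, records, where, top_k=5):
    """Apply metadata predicates first, then rank survivors by cosine similarity.

    records: list of dicts with "id", "vector", and "metadata" keys.
    where:   {field: predicate}; a record is kept only if it has every field
             and each predicate returns True for that field's value.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    candidates = [
        r for r in records
        if all(field in r["metadata"] and keep(r["metadata"][field])
               for field, keep in where.items())
    ]
    candidates.sort(key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
    return [r["id"] for r in candidates[:top_k]]


records = [  # hypothetical chunks
    {"id": "old-guide", "vector": [1.0, 0.0], "metadata": {"year": 2024, "category": "deployment"}},
    {"id": "new-guide", "vector": [0.9, 0.1], "metadata": {"year": 2026, "category": "deployment"}},
    {"id": "new-faq", "vector": [0.0, 1.0], "metadata": {"year": 2026, "category": "faq"}},
]
where = {"year": lambda y: y >= 2025, "category": lambda c: c == "deployment"}
print(filtered_search([1.0, 0.0], records, where))
```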

6. Context window overflow

Symptom: LLM ignores some retrieved documents or gives incomplete answers.

Cause: Too many retrieved chunks crowd the prompt beyond what the model can use well. Even within the context limit, models tend to use information at the start and end of a long context more reliably than information in the middle (the “lost in the middle” problem).

Fix: Retrieve fewer, more relevant chunks (3-5 instead of 10-20). Put the most relevant chunks at the start and end of the context and the weakest ones in the middle.
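One way to get that ordering is to cap the list, then deal chunks alternately to the front and the back, so the top two end up first and last. A minimal sketch, assuming the input is already sorted most-relevant first; `order_for_long_context` is a hypothetical name:

```python
def order_for_long_context(chunks_by_relevance, max_chunks=5):
    """Keep the top max_chunks and place the strongest at both ends.

    Input must be sorted most-relevant first. Ranks 1, 3, 5, ... fill the
    front; ranks 2, 4, ... fill the back in reverse, so the weakest chunk
    lands in the middle.
    """
    front, back = [], []
    for i, chunk in enumerate(chunks_by_relevance[:max_chunks]):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]


print(order_for_long_context(["c1", "c2", "c3", "c4", "c5"]))
```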

7. No reranking

Symptom: The best result is at position 5 instead of position 1.

Cause: Embedding similarity is a rough approximation. The top-10 results are all “similar” but not equally relevant.

Fix: Add a reranker that scores each result against the query:

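import os
from cohere import Client

co = Client(api_key=os.environ["CO_API_KEY"])  # keep the key out of source code

# initial_results: chunks returned by your first-stage (vector or hybrid) search
reranked = co.rerank(
    model="rerank-english-v3.0",
    query="how to fix memory leak in Node.js",
    documents=[chunk.text for chunk in initial_results],
    top_n=3,
)
# Each result carries the position of the original document and a relevance score
top_chunks = [initial_results[r.index] for r in reranked.results]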

Cohere Rerank and cross-encoder models are the most common choices.
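Whatever model you choose, the pattern is the same: score each (query, chunk) pair jointly, then re-sort. A provider-agnostic sketch; `term_overlap` is a toy scorer included only so the example runs, and in practice you would pass in a cross-encoder (for example, a sentence-transformers `CrossEncoder`, whose `predict` method scores query/passage pairs) or a hosted reranker:

```python
from typing import Callable, Sequence


def rerank(query: str, docs: Sequence[str],
           score: Callable[[str, str], float], top_n: int = 3) -> list:
    """Score every (query, doc) pair and return the top_n docs, best first."""
    ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
    return ranked[:top_n]


def term_overlap(query: str, doc: str) -> float:
    """Toy stand-in scorer: fraction of query terms that appear in the doc."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q) if q else 0.0


docs = [  # hypothetical first-stage results
    "Node.js streams overview",
    "fix memory leak in Node.js with heap snapshots",
    "memory basics",
]
print(rerank("fix memory leak in Node.js", docs, term_overlap, top_n=2))
```

Rerankers are slower than embedding lookups because they score every pair, which is why they run on the 10-50 first-stage results rather than on the whole corpus.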

The debugging checklist

When your RAG returns bad results:

Check the retrieved chunks — are they relevant? (retrieval problem)
Check the prompt — does the LLM have enough context? (generation problem)
Check the embedding model — is it appropriate for your domain?
Check chunk sizes — too big or too small?
Try hybrid search — does adding keywords help?
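The first two checks separate retrieval failures from generation failures. For a failing query where you know which chunks the answer needs, that split can be automated. A minimal sketch with hypothetical chunk IDs; `diagnose` and its arguments are assumptions about how you log your pipeline:

```python
def diagnose(expected_ids, retrieved_ids, k=5):
    """Classify a bad answer as a retrieval or a generation problem.

    expected_ids:  chunk IDs a correct answer needs (labeled by hand)
    retrieved_ids: ranked chunk IDs the pipeline retrieved for the query
    k:             how many retrieved chunks were actually passed to the LLM
    """
    missing = set(expected_ids) - set(retrieved_ids[:k])
    if missing:
        return f"retrieval problem: missing {sorted(missing)}"
    return "generation problem: the right chunks were retrieved"


print(diagnose(["c2", "c9"], ["c1", "c2"]))
```

If the right chunks were retrieved and the answer is still wrong, look at the prompt and the chunk ordering before touching the embedding model.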
