Why Your RAG System Returns Bad Results (And How to Fix It)


You built a RAG system. It works on your test data. In production, it returns irrelevant garbage 30% of the time. Here are the seven most common causes and how to fix each one.

1. Bad chunking

Symptom: Answers are vague or miss important details.

Cause: Chunks are too large (entire pages) or too small (single sentences). Large chunks dilute the relevant information. Small chunks lose context.

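Fix: Chunk at 200-500 tokens with a 50-token overlap, and split at natural boundaries (paragraphs, sections) instead of fixed offsets. Note that the splitter's plain constructor counts characters, not tokens; from_tiktoken_encoder makes chunk_size and chunk_overlap count tokens:

# Needs the langchain-text-splitters and tiktoken packages.
# Older LangChain releases expose the same class under langchain.text_splitter.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",  # tokenizer used to measure chunk length
    chunk_size=500,
    chunk_overlap=50,
    separators=["\n\n", "\n", ". ", " "],  # Try paragraphs first, then lines, sentences, words
)
chunks = splitter.split_text(document)  # document: the raw text as a str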

2. Wrong embedding model

Symptom: Search returns topically related but not actually relevant results.

Cause: Using a general-purpose embedding model for specialized content (code, legal, medical).

Fix: Use embeddings trained for your domain. For code, use Codestral Embed. For general text, OpenAI’s text-embedding-3-large is a safe default. See our embeddings guide.

3. No hybrid search

Symptom: Exact terms aren’t found. Searching for “ECONNRESET” returns pages about “connection errors” but not the specific error code.

Cause: Pure vector search matches meaning but misses exact keywords.

Fix: Combine vector search with BM25 keyword search:

# Pseudo-code for hybrid search
vector_results = vector_db.search(query_embedding, top_k=10)
keyword_results = bm25_search(query_text, top_k=10)
final_results = reciprocal_rank_fusion(vector_results, keyword_results)

Weaviate supports hybrid search natively. If your database does not, run both searches yourself and fuse the two ranked lists.
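The reciprocal_rank_fusion step in the pseudo-code above can be sketched in a few lines. This is a minimal version, assuming each search returns a list of document IDs ordered best-first; the constant k=60 is a commonly used default rather than a tuned value, and the sample IDs are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(*ranked_lists, k=60, top_k=None):
    """Fuse several best-first lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by ID so the output is deterministic
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return fused if top_k is None else fused[:top_k]


# Hypothetical result lists from the vector and keyword searches
vector_ids = ["doc_a", "doc_b", "doc_c"]
keyword_ids = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion(vector_ids, keyword_ids)
print(fused)
```

Documents that appear in both lists rise to the top, which is why an exact keyword match can outrank a chunk that is only semantically close.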

4. Stale embeddings

Symptom: RAG returns outdated information even though docs were updated.

Cause: Documents changed but embeddings weren’t regenerated.

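Fix: Re-embed whenever content changes. Store a content hash with each document so you can detect changes cheaply:

import hashlib

def needs_reembedding(doc_id, new_content):
    # MD5 is acceptable here: the hash only detects changes, it is not a security control
    new_hash = hashlib.md5(new_content.encode("utf-8")).hexdigest()
    # get_stored_hash is a placeholder for your own lookup; return None for unseen docs
    old_hash = get_stored_hash(doc_id)
    return new_hash != old_hash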
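The same idea extends to a whole corpus. The sketch below is one possible shape, assuming documents and stored hashes are both available as plain dicts keyed by document ID; the function names are illustrative, and SHA-256 is swapped in for MD5 only as a matter of taste. It also flags deleted documents, whose vectors would otherwise linger in the index.

```python
import hashlib


def content_hash(text):
    # Any stable hash works for change detection; SHA-256 is used here
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_stale_docs(docs, stored_hashes):
    """Return IDs of documents that are new or whose content has changed."""
    return [
        doc_id
        for doc_id, text in docs.items()
        if stored_hashes.get(doc_id) != content_hash(text)
    ]


def find_deleted_docs(docs, stored_hashes):
    """Return IDs that still have a stored hash but no longer exist."""
    return [doc_id for doc_id in stored_hashes if doc_id not in docs]


# Hypothetical state: "a" was edited, "b" is unchanged, "c" is new, "gone" was removed
stored = {
    "a": content_hash("old text"),
    "b": content_hash("same"),
    "gone": content_hash("x"),
}
docs = {"a": "new text", "b": "same", "c": "brand new"}

stale = find_stale_docs(docs, stored)
deleted = find_deleted_docs(docs, stored)
print(stale, deleted)
```

Re-embed the stale IDs, remove the deleted ones from the vector store, and update the stored hashes only after both steps succeed.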

5. No metadata filtering

Symptom: Results from wrong categories, dates, or sources.

Cause: Vector similarity alone can’t distinguish between a 2024 guide and a 2026 guide on the same topic.

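Fix: Store metadata with every chunk and filter on it before the vector search runs:

# Matches Chroma's query API; assumes chunks carry "year" and "category" metadata
results = collection.query(
    query_texts=["how to deploy"],
    # Chroma needs an explicit $and to combine more than one condition
    where={"$and": [{"year": {"$gte": 2025}}, {"category": "deployment"}]},
    n_results=5,
)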

6. Context window overflow

Symptom: LLM ignores some retrieved documents or gives incomplete answers.

Cause: Too many retrieved chunks exceed the model’s effective context window. Models degrade on information in the middle of long contexts (“lost in the middle” problem).

Fix: Retrieve fewer, more relevant chunks (3-5 instead of 10-20). Place the most relevant chunks at the start and end of the context, and the weaker ones in the middle.
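One way to apply that ordering is to alternate chunks between the front and the back of the context. The helper below is a small sketch, assuming the input list is already sorted from most to least relevant; the function name and the chunk labels are made up for illustration.

```python
def reorder_for_long_context(chunks_by_relevance):
    """Put the strongest chunks at the edges and the weakest in the middle.

    Expects chunks sorted from most to least relevant.
    """
    front, back = [], []
    for position, chunk in enumerate(chunks_by_relevance):
        # Positions 0, 2, 4, ... fill the front; positions 1, 3, 5, ... fill the back
        if position % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    # Reverse the back half so the second-best chunk ends up last
    return front + back[::-1]


ordered = reorder_for_long_context(["c1", "c2", "c3", "c4", "c5"])
print(ordered)
```

Here c1 (the best chunk) opens the context, c2 closes it, and c5 (the weakest) sits in the middle, where the model is most likely to overlook it.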

7. No reranking

Symptom: The best result is at position 5 instead of position 1.

Cause: Embedding similarity is a rough approximation. The top-10 results are all “similar” but not equally relevant.

Fix: Add a reranker that scores each result against the query:

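import cohere

co = cohere.Client(api_key="your-key")

reranked = co.rerank(
    model="rerank-english-v3.0",  # assumed model name; check Cohere's current model list
    query="how to fix memory leak in Node.js",
    documents=[chunk.text for chunk in initial_results],
    top_n=3,
)
# Each result holds the index of the original document and a relevance score
top_chunks = [initial_results[r.index] for r in reranked.results]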

Cohere Rerank and cross-encoder models are the most common choices.

The debugging checklist

When your RAG returns bad results:

  1. Check the retrieved chunks — are they relevant? (retrieval problem)
  2. Check the prompt — does the LLM have enough context? (generation problem)
  3. Check the embedding model — is it appropriate for your domain?
  4. Check chunk sizes — too big or too small?
  5. Try hybrid search — does adding keywords help?
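Step 1 is easier to act on with a number attached. The sketch below is one rough way to measure retrieval, assuming you have a small hand-labeled set of queries with the chunk IDs that should come back; hit_rate_at_k, fake_retrieve, and the sample data are all hypothetical stand-ins for your own retriever and labels.

```python
def hit_rate_at_k(retrieve, labeled_queries, k=5):
    """Fraction of queries with at least one expected chunk ID in the top k.

    retrieve: callable taking a query string and returning chunk IDs, best first.
    labeled_queries: list of (query, set_of_expected_chunk_ids) pairs.
    """
    if not labeled_queries:
        return 0.0
    hits = 0
    for query, expected_ids in labeled_queries:
        top_ids = retrieve(query)[:k]
        if any(chunk_id in expected_ids for chunk_id in top_ids):
            hits += 1
    return hits / len(labeled_queries)


# Stand-in retriever backed by a fixed lookup table
fake_index = {
    "reset error": ["d3", "d1"],
    "deploy steps": ["d9", "d8"],
}


def fake_retrieve(query):
    return fake_index.get(query, [])


labeled = [("reset error", {"d1"}), ("deploy steps", {"d2"})]
rate = hit_rate_at_k(fake_retrieve, labeled, k=2)
print(rate)
```

A low hit rate points to a retrieval problem (chunking, embeddings, search); a high hit rate with bad answers points to the prompt or the generation step.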
