πŸ€– AI Tools
Β· 1 min read

What is RAG? Retrieval-Augmented Generation Explained for Developers


RAG (Retrieval-Augmented Generation) is a technique that gives AI models access to external knowledge at query time. Instead of relying only on what the model learned during training, RAG fetches relevant documents and includes them in the prompt.

This is how Perplexity answers questions with citations, how GitHub Copilot understands your codebase, and how enterprise chatbots know about your company's internal docs.

How it works

1. User asks a question
2. System searches a knowledge base for relevant documents
3. Retrieved documents are added to the LLM prompt as context
4. LLM generates an answer using those documents
5. Answer includes citations to the source documents

Without RAG, the model can only use knowledge from its training data (which has a cutoff date and doesn't include your private data). With RAG, it can answer questions about anything you put in the knowledge base.

A simple example

# 1. User question
question = "What's our refund policy?"

# 2. Search your docs (using embeddings + vector DB)
relevant_docs = vector_db.search(question, top_k=3)

# 3. Join the retrieved docs and add them to the prompt
context = "\n\n".join(relevant_docs)
prompt = f"""Answer based on these documents:
{context}

Question: {question}"""

# 4. LLM generates answer with citations
answer = llm.generate(prompt)
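Step 5 (citations) typically works by numbering the retrieved documents in the prompt and telling the model to reference those numbers. A minimal sketch, assuming the retrieved documents are (title, text) pairs; the `format_context` helper and the sample docs are illustrative, not from any particular library:

```python
# Number each retrieved document so the model can cite sources as [1], [2], ...
def format_context(docs):
    """docs: list of (title, text) tuples retrieved from the knowledge base."""
    return "\n\n".join(
        f"[{i}] {title}\n{text}" for i, (title, text) in enumerate(docs, start=1)
    )

# Hypothetical retrieval results, standing in for vector_db.search() output
docs = [
    ("Refund Policy", "Refunds are available within 30 days of purchase."),
    ("Shipping FAQ", "Orders ship within 2 business days."),
]

context = format_context(docs)
prompt = (
    "Answer based on these documents. Cite sources like [1].\n\n"
    f"{context}\n\nQuestion: What's our refund policy?"
)
```

Because the model sees stable document numbers, its answer can say "refunds are available within 30 days [1]", and your app can map [1] back to the source document for display.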

Key components

  • Embeddings β€” convert text to numbers for semantic search
  • Vector database β€” stores and searches embeddings (Pinecone, Qdrant, Chroma)
  • Chunking β€” splitting documents into searchable pieces
  • LLM β€” generates the final answer (Claude, GPT, DeepSeek)

When to use RAG

  • Your data changes frequently
  • You need answers about private/internal data
  • You want source citations
  • You can’t (or don’t want to) fine-tune a model

When NOT to use RAG

  • The model already knows the answer (general knowledge)
  • You need consistent output format (use fine-tuning instead)
  • Latency is critical (retrieval adds 100-500ms)

Learn more