Apr 17, 2026 · 2 min read

How to Build an AI Search Engine — From Zero to Perplexity Clone

Perplexity answers questions by searching the web, reading the results, and synthesizing an answer with citations. You can build the same thing. Here’s how.

Architecture

User question
    ↓
1. Classify: needs web search? or local knowledge?
    ↓
2. Search: web API (Brave/Serper) + vector DB (local docs)
    ↓
3. Retrieve: fetch top 5-10 results, extract relevant text
    ↓
4. Generate: LLM synthesizes answer with citations
    ↓
5. Stream: response streams to user in real-time
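
The steps below do not implement the classify step from the diagram. One minimal way to sketch it, assuming a plain keyword heuristic, is shown here; the function name `needs_web_search` and the cue list are hypothetical and not part of any library. A production system would more likely ask a small LLM to make this decision.

```python
import re

# Hypothetical cue words suggesting the question is about recent or public information
FRESHNESS_CUES = {"latest", "today", "news", "current", "price", "recent", "now"}

def needs_web_search(question: str) -> bool:
    """Return True when the question looks time-sensitive enough to need web search."""
    words = set(re.findall(r"[a-z0-9]+", question.lower()))
    # A four-digit year such as 2024 is treated as a hint that fresh data is wanted
    mentions_year = any(re.fullmatch(r"20\d{2}", w) for w in words)
    return bool(words & FRESHNESS_CUES) or mentions_year
```

Questions that fail the check can go to local search only, which skips the web search API call for that query.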

Step 1: Web search API

You need a search API that returns structured results (title, URL, and a short snippet) rather than an HTML results page:

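import os
import requests

BRAVE_API_KEY = os.environ["BRAVE_API_KEY"]  # assumes the key is set in your environment

def web_search(query, num_results=5):
    # Brave Search API (free tier: 2,000 queries/month)
    resp = requests.get(
        "https://api.search.brave.com/res/v1/web/search",
        headers={"X-Subscription-Token": BRAVE_API_KEY, "Accept": "application/json"},
        params={"q": query, "count": num_results},
        timeout=10,
    )
    resp.raise_for_status()  # surface auth and rate-limit errors early
    # "web" can be missing when the query has no web results
    results = resp.json().get("web", {}).get("results", [])
    return [{"title": r["title"], "url": r["url"], "snippet": r.get("description", "")}
            for r in results]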

Alternatives: Serper ($50/5K queries), SerpAPI, or Tavily (built for AI search).

Step 2: Content extraction

Search results give you snippets. For better answers, fetch and extract the full page:

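from trafilatura import fetch_url, extract

def get_page_content(url):
    downloaded = fetch_url(url)
    if downloaded is None:
        # fetch_url returns None when the download fails
        return None
    # extract also returns None when no main text can be found
    return extract(downloaded, include_links=False, include_tables=True)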
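
Fetching pages one at a time adds up quickly when each query reads 5-10 results. A minimal sketch of fetching them in parallel with a thread pool follows; `fetch_all` is a hypothetical helper, and it takes the fetch function as a parameter so any extractor can be plugged in.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(urls, fetch, max_workers=5):
    """Fetch several pages in parallel and return {url: text or None}."""
    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception:
            # One bad page should not fail the whole batch
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so texts line up with urls
        texts = list(pool.map(safe_fetch, urls))
    return dict(zip(urls, texts))
```

With the function from this step it would be called as `fetch_all([r["url"] for r in results], get_page_content)`. Pages that map to `None` can fall back to the search snippet.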

Step 3: The RAG pipeline

Combine search results with an LLM to generate an answer:

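import os
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible API; change base_url and the key for OpenRouter or others.
# DEEPSEEK_API_KEY is an assumed environment variable name.
client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

def answer_question(question, search_results):
    # Use the extracted page text when present, otherwise fall back to the search snippet
    context = "\n\n".join([
        f"Source [{i+1}]: {r['title']}\nURL: {r['url']}\n{(r.get('content') or r.get('snippet', ''))[:2000]}"
        for i, r in enumerate(search_results)
    ])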
    
    response = client.chat.completions.create(
        model="deepseek-chat",  # Cheap and good enough
        messages=[{
            "role": "system",
            "content": "Answer the question using ONLY the provided sources. Cite sources as [1], [2], etc. If sources don't contain the answer, say so."
        }, {
            "role": "user", 
            "content": f"Sources:\n{context}\n\nQuestion: {question}"
        }],
        stream=True
    )
    return response

At DeepSeek's $0.27 per 1M tokens, a prompt of a few thousand tokens keeps generation at roughly $0.001 per query. For better quality, use Claude or GPT-5.
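
Because `stream=True` is set, `answer_question` returns an iterator of chunks rather than a finished string. A minimal sketch of consuming it, assuming the chunk shape used by the OpenAI Python client (`chunk.choices[0].delta.content`); `collect_stream` is a hypothetical helper name.

```python
def collect_stream(response, on_token=lambda t: print(t, end="", flush=True)):
    """Forward each streamed text fragment to on_token and return the full answer."""
    parts = []
    for chunk in response:
        if not chunk.choices:
            continue  # some chunks carry no choices at all
        delta = chunk.choices[0].delta.content
        if delta:  # content is None on role-only and final chunks
            on_token(delta)
            parts.append(delta)
    return "".join(parts)
```

Passing a different `on_token` callback lets the same loop feed a web response instead of the terminal.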

Step 4: Add a vector database for local knowledge

For searching your own documents (not just the web), add a vector database:

import chromadb

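# Index your docs once per process. chromadb.Client() is in-memory, so the index
# is rebuilt on every restart; your_documents is a list of strings you supply.
collection = chromadb.Client().create_collection("knowledge")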
collection.add(
    documents=your_documents,
    ids=[f"doc_{i}" for i in range(len(your_documents))]
)

# Search at query time
def local_search(query):
    return collection.query(query_texts=[query], n_results=5)
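
Note that `collection.query` does not return a flat list. It returns a dict whose `ids`, `documents`, and `distances` entries each hold one inner list per query text. A minimal sketch of flattening the result for a single query into the same shape as the web results; `flatten_query_result` and the `local://` URL prefix are hypothetical conventions chosen for illustration.

```python
def flatten_query_result(result):
    """Turn a Chroma query result for one query text into a list of dicts."""
    ids = result["ids"][0]
    docs = result["documents"][0]
    # Distances may be absent if they were excluded from the query
    distances = (result.get("distances") or [[None] * len(ids)])[0]
    return [
        {"title": doc_id, "url": f"local://{doc_id}", "content": doc, "distance": dist}
        for doc_id, doc, dist in zip(ids, docs, distances)
    ]
```

Lower distances mean closer matches, so the list can be filtered by a threshold before it is passed to the LLM.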

Step 5: Combine web + local search

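def search(question):
    web_results = web_search(question)
    for r in web_results:
        # Use the full page text, falling back to the snippet if extraction fails
        r["content"] = get_page_content(r["url"]) or r["snippet"]
    # Chroma returns one list of documents per query text; take the first
    local_docs = local_search(question)["documents"][0]
    local_results = [{"title": "Local document", "url": "local", "content": d} for d in local_docs]

    # Merge and deduplicate by (url, content), keeping the first occurrence
    unique = {}
    for r in web_results + local_results:
        unique.setdefault((r["url"], r["content"]), r)
    return answer_question(question, list(unique.values()))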

Cost per query

Component	Cost per query
Web search (Brave)	$0.025
Content extraction	Free (self-hosted)
Embeddings (local search)	$0.0001
LLM generation (DeepSeek)	$0.001
Total	~$0.03

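At roughly $0.03/query, 10,000 queries/month costs about $300, almost all of it for the web search API. Using prompt caching and model routing can trim the LLM share, but the search API is the cost to negotiate or replace first.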
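
The arithmetic behind the table can be checked with a few lines. The figures are copied from the table as given; the names `COSTS` and `monthly_cost` are only for illustration.

```python
# Per-query component costs in USD, copied from the cost table
COSTS = {
    "web_search": 0.025,
    "content_extraction": 0.0,
    "embeddings": 0.0001,
    "llm_generation": 0.001,
}

def monthly_cost(queries_per_month, costs=COSTS):
    """Return (cost per query, cost per month) in USD."""
    per_query = sum(costs.values())
    return per_query, per_query * queries_per_month

per_query, per_month = monthly_cost(10_000)
print(f"${per_query:.4f} per query, ${per_month:.2f} per month")
# prints: $0.0261 per query, $261.00 per month
```

The unrounded sum is about $0.026 per query, so the $300 monthly figure is a slightly conservative estimate that comes from rounding up to $0.03.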

Going further

Streaming UI — Use Server-Sent Events to stream the answer as it generates
Follow-up questions — Maintain conversation context across queries
Source quality ranking — Weight authoritative sources higher
Answer verification — Cross-check claims against multiple sources
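
For the streaming UI item, Server-Sent Events are plain text frames of the form `data: ...` followed by a blank line. A minimal sketch of wrapping streamed tokens in that framing, independent of any web framework; `sse_events`, the `token` field, and the `done` event name are hypothetical choices.

```python
import json

def sse_events(tokens):
    """Yield Server-Sent Events frames for an iterable of text fragments."""
    for token in tokens:
        # JSON-encode so newlines inside a token cannot break the SSE framing
        payload = json.dumps({"token": token})
        yield f"data: {payload}\n\n"
    # Hypothetical end-of-stream marker so the client knows when to stop listening
    yield "event: done\ndata: {}\n\n"
```

The generator can be returned from a streaming HTTP response with the `text/event-stream` content type, and the browser side can read it with `EventSource`.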
