How to Build an AI Search Engine: From Zero to Perplexity Clone


Perplexity answers questions by searching the web, reading the results, and synthesizing an answer with citations. You can build the same thing. Here’s how.

Architecture

User question
    ↓
1. Classify: needs web search? or local knowledge?
    ↓
2. Search: web API (Brave/Serper) + vector DB (local docs)
    ↓
3. Retrieve: fetch top 5-10 results, extract relevant text
    ↓
4. Generate: LLM synthesizes answer with citations
    ↓
5. Stream: response streams to user in real-time
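
The classify step in the diagram is not implemented in the steps that follow. A minimal sketch of what such a router could look like, assuming a simple keyword heuristic: the function name `needs_web_search` and the hint list are illustrative assumptions, and a real system would more likely ask a small LLM to make this call.

```python
import re

# Hypothetical hint words suggesting the question needs fresh, public information
FRESHNESS_HINTS = {"today", "latest", "current", "news", "price", "weather", "recent"}

def needs_web_search(question: str) -> bool:
    """Return True if the question mentions a year or contains a freshness hint."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    mentions_year = re.search(r"\b(19|20)\d{2}\b", question) is not None
    return mentions_year or bool(words & FRESHNESS_HINTS)
```

Questions that fail the check can go to the local vector database only, which saves a paid search call.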

Step 1: Web search API

You need a search API that returns structured results (title, URL, and a text snippet) rather than an HTML results page:

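import os
import requests

BRAVE_API_KEY = os.environ["BRAVE_API_KEY"]  # assumes the key is set in the environment

def web_search(query, num_results=5):
    # Brave Search API (free tier: 2,000 queries/month)
    resp = requests.get(
        "https://api.search.brave.com/res/v1/web/search",
        headers={"X-Subscription-Token": BRAVE_API_KEY, "Accept": "application/json"},
        params={"q": query, "count": num_results},
        timeout=10,
    )
    resp.raise_for_status()  # fail loudly on auth or rate-limit errors
    # "web" can be absent when the query has no web results
    results = resp.json().get("web", {}).get("results", [])
    return [{"title": r["title"], "url": r["url"], "snippet": r.get("description", "")}
            for r in results]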

Alternatives: Serper ($50/5K queries), SerpAPI, or Tavily (built for AI search).

Step 2: Content extraction

Search results give you snippets. For better answers, fetch and extract the full page:

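from trafilatura import fetch_url, extract

def get_page_content(url):
    downloaded = fetch_url(url)  # returns None if the download fails
    if downloaded is None:
        return None
    # extract() also returns None when it finds no main text
    return extract(downloaded, include_links=False, include_tables=True)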
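
Fetching pages one at a time is slow, and some fetches will fail. A rough sketch of how the search results could be enriched in parallel, assuming a thread pool is acceptable for this I/O-bound work. The helper name `add_page_content` and its `fetch` parameter are illustrative; pass `get_page_content` from above as `fetch`.

```python
from concurrent.futures import ThreadPoolExecutor

def add_page_content(results, fetch, max_workers=5):
    """Return copies of the search results with a 'content' key added.

    `fetch` is any callable mapping a URL to text (or None on failure).
    When a page cannot be fetched, the search snippet is used instead.
    """
    def enrich(result):
        try:
            text = fetch(result["url"])
        except Exception:
            text = None  # treat any fetch error as "no content"
        return {**result, "content": text or result.get("snippet", "")}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(enrich, results))  # map() preserves input order
```

A call such as `add_page_content(web_search(question), get_page_content)` then yields results that always carry a `content` field.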

Step 3: The RAG pipeline

Combine search results with an LLM to generate an answer:

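import os
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible endpoint; change base_url for OpenRouter, etc.
# DEEPSEEK_API_KEY is an assumed environment variable name.
client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

def answer_question(question, search_results):
    # Use the full page text when available, otherwise fall back to the snippet
    context = "\n\n".join([
        f"Source [{i+1}]: {r['title']}\nURL: {r['url']}\n"
        f"{(r.get('content') or r.get('snippet', ''))[:2000]}"
        for i, r in enumerate(search_results)
    ])

    response = client.chat.completions.create(
        model="deepseek-chat",  # Cheap and good enough
        messages=[{
            "role": "system",
            "content": "Answer the question using ONLY the provided sources. Cite sources as [1], [2], etc. If sources don't contain the answer, say so."
        }, {
            "role": "user",
            "content": f"Sources:\n{context}\n\nQuestion: {question}"
        }],
        stream=True
    )
    return response  # an iterator of chunks, not a finished string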

At $0.27/1M tokens, DeepSeek keeps the prompt cost under $0.001 per query: five sources capped at 2,000 characters each come to roughly 2,500 tokens. For better quality, use Claude or GPT-5.
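
The system prompt asks the model to cite sources as [1], [2], and so on, so the UI needs a way to turn those markers back into links. One possible sketch, assuming the streamed answer has already been collected into a single string; the helper name `cited_sources` is hypothetical.

```python
import re

def cited_sources(answer, search_results):
    """Map [n] markers in the answer back to the sources they refer to."""
    numbers = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    # Skip markers that point outside the list of sources that was provided
    return [
        {"n": n, "title": search_results[n - 1]["title"], "url": search_results[n - 1]["url"]}
        for n in numbers
        if 1 <= n <= len(search_results)
    ]
```

Out-of-range markers are dropped rather than raised, since a model can occasionally cite a source number that was never supplied.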

Step 4: Add a vector database for local knowledge

For searching your own documents (not just the web), add a vector database:

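import chromadb

# Index your docs once (your_documents is your own list of strings)
collection = chromadb.Client().create_collection("knowledge")
collection.add(
    documents=your_documents,
    ids=[f"doc_{i}" for i in range(len(your_documents))]
)

# Search at query time
def local_search(query, n_results=5):
    hits = collection.query(query_texts=[query], n_results=n_results)
    # query() returns one inner list per query text; reshape to match web results.
    # The "local://" URL is only a placeholder label for citations.
    return [{"title": doc_id, "url": f"local://{doc_id}", "content": doc}
            for doc_id, doc in zip(hits["ids"][0], hits["documents"][0])]

Then combine both sources before generating the answer:

def search(question):
    web_results = web_search(question)
    for r in web_results:
        # Prefer the full page text; fall back to the snippet
        r["content"] = get_page_content(r["url"]) or r["snippet"]
    local_results = local_search(question)

    # Merge and deduplicate by URL, keeping the first occurrence
    seen, all_results = set(), []
    for r in web_results + local_results:
        if r["url"] not in seen:
            seen.add(r["url"])
            all_results.append(r)
    return answer_question(question, all_results)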

Cost per query

Component                    Cost per query
Web search (Brave)           $0.025
Content extraction           Free (self-hosted)
Embeddings (local search)    $0.0001
LLM generation (DeepSeek)    $0.001
Total                        ~$0.03

At $0.03/query, 10,000 queries/month costs $300. Using prompt caching and model routing can cut this further.
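
To rerun the estimate with your own prices and volume, the arithmetic can be sketched as below. The per-query figures are copied from the table above; the function name and dictionary keys are illustrative.

```python
def monthly_cost(queries_per_month, per_query_costs):
    """Return (cost per query, cost per month) from a dict of component costs."""
    per_query = sum(per_query_costs.values())
    return per_query, per_query * queries_per_month

# Per-query figures from the table above
costs = {"web_search": 0.025, "extraction": 0.0, "embeddings": 0.0001, "llm": 0.001}
per_query, total = monthly_cost(10_000, costs)
print(f"${per_query:.4f} per query, ${total:.2f} per month")
# prints: $0.0261 per query, $261.00 per month
```

The unrounded sum is about $0.026 per query, so the $300 figure is a rounded-up estimate. Web search is nearly all of the cost, which makes it the first place to look for savings.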

Going further

  • Streaming UI: Use Server-Sent Events to stream the answer as it generates
  • Follow-up questions: Maintain conversation context across queries
  • Source quality ranking: Weight authoritative sources higher
  • Answer verification: Cross-check claims against multiple sources
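
For the streaming UI, the chunks returned by `answer_question` have to be turned into Server-Sent Events frames. A minimal sketch, assuming OpenAI-style chunks where the text lives in `chunk.choices[0].delta.content`; the generator name `sse_frames` and the closing `done` event are assumptions for illustration, not part of any library.

```python
def sse_frames(chunks):
    """Yield Server-Sent Events frames from an OpenAI-style streaming response."""
    for chunk in chunks:
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if not text:
            continue  # role-only and final chunks carry no text
        # Each line of the payload needs its own "data:" field
        payload = "\n".join(f"data: {line}" for line in text.split("\n"))
        yield payload + "\n\n"  # a blank line terminates the event
    # Hypothetical end marker so the client knows when to stop listening
    yield "event: done\ndata: \n\n"
```

Any web framework that can stream a generator with the `text/event-stream` content type can serve these frames; on the browser side, `EventSource` rejoins the `data:` lines with newlines.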
