You’ve been taking notes in Obsidian for months — maybe years. Research snippets, meeting notes, project ideas, reading highlights. It’s all there, buried across hundreds of markdown files. What if you could just ask your vault a question and get an answer grounded in your own writing?
That’s exactly what we’re building today. A Python tool that indexes every markdown file in your Obsidian vault, generates embeddings with Ollama, stores them in ChromaDB, and lets you query your notes using natural language. Everything runs locally — no API keys, no cloud services, no data leaving your machine.
This tutorial builds directly on the RAG pipeline we built previously. If you haven’t read that one yet, it covers the foundational concepts. Here, we’re applying that same pattern to a real-world use case.
Why This Matters
Most of us are terrible at retrieving what we’ve already written down. Obsidian’s built-in search is keyword-based: it matches the words you type, not their meaning. If you wrote about “deployment strategies” three months ago but now search for “how to ship code to production,” you’ll most likely come up empty, because the two phrases share no keywords.
RAG (Retrieval-Augmented Generation) fixes this. By converting your notes into embeddings — dense vector representations of meaning — you can search by concept instead of keyword. Pair that with a local LLM through Ollama and you get answers synthesized from your own notes.
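To make "search by concept" concrete, here is a minimal sketch of the kind of comparison a vector store runs under the hood. The three-dimensional vectors are made up for illustration (real embeddings from nomic-embed-text have hundreds of dimensions), and cosine similarity is one common metric, not necessarily the one your store is configured to use.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy "embeddings"; real ones come from the embedding model.
notes = {
    "deployment strategies": [0.9, 0.1, 0.0],
    "sourdough recipe": [0.0, 0.2, 0.9],
}
query = [0.8, 0.3, 0.1]  # pretend this encodes "how to ship code to production"

# Rank note titles from most to least similar to the query vector.
ranked = sorted(
    notes,
    key=lambda title: cosine_similarity(query, notes[title]),
    reverse=True,
)
print(ranked)
```

The query shares no words with either title, yet the deployment note ranks first because its vector points in nearly the same direction.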
And since everything stays on your machine, your notes are never sent to a third-party service, which is a real advantage over cloud-based alternatives.
Prerequisites
- Ollama installed and running (ollama serve)
- An embedding model pulled: ollama pull nomic-embed-text
- A chat model pulled: ollama pull llama3.2
- Python 3.10+
- An Obsidian vault with some markdown files
Install the dependencies:
pip install chromadb requests
That’s it — two packages. ChromaDB handles vector storage and retrieval, and requests talks to Ollama’s API.
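Before indexing, it can help to confirm that both models are actually pulled. The sketch below is an optional check, not part of kb.py: the helper names are hypothetical, it uses the standard library's urllib instead of requests so it runs without extra installs, and it assumes Ollama's /api/tags endpoint returns a "models" list whose entries carry a "name" such as "llama3.2:latest".

```python
import json
import urllib.request

REQUIRED_MODELS = ["nomic-embed-text", "llama3.2"]


def missing_models(tags_response: dict, required: list[str]) -> list[str]:
    """Return the required models that are absent from an /api/tags response."""
    # Installed names carry a tag suffix such as "llama3.2:latest"; compare base names.
    installed = {m["name"].split(":")[0] for m in tags_response.get("models", [])}
    return [name for name in required if name not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Ask the local Ollama server which models are installed (needs Ollama running)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


# Offline demonstration with a hand-written response of the assumed shape.
sample = {"models": [{"name": "llama3.2:latest"}]}
print(missing_models(sample, REQUIRED_MODELS))
```

With Ollama running, missing_models(fetch_tags(), REQUIRED_MODELS) should return an empty list once both models are pulled.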
Project Structure
obsidian-kb/
├── kb.py # The entire tool — indexing + querying
└── README.md
One file. Let’s keep it simple.
The Complete Tool
Here’s the full implementation. We’ll walk through each section below.
#!/usr/bin/env python3
"""Query your Obsidian vault with natural language using RAG + Ollama."""

import argparse
import hashlib
from pathlib import Path

import chromadb
import requests

OLLAMA_URL = "http://localhost:11434"
EMBED_MODEL = "nomic-embed-text"
CHAT_MODEL = "llama3.2"
COLLECTION_NAME = "obsidian_vault"
CHUNK_SIZE = 500


def get_embedding(text: str) -> list[float]:
    """Embed one piece of text with the local Ollama embedding model."""
    r = requests.post(f"{OLLAMA_URL}/api/embed", json={"model": EMBED_MODEL, "input": text})
    r.raise_for_status()
    return r.json()["embeddings"][0]


def chunk_text(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Group paragraphs into chunks of roughly `size` characters."""
    paragraphs = text.split("\n\n")
    chunks, current = [], ""
    for para in paragraphs:
        # Close the current chunk if this paragraph would push it past the limit.
        if len(current) + len(para) > size and current:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks


def index_vault(vault_path: str, db_path: str = "./chroma_db"):
    """Embed every markdown file in the vault and store the chunks in ChromaDB."""
    vault = Path(vault_path)
    if not vault.exists():
        raise FileNotFoundError(f"Vault not found: {vault_path}")

    client = chromadb.PersistentClient(path=db_path)

    # Reset collection for a clean re-index
    try:
        client.delete_collection(COLLECTION_NAME)
    except Exception:
        # Nothing to delete on the first run. The exception type raised for a
        # missing collection differs between ChromaDB versions, so catch broadly.
        pass
    collection = client.get_or_create_collection(COLLECTION_NAME)

    md_files = list(vault.rglob("*.md"))
    print(f"Found {len(md_files)} markdown files")

    total_chunks = 0
    for filepath in md_files:
        text = filepath.read_text(encoding="utf-8", errors="ignore")
        if len(text.strip()) < 50:
            continue  # skip near-empty notes

        rel_path = str(filepath.relative_to(vault))
        chunks = chunk_text(text)
        for i, chunk in enumerate(chunks):
            # Stable ID derived from the file path and chunk position
            chunk_id = hashlib.md5(f"{rel_path}:{i}".encode()).hexdigest()
            embedding = get_embedding(chunk)
            collection.add(
                ids=[chunk_id],
                embeddings=[embedding],
                documents=[chunk],
                metadatas=[{"source": rel_path, "chunk": i}],
            )
        total_chunks += len(chunks)
        print(f"  Indexed: {rel_path} ({len(chunks)} chunks)")

    print(f"\nDone. {total_chunks} chunks indexed from {len(md_files)} files.")


def query_vault(question: str, db_path: str = "./chroma_db", n_results: int = 5):
    """Retrieve the most relevant chunks and ask the chat model to answer from them."""
    client = chromadb.PersistentClient(path=db_path)
    collection = client.get_collection(COLLECTION_NAME)

    # Embed the question with the same model used for indexing
    q_embedding = get_embedding(question)
    results = collection.query(query_embeddings=[q_embedding], n_results=n_results)

    context_parts = []
    sources = set()
    for doc, meta in zip(results["documents"][0], results["metadatas"][0]):
        context_parts.append(doc)
        sources.add(meta["source"])
    context = "\n---\n".join(context_parts)

    prompt = f"""Use the following excerpts from the user's notes to answer their question.
If the notes don't contain relevant information, say so honestly.

NOTES:
{context}

QUESTION: {question}

ANSWER:"""

    r = requests.post(f"{OLLAMA_URL}/api/generate", json={
        "model": CHAT_MODEL,
        "prompt": prompt,
        "stream": False,
    })
    r.raise_for_status()
    answer = r.json()["response"]

    print(f"\n{answer}\n")
    print("Sources:")
    for s in sorted(sources):
        print(f"  - {s}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Obsidian AI Knowledge Base")
    sub = parser.add_subparsers(dest="command")

    idx = sub.add_parser("index", help="Index your Obsidian vault")
    idx.add_argument("vault", help="Path to your Obsidian vault")
    idx.add_argument("--db", default="./chroma_db", help="ChromaDB storage path")

    ask = sub.add_parser("ask", help="Ask a question about your notes")
    ask.add_argument("question", help="Your question")
    ask.add_argument("--db", default="./chroma_db", help="ChromaDB storage path")
    ask.add_argument("--results", type=int, default=5, help="Number of chunks to retrieve")

    args = parser.parse_args()
    if args.command == "index":
        index_vault(args.vault, args.db)
    elif args.command == "ask":
        query_vault(args.question, args.db, args.results)
    else:
        parser.print_help()
How It Works
The tool has two modes: index and ask.
Indexing
python kb.py index ~/Documents/MyVault
This walks through every .md file in your vault recursively. Each file gets split into chunks at paragraph boundaries (roughly 500 characters each). Smaller chunks mean more precise retrieval — the LLM gets focused context instead of entire documents.
Each chunk is sent to Ollama’s nomic-embed-text model to generate an embedding vector. That vector, along with the original text and source file path, gets stored in ChromaDB. The persistent client means your index survives between runs — you don’t re-embed every time you ask a question.
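The tool sends one HTTP request per chunk, which is simple but adds overhead on large vaults. As a possible refinement, the sketch below groups chunks into batches; it relies on Ollama's /api/embed endpoint accepting a list of strings as "input" and returning one vector per string. The helper names and timeout are assumptions for illustration, and urllib stands in for requests so the block has no third-party imports.

```python
import json
import urllib.request
from typing import Iterator

OLLAMA_URL = "http://localhost:11434"
EMBED_MODEL = "nomic-embed-text"


def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_batch(texts: list[str]) -> list[list[float]]:
    """Embed several chunks in one request (hypothetical helper; needs Ollama running)."""
    payload = json.dumps({"model": EMBED_MODEL, "input": texts}).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["embeddings"]


# The batching logic can be checked without a server:
print(list(batched(["a", "b", "c", "d", "e"], 2)))
```

Because collection.add already takes lists for ids, embeddings, documents, and metadatas, each batch can be stored with a single call instead of one call per chunk.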
Querying
python kb.py ask "What were my key takeaways from the architecture review?"
Your question gets embedded with the same model. ChromaDB performs a similarity search and returns the top 5 most relevant chunks. Those chunks are assembled into a context block and passed to llama3.2 along with your question. The LLM synthesizes an answer grounded in your actual notes, and the tool prints which source files contributed.
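A similarity search always returns the requested number of chunks, even when none of them are a good match. One way to keep weak matches out of the prompt is to filter on the "distances" list that ChromaDB returns alongside the documents. This is a sketch, not part of kb.py: filter_by_distance is a hypothetical helper, the dictionary below is a hand-written stand-in for a real query result, and the 1.0 cutoff is arbitrary because the distance scale depends on the collection's metric and the embedding model.

```python
def filter_by_distance(results: dict, max_distance: float) -> list[tuple[str, str, float]]:
    """Keep (document, source, distance) triples at or below max_distance."""
    docs = results["documents"][0]
    metas = results["metadatas"][0]
    dists = results["distances"][0]
    return [
        (doc, meta["source"], dist)
        for doc, meta, dist in zip(docs, metas, dists)
        if dist <= max_distance
    ]


# Hand-written stand-in for what collection.query() returns for one question.
fake_results = {
    "documents": [["Blue-green deploys cut downtime.", "Feed the starter daily."]],
    "metadatas": [[{"source": "devops/deploys.md"}, {"source": "cooking/bread.md"}]],
    "distances": [[0.42, 1.37]],
}

kept = filter_by_distance(fake_results, max_distance=1.0)
print(kept)
```

Print the distances for a few real questions first, then pick a cutoff that separates useful chunks from noise in your own vault.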
Chunking Strategy
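The chunking function splits on double newlines (paragraph boundaries), which works well for Obsidian notes since most people write in paragraphs separated by blank lines. Paragraphs accumulate in the current chunk until adding the next one would push it past the size limit; at that point the chunk is closed and a new one starts. A single paragraph longer than the limit is kept whole, so an occasional chunk can exceed 500 characters.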
This is intentionally simple. For most vaults, paragraph-level chunking gives good results. If you have very long documents with distinct sections, you could enhance this to split on headings (## ) instead.
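Heading-based splitting could look something like the sketch below. split_on_headings is a hypothetical helper, not part of kb.py, and it only looks for level-2 headings at the start of a line. It does not know about fenced code blocks, so a line beginning with "## " inside one would also trigger a split.

```python
import re


def split_on_headings(text: str) -> list[str]:
    """Split markdown into sections, each starting at a level-2 heading."""
    # Zero-width split: break just before every line that begins with "## ".
    parts = re.split(r"(?m)^(?=## )", text)
    return [part.strip() for part in parts if part.strip()]


note = "Intro text.\n\n## Setup\nInstall things.\n\n## Usage\nRun it.\n"
sections = split_on_headings(note)
for section in sections:
    print(repr(section))
```

Each section could then go through chunk_text, so long sections are still broken down by paragraph while no chunk straddles two headings.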
Usage Tips
Re-index when your notes change. The index is a snapshot. After a week of heavy note-taking, run index again. It rebuilds from scratch, which is fine for vaults under ~10,000 files.
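If full rebuilds become slow, one option is to re-embed only the files that changed. The sketch below shows the bookkeeping half of that idea: it records each file's modification time in a JSON manifest and reports what is new or modified since the last run. The manifest file and the changed_files helper are assumptions for illustration, not part of kb.py.

```python
import json
import tempfile
from pathlib import Path


def changed_files(vault: Path, manifest_path: Path) -> list[Path]:
    """Return markdown files that are new or modified since the last run."""
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    current, changed = {}, []
    for path in sorted(vault.rglob("*.md")):
        rel = path.relative_to(vault).as_posix()
        current[rel] = path.stat().st_mtime_ns
        if old.get(rel) != current[rel]:
            changed.append(path)
    # Save the new snapshot for the next run.
    manifest_path.write_text(json.dumps(current))
    return changed


# Self-contained demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    vault = Path(tmp) / "vault"
    vault.mkdir()
    (vault / "note.md").write_text("first draft")
    manifest = Path(tmp) / "manifest.json"

    first_run = [p.name for p in changed_files(vault, manifest)]
    second_run = [p.name for p in changed_files(vault, manifest)]
    print(first_run, second_run)
```

To wire this into the tool you would also need to stop resetting the collection on every run, remove a changed file's old chunks before adding the new ones (for example with collection.delete and a where filter on "source"), and handle notes that were deleted from the vault.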
Tune the chunk count. The --results flag controls how many chunks feed into the LLM’s context. More chunks = broader context but more noise. For specific factual questions, try --results 3. For broad “summarize everything about X” queries, bump it to --results 10.
Try different models. Swap CHAT_MODEL to mistral, phi3, or gemma2 depending on your hardware. Smaller models respond faster; larger ones reason better over complex context.
Exclude folders. If your vault has templates or daily notes you don’t want indexed, add a filter in index_vault:
if ".trash" in str(filepath) or "templates" in str(filepath.parent):
    continue
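The substring check above is quick, but it also matches folders whose names merely contain "templates", or a vault that itself lives under such a path. A slightly stricter sketch compares whole folder names relative to the vault root. EXCLUDED_DIRS and is_excluded are hypothetical names; adjust the set to your own vault.

```python
from pathlib import Path

EXCLUDED_DIRS = {".trash", "templates"}  # example folder names; adjust to your vault


def is_excluded(filepath: Path, vault: Path) -> bool:
    """True if any folder between the vault root and the file is excluded."""
    folders = filepath.relative_to(vault).parts[:-1]  # drop the file name
    return any(folder.lower() in EXCLUDED_DIRS for folder in folders)


vault = Path("/vault")
# A file inside an excluded folder is skipped.
print(is_excluded(vault / "templates" / "daily.md", vault))
# A folder that only contains the word "templates" is not.
print(is_excluded(vault / "email-templates-review" / "notes.md", vault))
```

Inside index_vault, the filter would become: if is_excluded(filepath, vault): continue.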
What You Could Add Next
- Incremental indexing — track file modification times and only re-embed changed files
- Frontmatter parsing — extract tags and dates from YAML frontmatter for filtered queries
- A simple web UI — wrap the query function in a Flask or Gradio app
- Multi-vault support — index multiple vaults into separate collections
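As a starting point for the frontmatter idea, here is a deliberately small sketch that separates a leading "---" block from the note body and reads flat key: value lines. It is not a YAML parser: values stay as plain strings and nested or multi-line entries are skipped, so a real implementation would likely use a proper YAML library. split_frontmatter is a hypothetical helper name.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Separate a leading '---' delimited block from the note body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    for end, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            break
    else:
        return {}, text  # no closing delimiter: treat everything as body

    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and key.strip():
            meta[key.strip()] = value.strip()  # values are kept as raw strings
    body = "\n".join(lines[end + 1:])
    return meta, body


note = "---\ntags: [devops, review]\ndate: 2024-03-01\n---\nBlue-green deploys cut downtime."
meta, body = split_frontmatter(note)
print(meta)
print(body)
```

The extracted fields could be merged into each chunk's metadata at index time and then used to narrow a search through the where argument of collection.query.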
Wrapping Up
In about 100 lines of Python, you’ve turned a passive note archive into an active knowledge base. The same RAG pattern from our pipeline tutorial now works against your real data — and it all stays on your machine.
The key insight: your notes are already valuable. You just need a better retrieval layer than keyword search. Embeddings give you that, and a local LLM turns retrieval into conversation.
Next time you can’t remember where you wrote something down — just ask.