Your team has 200 pages of documentation spread across Notion, Confluence, and random markdown files. New developers ask the same questions every week. "How do I set up the dev environment?" "What's the deployment process?" "Where's the API spec?"
In this tutorial, we'll build a chatbot that reads your docs and answers questions about them, running entirely on your machine. No API costs, no data leaving your network. Just Ollama, a vector database, and your markdown files.
What we're building
$ node chat.js
🤖 Ask me anything about your docs (type 'exit' to quit)
> How do I set up the development environment?
Based on your documentation:
1. Clone the repo: git clone git@github.com:team/project.git
2. Copy .env.example to .env and fill in the database URL
3. Run `docker compose up -d` to start PostgreSQL and Redis
4. Run `npm install && npm run migrate && npm run dev`
The dev server starts on localhost:3000. Hot reload is enabled.
Source: docs/getting-started.md
Prerequisites
- Ollama installed
- Node.js 20+
- Your documentation as markdown files (or any text files)
Step 1: Pull the models
# Chat model (about 2GB for the default 3B tag)
ollama pull llama3.2
# Embedding model (274MB)
ollama pull nomic-embed-text
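Before moving on, it can help to confirm that both models are actually installed. The sketch below assumes Ollama is listening on its default address (http://localhost:11434) and uses its `GET /api/tags` endpoint, which lists local models. `findMissingModels` and `checkModels` are hypothetical helper names, not part of any library.

```javascript
// Ollama reports model names with a tag (e.g. "llama3.2:latest");
// treat a bare name as shorthand for ":latest".
const withTag = (name) => (name.includes(':') ? name : `${name}:latest`);

// Pure helper: given the parsed /api/tags response, return the
// required models that are not installed.
function findMissingModels(tagsResponse, required) {
  const installed = new Set(
    (tagsResponse.models ?? []).map((m) => withTag(m.name))
  );
  return required.filter((name) => !installed.has(withTag(name)));
}

// Hypothetical wrapper: fetch the local model list and report what is missing.
// Assumes the default Ollama address; pass another baseUrl if yours differs.
async function checkModels(required, baseUrl = 'http://localhost:11434') {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  return findMissingModels(await res.json(), required);
}
```

Calling `await checkModels(['llama3.2', 'nomic-embed-text'])` from a script should return an empty array when both pulls succeeded.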
Step 2: Set up the project
mkdir docs-chatbot && cd docs-chatbot
npm init -y
# The scripts below use ES module imports and top-level await
npm pkg set type=module
npm install langchain @langchain/community @langchain/ollama chromadb
Step 3: Index your docs
// index.js: run once to index your documentation
import { DirectoryLoader } from 'langchain/document_loaders/fs/directory';
import { TextLoader } from 'langchain/document_loaders/fs/text';
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { OllamaEmbeddings } from '@langchain/ollama';
import { Chroma } from '@langchain/community/vectorstores/chroma';
// Load all markdown files from your docs folder
const loader = new DirectoryLoader('./docs', {
  '.md': (path) => new TextLoader(path),
  '.txt': (path) => new TextLoader(path),
});
const documents = await loader.load();
console.log(`Loaded ${documents.length} documents`);
// Split into chunks (important for accuracy)
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const chunks = await splitter.splitDocuments(documents);
console.log(`Split into ${chunks.length} chunks`);
// Create embeddings and store in ChromaDB
const embeddings = new OllamaEmbeddings({ model: 'nomic-embed-text' });
await Chroma.fromDocuments(chunks, embeddings, {
  collectionName: 'my-docs',
  url: 'http://localhost:8000',
});
console.log('✅ Indexing complete');
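To see what `chunkSize` and `chunkOverlap` mean in practice, here is a deliberately simplified fixed-window splitter. It is an illustration only, not how `RecursiveCharacterTextSplitter` works internally: the real splitter also tries to break on paragraph, line, and word boundaries. `splitWithOverlap` is a hypothetical name.

```javascript
// Simplified fixed-window splitter, for illustration only.
function splitWithOverlap(text, chunkSize, chunkOverlap) {
  if (chunkOverlap >= chunkSize) {
    throw new Error('chunkOverlap must be smaller than chunkSize');
  }
  const step = chunkSize - chunkOverlap; // how far the window advances each time
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

console.log(splitWithOverlap('abcdefghij', 4, 2));
// [ 'abcd', 'cdef', 'efgh', 'ghij' ]
```

Each chunk repeats the last two characters of the previous one. With the settings used in index.js (1000 and 200), the window would advance 800 characters at a time, so a sentence that straddles a boundary still appears whole in at least one chunk.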
Step 4: Build the chat interface
// chat.js
import { ChatOllama, OllamaEmbeddings } from '@langchain/ollama';
import { Chroma } from '@langchain/community/vectorstores/chroma';
import { createInterface } from 'readline';
const embeddings = new OllamaEmbeddings({ model: 'nomic-embed-text' });
const vectorStore = await Chroma.fromExistingCollection(embeddings, {
  collectionName: 'my-docs',
  url: 'http://localhost:8000',
});
const llm = new ChatOllama({ model: 'llama3.2' });
const rl = createInterface({ input: process.stdin, output: process.stdout });
console.log('🤖 Ask me anything about your docs (type "exit" to quit)\n');
const ask = () => {
  rl.question('> ', async (question) => {
    if (question.trim().toLowerCase() === 'exit') { rl.close(); return; }
    try {
      // Find the 3 chunks most similar to the question
      const results = await vectorStore.similaritySearch(question, 3);
      const context = results.map(r =>
        `[${r.metadata.source}]\n${r.pageContent}`
      ).join('\n\n');
      // Generate an answer grounded in those chunks
      const response = await llm.invoke(
`Answer this question using ONLY the provided documentation.
If the answer isn't in the docs, say "I couldn't find this in the documentation."
Cite the source file.

Documentation:
${context}

Question: ${question}`
      );
      console.log(`\n${response.content}\n`);
    } catch (err) {
      // Keep the loop alive if Ollama or ChromaDB is unreachable
      console.error(`\nRequest failed: ${err.message}\n`);
    }
    ask();
  });
};
ask();
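Conceptually, `similaritySearch(question, 3)` embeds the question and returns the three stored chunks whose vectors are closest to it. The sketch below shows that idea with cosine similarity over a plain array. It is not how ChromaDB is implemented: Chroma uses an approximate index, and the distance metric it applies depends on how the collection is configured. `cosineSimilarity`, `topK`, and the `entries` shape are hypothetical.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored chunks against a query embedding and keep the k best.
// `entries` is a hypothetical in-memory stand-in for the vector store:
// [{ embedding: number[], pageContent: string, metadata: { source: string } }]
function topK(queryEmbedding, entries, k = 3) {
  return entries
    .map((entry) => ({
      ...entry,
      score: cosineSimilarity(queryEmbedding, entry.embedding),
    }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, k);
}
```

A brute-force scan like this is linear in the number of chunks, which is one reason a dedicated vector database is used once the collection grows.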
Step 5: Run it
# Start ChromaDB (leave it running in its own terminal while you index and chat)
docker run -p 8000:8000 chromadb/chroma
# Index your docs (run when docs change)
node index.js
# Start chatting
node chat.js
Why this matters
- Privacy: Your docs never leave your machine. No external API calls, no cloud storage.
- Cost: $0 after the initial setup. Run it as much as you want.
- Speed: Llama 3.2 is a small model, so local inference is usually fast enough for interactive chat, though response times depend on your hardware.
- Customizable: Swap models, adjust chunk sizes, add more document types.
Making it better
- Add a web UI: Wrap the chat in an Express server with a simple HTML frontend
- Auto-reindex: Watch the docs folder for changes and reindex automatically
- Add Slack integration: Combine with the Slack bot tutorial to answer doc questions in Slack
- Better chunking: Use markdown-aware splitting that respects headers and code blocks
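As a starting point for the "better chunking" idea, the sketch below splits a markdown string into sections at headings while ignoring `#` lines that sit inside fenced code blocks. It is a minimal illustration under simplifying assumptions (it treats any fence line as a toggle and does not check that the closing fence matches the opening one), and `splitMarkdownByHeadings` is a hypothetical helper, not a LangChain API.

```javascript
// Split markdown into { heading, content } sections at ATX headings (# to ######),
// skipping '#' lines inside fenced code blocks.
function splitMarkdownByHeadings(markdown) {
  const sections = [];
  let current = { heading: null, lines: [] };
  let inFence = false;

  // Store the section collected so far, unless it is completely empty.
  const flush = () => {
    const content = current.lines.join('\n').trim();
    if (current.heading !== null || content !== '') {
      sections.push({ heading: current.heading, content });
    }
  };

  for (const line of markdown.split('\n')) {
    // A line starting with three or more backticks or tildes opens or closes a fence.
    if (/^\s*(`{3,}|~{3,})/.test(line)) inFence = !inFence;
    const match = !inFence && /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      flush();
      current = { heading: match[2].trim(), lines: [] };
    } else {
      current.lines.push(line);
    }
  }
  flush();
  return sections;
}
```

Each section could then be passed through the character splitter only if it exceeds the chunk size, so short sections stay intact and code blocks are never cut at a comment line that happens to start with `#`.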
Total build time: ~1 hour. Running cost: $0 (everything runs locally).