πŸ“ Tutorials
Β· 3 min read

Build a Local AI Chatbot for Your Docs (RAG With Ollama)


Your team has 200 pages of documentation spread across Notion, Confluence, and random markdown files. New developers ask the same questions every week. “How do I set up the dev environment?” “What’s the deployment process?” “Where’s the API spec?”

In this tutorial, we’ll build a chatbot that reads your docs and answers questions about them, running entirely on your machine. No API costs, no data leaving your network. Just Ollama, a vector database (ChromaDB), and your markdown files.

What we’re building

$ node chat.js
🤖 Ask me anything about your docs (type 'exit' to quit)

> How do I set up the development environment?

Based on your documentation:

1. Clone the repo: git clone git@github.com:team/project.git
2. Copy .env.example to .env and fill in the database URL
3. Run `docker compose up -d` to start PostgreSQL and Redis
4. Run `npm install && npm run migrate && npm run dev`

The dev server starts on localhost:3000. Hot reload is enabled.

Source: docs/getting-started.md

Prerequisites

  • Ollama installed
  • Node.js 20+
  • Docker (used in Step 5 to run ChromaDB)
  • Your documentation as markdown files (or any text files)

Step 1: Pull the models

# Chat model (about 2GB for the default 3B tag)
ollama pull llama3.2

# Embedding model (274MB)
ollama pull nomic-embed-text
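
Before moving on, it can help to confirm that Ollama is running and that both models are present. The sketch below assumes Ollama's default local address (http://localhost:11434) and its /api/tags endpoint, which lists installed models. The helper names missingModels and checkOllama are made up for this example.

```javascript
// Models this tutorial needs. A name without a tag matches any tag
// (for example "llama3.2" matches "llama3.2:latest").
const REQUIRED_MODELS = ['llama3.2', 'nomic-embed-text'];

// Pure helper: given the installed model names, return the required ones that are absent.
function missingModels(installedNames, required = REQUIRED_MODELS) {
  return required.filter((name) =>
    !installedNames.some((installed) =>
      installed === name || installed.startsWith(`${name}:`)
    )
  );
}

// Ask the local Ollama server which models are installed.
// Assumption: Ollama listens on its default address; pass another baseUrl if yours differs.
async function checkOllama(baseUrl = 'http://localhost:11434') {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  const { models = [] } = await res.json();
  return missingModels(models.map((m) => m.name));
}
```

For example, checkOllama().then(console.log) prints [] when both models are installed, and otherwise lists the names you still need to pull. If the call rejects with a connection error, Ollama is most likely not running.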

Step 2: Set up the project

mkdir docs-chatbot && cd docs-chatbot
npm init -y
# index.js and chat.js use ES module imports and top-level await
npm pkg set type=module
npm install langchain @langchain/community @langchain/ollama chromadb

Step 3: Index your docs

// index.js: run once to index your documentation
import { DirectoryLoader } from 'langchain/document_loaders/fs/directory';
import { TextLoader } from 'langchain/document_loaders/fs/text';
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { OllamaEmbeddings } from '@langchain/ollama';
import { Chroma } from '@langchain/community/vectorstores/chroma';

// Load all markdown files from your docs folder
const loader = new DirectoryLoader('./docs', {
  '.md': (path) => new TextLoader(path),
  '.txt': (path) => new TextLoader(path),
});

const documents = await loader.load();
console.log(`Loaded ${documents.length} documents`);

// Split into chunks (important for accuracy)
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const chunks = await splitter.splitDocuments(documents);
console.log(`Split into ${chunks.length} chunks`);

// Create embeddings and store in ChromaDB
const embeddings = new OllamaEmbeddings({ model: 'nomic-embed-text' });
await Chroma.fromDocuments(chunks, embeddings, {
  collectionName: 'my-docs',
  url: 'http://localhost:8000',
});

console.log('✅ Indexing complete');
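
To see what chunkSize and chunkOverlap control, here is a deliberately simplified sliding-window splitter. It is not how RecursiveCharacterTextSplitter works internally (that class also tries to break on natural boundaries such as paragraphs). chunkText is a hypothetical helper for illustration only.

```javascript
// Simplified fixed-window chunker: each chunk starts (chunkSize - chunkOverlap)
// characters after the previous one, so neighbours share chunkOverlap characters.
function chunkText(text, chunkSize = 1000, chunkOverlap = 200) {
  if (chunkOverlap >= chunkSize) {
    throw new Error('chunkOverlap must be smaller than chunkSize');
  }
  const step = chunkSize - chunkOverlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// Small numbers make the overlap visible
console.log(chunkText('abcdefghij', 4, 2));
// → [ 'abcd', 'cdef', 'efgh', 'ghij' ]
```

The overlap is what keeps a sentence that straddles a chunk boundary from being cut in half in both neighbours: with 200 characters of overlap, anything shorter than that appears whole in at least one chunk.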

Step 4: Build the chat interface

// chat.js
import { OllamaEmbeddings, ChatOllama } from '@langchain/ollama';
import { Chroma } from '@langchain/community/vectorstores/chroma';
import { createInterface } from 'node:readline';

const embeddings = new OllamaEmbeddings({ model: 'nomic-embed-text' });
const vectorStore = await Chroma.fromExistingCollection(embeddings, {
  collectionName: 'my-docs',
  url: 'http://localhost:8000',
});

const llm = new ChatOllama({ model: 'llama3.2' });

const rl = createInterface({ input: process.stdin, output: process.stdout });
console.log("🤖 Ask me anything about your docs (type 'exit' to quit)\n");

const ask = () => {
  rl.question('> ', async (input) => {
    const question = input.trim();
    if (question.toLowerCase() === 'exit') { rl.close(); return; }
    if (!question) { ask(); return; }

    try {
      // Find the 3 chunks most similar to the question
      const results = await vectorStore.similaritySearch(question, 3);
      const context = results.map(r =>
        `[${r.metadata.source}]\n${r.pageContent}`
      ).join('\n\n');

      // Generate an answer grounded in the retrieved chunks
      const response = await llm.invoke(
        `Answer this question using ONLY the provided documentation.
If the answer isn't in the docs, say "I couldn't find this in the documentation."
Cite the source file.

Documentation:
${context}

Question: ${question}`
      );

      console.log(`\n${response.content}\n`);
    } catch (err) {
      // Most often this means Ollama or ChromaDB is not running
      console.error(`\nRequest failed: ${err.message}\n`);
    }
    ask();
  });
};

ask();
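
The call similaritySearch(question, 3) embeds the question and asks ChromaDB for the three closest chunks. The sketch below shows the idea with plain arrays and cosine similarity. It is an illustration only: the 2-dimensional vectors and the docs/deploy.md and docs/api.md file names are invented, and ChromaDB uses its own index and a configurable distance metric rather than this brute-force loop.

```javascript
// Cosine similarity: 1 means same direction, 0 means unrelated (orthogonal).
// Assumes both vectors are non-zero and have the same length.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-k: score every stored chunk against the query vector.
function topK(queryVector, entries, k = 3) {
  return entries
    .map((entry) => ({ ...entry, score: cosineSimilarity(queryVector, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy data: made-up 2-dimensional "embeddings" (real ones have hundreds of dimensions)
const store = [
  { source: 'docs/getting-started.md', vector: [0.9, 0.1] },
  { source: 'docs/deploy.md', vector: [0.1, 0.9] },
  { source: 'docs/api.md', vector: [0.6, 0.6] },
];
console.log(topK([1, 0], store, 2).map((r) => r.source));
// → [ 'docs/getting-started.md', 'docs/api.md' ]
```

Only the retrieved chunks reach the model, so if answers look wrong, printing the results of similaritySearch is usually the first thing to check.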

Step 5: Run it

# Start ChromaDB in the background (it must be running whenever you index or chat)
docker run -d -p 8000:8000 chromadb/chroma

# Index your docs (run when docs change)
node index.js

# Start chatting
node chat.js
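
One thing to watch: as written, index.js adds every chunk again each time it runs, so re-running it after a docs change will probably leave duplicate chunks in the my-docs collection. A common way around this is to track which files changed and only re-embed those. The sketch below covers just the bookkeeping, assuming you keep a manifest that maps each file path to a content hash. diffManifests and the manifest shape are hypothetical, and removing stale chunks from ChromaDB is left out.

```javascript
// Compare two manifests of the form { 'docs/file.md': '<content hash>' }.
// Returns which files need (re)indexing and which should be dropped from the index.
function diffManifests(previous, current) {
  const added = [];
  const changed = [];
  const removed = [];
  for (const [path, hash] of Object.entries(current)) {
    if (!Object.hasOwn(previous, path)) added.push(path);
    else if (previous[path] !== hash) changed.push(path);
  }
  for (const path of Object.keys(previous)) {
    if (!Object.hasOwn(current, path)) removed.push(path);
  }
  return { added, changed, removed };
}

// Example with placeholder hashes and made-up file names
const before = { 'docs/getting-started.md': 'a1', 'docs/old.md': 'b2' };
const after = { 'docs/getting-started.md': 'a9', 'docs/new.md': 'c3' };
console.log(diffManifests(before, after));
```

Here docs/new.md is reported as added, docs/getting-started.md as changed, and docs/old.md as removed. Only the first two would need new embeddings.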

Why this matters

  • Privacy: Your docs never leave your machine. No external API calls, no cloud storage.
  • Cost: $0 after the initial setup. Run it as much as you want.
  • Speed: Local inference with Llama 3.2 is fast enough for interactive chat, though response time depends on your hardware.
  • Customizable: Swap models, adjust chunk sizes, add more document types.

Making it better

  • Add a web UI: Wrap the chat in an Express server with a simple HTML frontend
  • Auto-reindex: Watch the docs folder for changes and reindex automatically
  • Add Slack integration: Combine with the Slack bot tutorial to answer doc questions in Slack
  • Better chunking: Use markdown-aware splitting that respects headers and code blocks
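
For the better-chunking idea, one option is to split on markdown headings so that each chunk is a complete section, while ignoring lines that start with # inside fenced code blocks (shell comments, for instance). This is a minimal sketch with a hypothetical splitByHeadings helper. It does not enforce a maximum chunk size, so long sections would still need a second pass.

```javascript
// Split markdown into sections at heading lines ("# ", "## ", ...),
// without treating "#" lines inside fenced code blocks as headings.
function splitByHeadings(markdown) {
  const fence = /^\s*(`{3}|~{3})/; // opening or closing code fence
  const heading = /^#{1,6}\s+(.*)$/;
  const sections = [];
  let current = { heading: '', lines: [] };
  let inFence = false;

  for (const line of markdown.split('\n')) {
    if (fence.test(line)) inFence = !inFence;
    const match = inFence ? null : line.match(heading);
    if (match) {
      // A new heading closes the previous section (if it has any lines)
      if (current.lines.length > 0) sections.push(current);
      current = { heading: match[1].trim(), lines: [line] };
    } else {
      current.lines.push(line);
    }
  }
  if (current.lines.length > 0) sections.push(current);

  return sections
    .map((s) => ({ heading: s.heading, text: s.lines.join('\n').trim() }))
    .filter((s) => s.text.length > 0);
}

// Build the sample with repeat() so this example contains no literal fence
const TICKS = '`'.repeat(3);
const sample = [
  '# Setup',
  'Install the tools.',
  TICKS + 'bash',
  '# this comment is not a heading',
  'npm install',
  TICKS,
  '## Deploy',
  'Run the deploy script.',
].join('\n');
console.log(splitByHeadings(sample).map((s) => s.heading));
// → [ 'Setup', 'Deploy' ]
```

Keeping the heading text alongside each section also gives you something useful to store as chunk metadata, which can make cited sources more specific than a file name alone.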

Total build time: ~1 hour. Running cost: $0 (everything runs locally).
