๐Ÿ“ Tutorials
ยท 6 min read

Build an AI Database Query Assistant โ€” Natural Language to SQL


You know your database schema. You know what data you want. But sometimes the SQL just doesnโ€™t come out right โ€” especially with joins across five tables you havenโ€™t touched in months.

What if you could just ask? โ€œShow me all orders from last week with customer namesโ€ and get the exact SQL back, ready to run.

Thatโ€™s what weโ€™re building today: a local text-to-SQL assistant powered by Ollama. No API keys, no cloud services โ€” your schema stays on your machine.

What Weโ€™re Building

A Python CLI tool that:

  1. Reads your database schema (from a live SQLite database)
  2. Takes a natural language question
  3. Sends both to a local LLM via Ollama
  4. Returns the generated SQL query
  5. Optionally executes it and shows results

The whole thing runs in about 150 lines of Python.

Prerequisites

ollama pull codellama

Install the one dependency:

pip install requests

Step 1 โ€” Create a Sample Database

We need something to query against. Letโ€™s build a small e-commerce database in SQLite โ€” the kind of schema youโ€™d actually encounter in the real world.

# setup_db.py
import sqlite3

def create_sample_db(path="shop.db"):
    conn = sqlite3.connect(path)
    c = conn.cursor()

    c.executescript("""
        CREATE TABLE IF NOT EXISTS customers (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            email TEXT UNIQUE,
            created_at DATE DEFAULT CURRENT_DATE
        );

        CREATE TABLE IF NOT EXISTS products (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            price REAL NOT NULL,
            category TEXT
        );

        CREATE TABLE IF NOT EXISTS orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            product_id INTEGER REFERENCES products(id),
            quantity INTEGER DEFAULT 1,
            order_date DATE DEFAULT CURRENT_DATE
        );

        INSERT OR IGNORE INTO customers (id, name, email) VALUES
            (1, 'Alice Martin', 'alice@example.com'),
            (2, 'Bob Chen', 'bob@example.com'),
            (3, 'Carol Davis', 'carol@example.com');

        INSERT OR IGNORE INTO products (id, name, price, category) VALUES
            (1, 'Mechanical Keyboard', 89.99, 'electronics'),
            (2, 'Desk Lamp', 34.50, 'office'),
            (3, 'USB-C Hub', 45.00, 'electronics'),
            (4, 'Notebook Pack', 12.99, 'office');

        INSERT OR IGNORE INTO orders (id, customer_id, product_id, quantity, order_date) VALUES
            (1, 1, 1, 1, '2026-06-01'),
            (2, 1, 3, 2, '2026-06-03'),
            (3, 2, 2, 1, '2026-06-05'),
            (4, 3, 4, 3, '2026-06-07'),
            (5, 2, 1, 1, '2026-06-09');
    """)

    conn.commit()
    conn.close()

if __name__ == "__main__":
    create_sample_db()
    print("Sample database created: shop.db")

Run it:

python setup_db.py

Step 2 โ€” Extract the Schema

The LLM needs to understand your tables to write correct SQL. Weโ€™ll pull the schema directly from SQLiteโ€™s metadata โ€” this way it always reflects the actual database, not some stale documentation.

# schema.py
import sqlite3

def get_schema(db_path="shop.db"):
    conn = sqlite3.connect(db_path)
    c = conn.cursor()

    c.execute("SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%'")
    tables = [row[0] for row in c.fetchall()]

    schema_parts = []
    for table in tables:
        c.execute(f"SELECT sql FROM sqlite_master WHERE name='{table}'")
        create_stmt = c.fetchone()[0]
        schema_parts.append(create_stmt + ";")

    conn.close()
    return "\n\n".join(schema_parts)

This gives us the raw CREATE TABLE statements โ€” exactly what an LLM needs to understand column names, types, and relationships.

Step 3 โ€” Build the Query Assistant

Hereโ€™s the core of the tool. We send the schema plus the userโ€™s question to Ollama with a carefully crafted system prompt that keeps the model focused on SQL output.

# text_to_sql.py
import sqlite3
import requests
import json
from schema import get_schema

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "codellama"

SYSTEM_PROMPT = """You are a SQL query generator. You will receive a database schema and a natural language question.

Rules:
- Output ONLY the SQL query, nothing else
- No explanations, no markdown, no code fences
- Use only tables and columns from the provided schema
- Write standard SQL compatible with SQLite"""

def generate_sql(question, db_path="shop.db"):
    schema = get_schema(db_path)

    prompt = f"""Database schema:
{schema}

Question: {question}

SQL query:"""

    response = requests.post(OLLAMA_URL, json={
        "model": MODEL,
        "prompt": prompt,
        "system": SYSTEM_PROMPT,
        "stream": False,
        "options": {"temperature": 0.1}
    })

    return response.json()["response"].strip()

def execute_query(sql, db_path="shop.db"):
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    try:
        rows = conn.execute(sql).fetchall()
        if rows:
            columns = rows[0].keys()
            return columns, [dict(row) for row in rows]
        return [], []
    except sqlite3.Error as e:
        return None, str(e)
    finally:
        conn.close()

def format_results(columns, rows):
    if columns is None:
        return f"SQL Error: {rows}"
    if not rows:
        return "No results."

    widths = {col: max(len(col), max(len(str(r[col])) for r in rows)) for col in columns}
    header = " | ".join(col.ljust(widths[col]) for col in columns)
    separator = "-+-".join("-" * widths[col] for col in columns)
    lines = [header, separator]
    for row in rows:
        lines.append(" | ".join(str(row[col]).ljust(widths[col]) for col in columns))
    return "\n".join(lines)

A few things to note:

  • Temperature 0.1 โ€” we want deterministic, precise SQL, not creative writing. Low temperature keeps the output focused.
  • stream: False โ€” simpler to handle. For a production tool you might want streaming, but for SQL generation the full response is usually short.
  • row_factory = sqlite3.Row โ€” gives us column names in results, which makes the output much more readable.

Step 4 โ€” The Interactive CLI

Letโ€™s wrap it in a REPL so you can have a conversation with your database:

# main.py
from text_to_sql import generate_sql, execute_query, format_results

def main():
    db_path = "shop.db"
    print("Text-to-SQL Assistant (type 'quit' to exit)")
    print("=" * 45)

    while True:
        question = input("\nQuestion: ").strip()
        if question.lower() in ("quit", "exit", "q"):
            break
        if not question:
            continue

        print("\nGenerating SQL...")
        sql = generate_sql(question, db_path)
        print(f"\n  {sql}")

        run = input("\nExecute? [Y/n] ").strip().lower()
        if run in ("", "y", "yes"):
            columns, rows = execute_query(sql, db_path)
            print(f"\n{format_results(columns, rows)}")

if __name__ == "__main__":
    main()

Try It Out

python main.py

Hereโ€™s what a session looks like:

Text-to-SQL Assistant (type 'quit' to exit)
=============================================

Question: Show all orders with customer names and product names

  SELECT c.name AS customer, p.name AS product, o.quantity, o.order_date
  FROM orders o
  JOIN customers c ON o.customer_id = c.id
  JOIN products p ON o.product_id = p.id;

Execute? [Y/n] y

customer     | product             | quantity | order_date
-------------+---------------------+----------+-----------
Alice Martin | Mechanical Keyboard | 1        | 2026-06-01
Alice Martin | USB-C Hub           | 2        | 2026-06-03
Bob Chen     | Desk Lamp           | 1        | 2026-06-05
Carol Davis  | Notebook Pack       | 3        | 2026-06-07
Bob Chen     | Mechanical Keyboard | 1        | 2026-06-09

Try more questions:

  • โ€œWhatโ€™s the total revenue per product category?โ€
  • โ€œWhich customer has spent the most money?โ€
  • โ€œList products that have never been orderedโ€

Tips for Better Results

Pick the right model. codellama is purpose-built for code generation and handles SQL well. mistral is a solid general-purpose alternative. Larger models (13B+) handle complex joins and subqueries more reliably โ€” check the best models for coding locally for current recommendations.

Schema detail matters. The more context in your CREATE TABLE statements โ€” foreign keys, constraints, column comments โ€” the better the generated SQL. If your schema uses cryptic column names like col_a3, consider adding comments to the prompt.

Add sample rows. For tricky schemas, include a few example rows in the prompt. This helps the model understand what the data actually looks like, not just the structure.

Validate before executing. This tool lets you review the SQL before running it. In a production setting, youโ€™d want to add guardrails โ€” restrict to SELECT statements, use a read-only database connection, or run against a replica.

Adapting for PostgreSQL

The same approach works with PostgreSQL โ€” swap sqlite3 for psycopg2 and pull the schema from information_schema instead. The PostgreSQL cheat sheet covers the metadata queries youโ€™d need.

Where to Go Next

This is a focused single-query tool. To make it smarter:

  • Add conversation memory so follow-up questions work (โ€œnow filter that by last weekโ€)
  • Combine with RAG to include documentation or business rules alongside the schema โ€” the local RAG pipeline tutorial shows how
  • Support multiple databases by loading schema from a config file
  • Add query explanation โ€” ask the model to explain the generated SQL in plain English

The core pattern here โ€” structured context + natural language question โ†’ structured output โ€” applies far beyond SQL. Itโ€™s the same approach behind code generation, API call builders, and config file generators. Once you have it working for your database, youโ€™ll find plenty of other places to use it.

๐Ÿ“˜