πŸ“ Tutorials
Β· 5 min read

Generate E2E Tests with AI: Playwright + Ollama Tutorial (2026)


Writing end-to-end tests is one of those tasks that everyone agrees is important and nobody wants to do. The selectors, the async waits, the boilerplate: it adds up fast. What if you could describe a user flow in plain English and get a working Playwright .spec.ts file back?

That’s exactly what we’re building today. A Python script that takes a natural language description of what your app should do, sends it to a local Ollama model, and saves the generated Playwright test code to a file. No cloud APIs, no tokens, no billing surprises. Everything runs on your machine.

If you’re new to AI-powered test generation, start with our conceptual overview of the topic. We covered unit test generation with Ollama previously; this tutorial extends that idea to full browser-based E2E tests.

Prerequisites

You need three things installed:

  • Python 3.10+ with requests (pip install requests)
  • Ollama running locally with a code-capable model pulled (see our Ollama complete guide for setup)
  • Node.js + Playwright in the project where you’ll run the generated tests (npm init playwright@latest)

For the model, codellama:13b or deepseek-coder:6.7b both work well. Check our best AI models for coding locally roundup if you want to compare options. Pull one now:

ollama pull deepseek-coder:6.7b
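Before going further, it can help to confirm that the server is up and the model really is pulled. The sketch below is one way to do that, assuming Ollama's model-listing endpoint at /api/tags on the default port; the helper names model_is_pulled and check_ollama are made up for this example, and it uses only the standard library so it runs before you install anything else.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default port and exposes /api/tags.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def model_is_pulled(tags_payload: dict, model: str) -> bool:
    """Return True if `model` is listed in a parsed /api/tags response."""
    names = [entry.get("name", "") for entry in tags_payload.get("models", [])]
    # Names include the tag, so pass the full "name:tag" string you pulled.
    return model in names


def check_ollama(model: str) -> bool:
    """Hypothetical helper: ask the local server which models it has."""
    with urllib.request.urlopen(OLLAMA_TAGS_URL, timeout=5) as resp:
        payload = json.load(resp)
    return model_is_pulled(payload, model)
```

Calling check_ollama("deepseek-coder:6.7b") raises a connection error if the server is not running, which is a clearer failure than a stack trace halfway through generation.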

The Prompt Template

The quality of generated tests lives or dies with the prompt. After testing dozens of variations, here’s what consistently produces valid Playwright output:

PROMPT_TEMPLATE = """You are an expert QA engineer. Generate a Playwright test file in TypeScript.

Rules:
- Use @playwright/test import with test and expect
- Use async arrow functions for each test
- Use page.goto() with the provided base URL
- Use recommended locators: getByRole, getByText, getByLabel, getByPlaceholder
- Add await for every Playwright call
- Add meaningful assertions with expect()
- Do NOT add comments or explanations, output ONLY the code
- Wrap related steps in test.describe()

Base URL: {url}

User flow to test:
{description}

Respond with ONLY the TypeScript code, no markdown fences."""

Key design choices here: we explicitly ban comments and markdown fences because models love to wrap code in triple backticks, which breaks file output. We push toward semantic locators (getByRole, getByText) instead of fragile CSS selectors. And we specify the import style so the output is immediately runnable.
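One caveat if you extend the template: the script fills it with str.format(), which treats every curly brace as a placeholder. Adding a literal TypeScript line such as an import with braces would make format() fail. The sketch below shows one possible workaround using string.Template from the standard library; the shortened prompt text and the build_prompt name are illustrative, not part of the script in this tutorial.

```python
from string import Template

# Hypothetical variant of the prompt that embeds a literal TypeScript import.
# The braces around "test, expect" would break str.format(), so this sketch
# uses string.Template, which only treats $name as a placeholder.
PROMPT = Template("""You are an expert QA engineer. Generate a Playwright test file in TypeScript.

Start the file with exactly this line:
import { test, expect } from "@playwright/test";

Base URL: $url

User flow to test:
$description""")


def build_prompt(url: str, description: str) -> str:
    """Fill in the two placeholders and leave every brace untouched."""
    return PROMPT.substitute(url=url, description=description)


print(build_prompt("http://localhost:3000", "User logs in and sees the dashboard"))
```

If you would rather keep str.format(), doubling the braces in the template ({{ and }}) has the same effect.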

The Complete Script

Here’s generate_e2e.py, the full working pipeline:

#!/usr/bin/env python3
"""Generate Playwright E2E tests from natural language using Ollama."""

import argparse
import re
import requests
import sys
from pathlib import Path

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "deepseek-coder:6.7b"

PROMPT_TEMPLATE = """You are an expert QA engineer. Generate a Playwright test file in TypeScript.

Rules:
- Use @playwright/test import with test and expect
- Use async arrow functions for each test
- Use page.goto() with the provided base URL
- Use recommended locators: getByRole, getByText, getByLabel, getByPlaceholder
- Add await for every Playwright call
- Add meaningful assertions with expect()
- Do NOT add comments or explanations, output ONLY the code
- Wrap related steps in test.describe()

Base URL: {url}

User flow to test:
{description}

Respond with ONLY the TypeScript code, no markdown fences."""


def query_ollama(prompt: str) -> str:
    """Send prompt to Ollama and return the full response."""
    resp = requests.post(OLLAMA_URL, json={
        "model": MODEL,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0.2, "num_predict": 2048}
    }, timeout=600)  # generous limit so a stalled server cannot hang the script forever
    resp.raise_for_status()
    return resp.json()["response"]


def clean_output(raw: str) -> str:
    """Strip markdown fences and leading/trailing whitespace."""
    cleaned = re.sub(r"^```(?:typescript|ts)?\n?", "", raw.strip())
    cleaned = re.sub(r"\n?```$", "", cleaned)
    return cleaned.strip()


def generate_test(description: str, url: str, output: Path) -> None:
    """Generate a Playwright test and save it."""
    prompt = PROMPT_TEMPLATE.format(url=url, description=description)
    print(f"Generating test with {MODEL}...")

    raw = query_ollama(prompt)
    code = clean_output(raw)

    if "import" not in code:
        code = 'import { test, expect } from "@playwright/test";\n\n' + code

    output.parent.mkdir(parents=True, exist_ok=True)
    output.write_text(code)
    print(f"Saved to {output}")


def main():
    # Declare the global first: Python raises a SyntaxError if MODEL is read
    # (as in the --model help text below) before the global statement.
    global MODEL

    parser = argparse.ArgumentParser(description="Generate Playwright E2E tests with AI")
    parser.add_argument("description", help="Natural language description of the user flow")
    parser.add_argument("--url", required=True, help="Base URL of the app under test")
    parser.add_argument("--output", "-o", default="tests/generated.spec.ts",
                        help="Output file path (default: tests/generated.spec.ts)")
    parser.add_argument("--model", "-m", default=MODEL, help=f"Ollama model (default: {MODEL})")
    args = parser.parse_args()

    MODEL = args.model

    generate_test(args.description, args.url, Path(args.output))


if __name__ == "__main__":
    main()

That’s it. Under 70 lines of actual logic.
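One gap worth knowing about: clean_output only strips a fence that sits at the very start or end of the response. Models sometimes add a sentence before or after the code despite the prompt. The sketch below is a possible drop-in alternative that pulls out the first fenced block wherever it appears; the name extract_code is made up for this example, and it is a heuristic rather than a full parser.

```python
import re

# Matches an opening fence with an optional language tag, then captures
# everything up to the next closing fence.
FENCE_RE = re.compile(r"```[a-zA-Z]*[ \t]*\n(.*?)```", re.DOTALL)


def extract_code(raw: str) -> str:
    """Return the first fenced block if there is one, else the stripped text."""
    match = FENCE_RE.search(raw)
    if match:
        return match.group(1).strip()
    # No complete fence found: assume the model returned bare code.
    return raw.strip()


chatty = 'Here is your test:\n```typescript\nimport { test } from "@playwright/test";\n```\nHope this helps!'
print(extract_code(chatty))
```

If the response is cut off before the closing fence (for example when num_predict is too low), no complete block is found and the text is returned as-is, so a truncated file is still easy to spot.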

Running It

Let’s generate a test for a login flow:

python generate_e2e.py \
  "User visits the login page, enters email and password, clicks Sign In, \
   and is redirected to the dashboard where they see a welcome message" \
  --url http://localhost:3000 \
  --output tests/login.spec.ts

The script hits Ollama, gets TypeScript back, cleans it up, and writes tests/login.spec.ts. Here’s typical output:

import { test, expect } from "@playwright/test";

test.describe("Login flow", () => {
  test("should log in and see welcome message", async ({ page }) => {
    await page.goto("http://localhost:3000/login");
    await page.getByLabel("Email").fill("user@example.com");
    await page.getByLabel("Password").fill("password123");
    await page.getByRole("button", { name: "Sign In" }).click();
    await expect(page).toHaveURL(/dashboard/);
    await expect(page.getByText("Welcome")).toBeVisible();
  });
});

Run it with Playwright:

npx playwright test tests/login.spec.ts

You’ll almost certainly need to tweak selectors to match your actual UI, because the model guesses them from your description. But the structure, imports, async/await patterns, and assertion style are usually correct out of the box, which is where most of the time savings come from.
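Since the output still needs a human look, a quick automated pass can flag the most common slips before you run Playwright. The sketch below is a rough heuristic, assuming one statement per line as in the example above; review_spec and its warning messages are invented for illustration, and it will produce false positives (for instance on synchronous expect() calls against plain values).

```python
import re

# A statement that begins with page. or expect( has no `await` in front of it.
UNAWAITED_RE = re.compile(r"^\s*(page\.|expect\()")


def review_spec(code: str) -> list[str]:
    """Return human-readable warnings for a generated spec file."""
    warnings = []
    if "@playwright/test" not in code:
        warnings.append("missing @playwright/test import")
    if "test(" not in code:
        warnings.append("no test() block found")
    for number, line in enumerate(code.splitlines(), start=1):
        if UNAWAITED_RE.match(line):
            warnings.append(f"line {number}: call is not awaited")
    return warnings


print(review_spec('page.goto("http://localhost:3000");'))
```

An empty list does not mean the test is right, only that it clears these three checks; running npx playwright test remains the real verdict.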

Tips for Better Results

Be specific in your descriptions. “User logs in” produces vague tests. “User clicks the Email field, types test@mail.com, clicks the Password field, types secret123, clicks the Sign In button, and sees ‘Dashboard’ in the heading” produces precise ones. The more detail you give, the closer the output matches your real UI.

Lower the temperature. We set 0.2 in the script. For code generation, you want output that is as close to deterministic as possible. Bumping it higher introduces creative but broken syntax.

Use a bigger model if you can. The 6.7B parameter models handle simple flows well. For multi-page flows with conditional logic, step up to 13B or 34B. The tradeoff is speed: a 34B model on CPU is slow.

Chain multiple calls. Instead of describing an entire 10-step flow in one prompt, break it into logical groups: login, navigation, form submission. Generate separate spec files and compose them.
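Here is a minimal sketch of that splitting step, assuming you keep the flows in a plain dict of name to description. The helper names slugify and plan_specs and the sample flows are made up for this example; the actual generation call is left to the generate_test function from the script.

```python
import re
from pathlib import Path


def slugify(name: str) -> str:
    """Turn a flow name into a safe file stem, e.g. 'Form submission' -> 'form-submission'."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return slug or "flow"


def plan_specs(flows: dict[str, str], out_dir: str = "tests") -> list[tuple[Path, str]]:
    """Pair each flow description with the spec file it should be written to."""
    return [(Path(out_dir) / f"{slugify(name)}.spec.ts", description)
            for name, description in flows.items()]


# Hypothetical flows, one prompt each instead of a single 10-step description.
flows = {
    "Login": "User enters email and password, clicks Sign In, and lands on the dashboard",
    "Form submission": "User fills in the contact form, clicks Send, and sees a confirmation",
}

for path, description in plan_specs(flows):
    # In the real script this would be: generate_test(description, url, path)
    print(path.as_posix(), "<-", description)
```

Each flow gets its own smaller prompt, which tends to suit the 6.7B models better than one long description.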

Extending the Script

A few ideas to take this further:

  • Read descriptions from a YAML file: define all your flows in one place, generate all specs in a batch
  • Add a --dry-run flag that prints the generated code to stdout instead of saving
  • Pipe in HTML from your app alongside the description so the model can reference actual element names
  • Post-process with eslint --fix to catch formatting issues automatically
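For the HTML idea, sending a whole page would eat most of a small model's context. One possible approach is to extract only the names the locators care about and append that summary to the description. The sketch below uses the standard library's html.parser; the ElementNames class, the set of tags it tracks, and the output format are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class ElementNames(HTMLParser):
    """Collect text of buttons, links, labels, and headings, plus input placeholders."""

    TEXT_TAGS = {"button", "a", "label", "h1", "h2"}

    def __init__(self):
        super().__init__()
        self.names = []
        self._stack = []  # currently open tags from TEXT_TAGS

    def handle_starttag(self, tag, attrs):
        if tag in self.TEXT_TAGS:
            self._stack.append(tag)
        if tag == "input":
            placeholder = dict(attrs).get("placeholder")
            if placeholder:
                self.names.append(f"input placeholder: {placeholder}")

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            self.names.append(f"{self._stack[-1]}: {text}")


def summarize_html(html: str) -> str:
    """Return one line per interesting element, ready to paste into a prompt."""
    parser = ElementNames()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.names)


sample = ('<form><label for="e">Email</label>'
          '<input id="e" placeholder="you@example.com">'
          '<button type="submit">Sign In</button></form>')
print(summarize_html(sample))
```

The summary can then be appended to the description argument, for example under a line such as "Elements on the page:", so the model has real labels to target instead of guessing.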

For a broader look at tools in this space, see our best AI testing tools roundup. Some commercial products do similar things with more polish, but this local pipeline gives you full control and zero ongoing cost.

Wrapping Up

The script we built is simple on purpose. The real value is in the prompt template: it constrains the model to produce idiomatic Playwright code that actually runs. You describe what should happen, the model handles the boilerplate, and you refine the selectors to match your app.

This pairs well with the unit test generation workflow from earlier in this series. Use that for your functions and business logic, use this for your user-facing flows, and you’ve got a solid AI-assisted testing pipeline running entirely on your own hardware.