How to Use AI for API Testing — Generate Test Cases with LLMs (2026)


Writing API tests by hand is tedious. You stare at an endpoint, think up a few happy-path cases, maybe remember to test a 401, and move on. Meanwhile, the weird edge cases — the ones that actually break production — slip through.

LLMs change this. Feed them your API spec and they’ll generate dozens of test cases in seconds, including edge cases you wouldn’t think of. This tutorial covers two practical approaches: using Postman’s built-in AI features and writing a custom Python script with Ollama to generate tests from an OpenAPI spec.

If you’re new to the concept, start with our overview of AI test generation.

Postman’s Built-In AI Test Generation

Postman added AI-powered test generation in late 2025, and it’s now one of the fastest ways to get coverage on an existing API.

Here’s how to use it:

  1. Import your OpenAPI spec — Go to File → Import and drop in your openapi.yaml or paste a URL. Postman creates a collection with all your endpoints.

  2. Open any request and click “Generate Tests” — In the Tests tab, you’ll see an AI icon. Click it and Postman analyzes the request schema, parameters, and expected responses.

  3. Review and customize — Postman generates pm.test() blocks covering status codes, response schema validation, required fields, and type checks. You can edit these or regenerate with different prompts.

  4. Run the collection — Use the Collection Runner or Newman CLI to execute all generated tests at once.

What Postman’s AI does well:

  • Schema validation tests from your response definitions
  • Status code assertions for success and common error codes
  • Header and content-type checks
  • Basic boundary testing for numeric parameters

What it misses:

  • Complex business logic validation
  • Chained request dependencies (e.g., create → read → delete flows)
  • Custom edge cases specific to your domain

For those gaps, you need something more flexible.

Generating Test Cases from OpenAPI Specs with Ollama

Running an LLM locally with Ollama gives you full control. You can feed it your entire OpenAPI spec and get back structured test cases — no API keys, no rate limits, no data leaving your machine.

The idea is simple: parse your OpenAPI spec, extract endpoint details, send them to a local LLM, and get back test cases as structured JSON.

Prerequisites

  • Ollama installed with a model pulled (e.g., ollama pull llama3)
  • Python 3.10+
  • pip install pyyaml requests

The Script

#!/usr/bin/env python3
"""Generate API test cases from an OpenAPI spec using a local Ollama LLM."""

import json
import sys
import yaml
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3"


def load_spec(path: str) -> dict:
    with open(path) as f:
        return yaml.safe_load(f)


def extract_endpoints(spec: dict) -> list[dict]:
    endpoints = []
    for path, methods in spec.get("paths", {}).items():
        for method, details in methods.items():
            if method in ("get", "post", "put", "patch", "delete"):
                endpoints.append({
                    "path": path,
                    "method": method.upper(),
                    "summary": details.get("summary", ""),
                    "parameters": details.get("parameters", []),
                    "requestBody": details.get("requestBody", {}),
                    "responses": {k: v.get("description", "") for k, v in details.get("responses", {}).items()},
                })
    return endpoints


def generate_tests(endpoint: dict) -> str:
    prompt = f"""You are an API testing expert. Given this endpoint, generate test cases as a JSON array.
Each test case should have: "name", "method", "path", "headers" (object), "body" (object or null),
"expected_status" (int), and "category" (one of: happy_path, boundary, auth, malformed_input, edge_case).

Include at least:
- 2 happy path tests
- 2 boundary value tests
- 1 missing/invalid auth test
- 2 malformed input tests
- 1 edge case

Endpoint:
{json.dumps(endpoint, indent=2)}

Return ONLY valid JSON array, no explanation."""

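    resp = requests.post(OLLAMA_URL, json={
        "model": MODEL,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0.3},
    }, timeout=300)  # local generation can be slow, but should not hang forever
    resp.raise_for_status()  # surface HTTP errors (e.g., model not pulled) clearly
    return resp.json()["response"]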


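def parse_tests(raw: str) -> list[dict]:
    # Extract the outermost JSON array from the LLM response
    start = raw.find("[")
    end = raw.rfind("]") + 1
    if start == -1 or end <= start:
        return []
    try:
        return json.loads(raw[start:end])
    except json.JSONDecodeError:
        # Malformed output: skip this endpoint instead of crashing the whole run
        return []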


def main():
    if len(sys.argv) < 2:
        print("Usage: python generate_api_tests.py <openapi_spec.yaml>")
        sys.exit(1)

    spec = load_spec(sys.argv[1])
    all_tests = []

    for endpoint in extract_endpoints(spec):
        print(f"Generating tests for {endpoint['method']} {endpoint['path']}...")
        raw = generate_tests(endpoint)
        tests = parse_tests(raw)
        all_tests.extend(tests)
        print(f"  → {len(tests)} test cases generated")

    output = "generated_tests.json"
    with open(output, "w") as f:
        json.dump(all_tests, f, indent=2)

    print(f"\nTotal: {len(all_tests)} test cases saved to {output}")


if __name__ == "__main__":
    main()
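
One gap worth noting: extract_endpoints passes parameters and requestBody through unchanged, so a spec that leans on $ref pointers sends only the pointer string to the model, not the schema behind it. Below is a minimal sketch of inlining local references before prompting. The names resolve_refs and max_depth are illustrative and not part of the script above. It only handles pointers that start with "#/", and a pointer to a missing target raises KeyError.

```python
def resolve_refs(node, spec, depth=0, max_depth=5):
    """Return a copy of node with local '#/...' $ref pointers inlined.

    Hypothetical helper: remote or file-based references are left untouched,
    and max_depth stops circular schemas from expanding forever.
    """
    if depth > max_depth:
        return node  # give up on deeply nested or circular references
    if isinstance(node, list):
        return [resolve_refs(item, spec, depth, max_depth) for item in node]
    if not isinstance(node, dict):
        return node

    ref = node.get("$ref")
    if isinstance(ref, str) and ref.startswith("#/"):
        target = spec
        for part in ref[2:].split("/"):
            # JSON Pointer escapes: "~1" means "/", "~0" means "~"
            target = target[part.replace("~1", "/").replace("~0", "~")]
        return resolve_refs(target, spec, depth + 1, max_depth)

    return {key: resolve_refs(value, spec, depth, max_depth)
            for key, value in node.items()}
```

Inside extract_endpoints, that could mean wrapping the values as they are collected, for example resolve_refs(details.get("requestBody", {}), spec). Inlined schemas make prompts longer, so it may be worth checking that the result still fits your model's context window.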

Running It

ollama pull llama3
python generate_api_tests.py petstore.yaml

Output looks like this (exact counts vary by model and run):

Generating tests for GET /pets...
  → 8 test cases generated
Generating tests for POST /pets...
  → 8 test cases generated
Generating tests for GET /pets/{petId}...
  → 8 test cases generated

Total: 24 test cases saved to generated_tests.json

The resulting generated_tests.json contains structured test cases you can feed into any test runner.
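
Before handing that file to a runner, it can help to confirm each test case actually has the shape the prompt asked for, since a model may drop a field or invent a category. Here is a minimal sketch of such a check. The function name validate_test_case and the exact rules are assumptions layered on the field list in the prompt, not something the script above enforces.

```python
ALLOWED_CATEGORIES = {"happy_path", "boundary", "auth", "malformed_input", "edge_case"}
ALLOWED_METHODS = {"GET", "POST", "PUT", "PATCH", "DELETE"}


def validate_test_case(test) -> list[str]:
    """Return a list of problems; an empty list means the test case looks usable."""
    if not isinstance(test, dict):
        return ["test case is not a JSON object"]

    problems = []
    name = test.get("name")
    if not isinstance(name, str) or not name.strip():
        problems.append("name must be a non-empty string")
    if str(test.get("method", "")).upper() not in ALLOWED_METHODS:
        problems.append("method is missing or unsupported")
    path = test.get("path")
    if not isinstance(path, str) or not path.startswith("/"):
        problems.append("path must be a string starting with '/'")
    headers = test.get("headers")
    if headers is not None and not isinstance(headers, dict):
        problems.append("headers must be an object")
    status = test.get("expected_status")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(status, bool) or not isinstance(status, int) or not 100 <= status <= 599:
        problems.append("expected_status must be an integer HTTP status code")
    category = test.get("category")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        problems.append("category is not one of the values the prompt asked for")
    return problems
```

A simple way to use it is to keep only the cases where validate_test_case(test) returns an empty list and log the rest for review.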

Edge Case Discovery: What the LLM Catches

The real value isn’t the happy-path tests — it’s the edge cases. Here’s what a well-prompted LLM typically generates that humans skip:

Boundary values:

  • Integer fields set to 0, -1, MAX_INT
  • Strings at exactly the max length, one over, and empty
  • Arrays with 0 items, 1 item, and thousands of items

Auth failures:

  • Missing Authorization header entirely
  • Expired tokens
  • Valid token but insufficient permissions
  • Malformed Bearer format (Bearer, Bearer , Bearer xyz.abc)

Malformed input:

  • Wrong Content-Type header (sending text/plain to a JSON endpoint)
  • Valid JSON structure but wrong field types (string where int expected)
  • Extra unexpected fields
  • Unicode and special characters in string fields
  • Nested objects where flat values are expected

Edge cases:

  • Concurrent requests to the same resource
  • Requests with extremely large payloads
  • Path traversal attempts in ID parameters
  • SQL injection patterns in query parameters

For a deeper dive on handling these gracefully on the server side, see our guide on API error handling.
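
To see whether a generated batch really covers the categories listed above, you can count cases per endpoint and compare them with the minimums the prompt requested (2 happy path, 2 boundary, 1 auth, 2 malformed input, 1 edge case). The sketch below does that. The name coverage_gaps is illustrative, and it assumes the model echoes the templated path (such as /pets/{petId}) in each test case; if it substitutes concrete IDs, the grouping will split.

```python
from collections import Counter

# Minimum counts per category, copied from the prompt in generate_tests
REQUIRED = {"happy_path": 2, "boundary": 2, "auth": 1, "malformed_input": 2, "edge_case": 1}


def coverage_gaps(tests: list[dict]) -> dict[str, dict[str, int]]:
    """Map 'METHOD /path' to how many cases each under-covered category still needs."""
    counts: dict[str, Counter] = {}
    for test in tests:
        key = f"{str(test.get('method', '')).upper()} {test.get('path', '')}"
        counts.setdefault(key, Counter())[test.get("category")] += 1

    gaps = {}
    for key, seen in counts.items():
        missing = {cat: need - seen[cat] for cat, need in REQUIRED.items() if seen[cat] < need}
        if missing:
            gaps[key] = missing
    return gaps
```

An endpoint that shows up in the result is a candidate for regeneration with a more specific prompt, or for a few hand-written cases.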

Integrating into CI

Generated tests are only useful if they run automatically. Here’s how to wire this into a CI pipeline:

# .github/workflows/api-tests.yml
name: AI-Generated API Tests
on:
  push:
    paths: ['openapi.yaml']

jobs:
  generate-and-run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

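      - name: Install Ollama
        run: curl -fsSL https://ollama.com/install.sh | sh

      - name: Pull model
        run: ollama pull llama3

      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install Python dependencies
        run: pip install pyyaml requests

      - name: Generate test cases
        run: python generate_api_tests.py openapi.yaml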

      - name: Run tests against staging
        run: python run_tests.py --base-url ${{ vars.STAGING_URL }} --input generated_tests.json

The trigger is key: regenerate tests whenever the OpenAPI spec changes. This catches regressions from spec changes and ensures new endpoints get test coverage immediately.

A lightweight run_tests.py runner just iterates over the JSON, fires each request with requests, and asserts the status code matches expected_status. You can extend it to validate response bodies, measure latency, or flag flaky tests.
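
Here is a minimal sketch of what that runner could look like. This tutorial doesn't prescribe an implementation, so treat the details as assumptions: the run_case helper, the 10-second timeout, and the exit-code behavior are all illustrative. The --base-url and --input flags mirror the workflow step above. It also assumes each path is concrete; a literal {petId} placeholder would be sent as-is.

```python
"""Minimal runner sketch: replay generated test cases and compare status codes."""

import argparse
import json
import sys

import requests


def run_case(base_url: str, case: dict, timeout: float = 10.0) -> tuple[bool, str]:
    """Send one request and report whether the status code matched."""
    url = base_url.rstrip("/") + case["path"]
    try:
        resp = requests.request(
            case["method"],
            url,
            headers=case.get("headers") or {},
            json=case.get("body"),  # None means no request body is sent
            timeout=timeout,
        )
    except requests.RequestException as exc:
        return False, f"request failed: {exc}"
    if resp.status_code != case["expected_status"]:
        return False, f"expected {case['expected_status']}, got {resp.status_code}"
    return True, "ok"


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--base-url", required=True)
    parser.add_argument("--input", default="generated_tests.json")
    args = parser.parse_args(argv)

    with open(args.input) as f:
        cases = json.load(f)

    failures = 0
    for case in cases:
        passed, detail = run_case(args.base_url, case)
        if not passed:
            failures += 1
        label = "PASS" if passed else "FAIL"
        print(f"{label} {case['method']} {case['path']} ({case.get('name', 'unnamed')}): {detail}")

    print(f"\n{len(cases) - failures}/{len(cases)} passed")
    return 1 if failures else 0
```

Saved as run_tests.py with the usual if __name__ == "__main__": sys.exit(main()) guard at the bottom, a non-zero exit code fails the CI step whenever a status code differs from expected_status.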

Tips for Better Results

  • Be specific in prompts — Tell the LLM about your auth scheme, rate limits, and business rules. Generic prompts produce generic tests.
  • Use lower temperature (0.2–0.4) — You want consistent, structured output, not creative writing. A low temperature reduces variation between runs but does not make the output fully deterministic.
  • Validate the JSON — LLMs occasionally produce malformed output. The parse_tests function only extracts the JSON array from the response, so for production use, check each test case and retry when parsing fails.
  • Combine approaches — Use Postman’s AI for quick schema validation tests, then Ollama for deeper edge case generation.
  • Review before trusting — AI-generated tests are a starting point. Review them for false assumptions about your business logic.

Following solid API design best practices makes AI-generated tests more accurate, since the LLM can reason better about well-structured specs.
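
The retry advice from the tips above could be sketched like this. The helper takes the generate and parse steps as callables so it stays independent of the script; generate_with_retries, attempts, and delay_seconds are illustrative names and defaults, not part of the original code.

```python
import time
from typing import Callable


def generate_with_retries(
    generate: Callable[[], str],
    parse: Callable[[str], list],
    attempts: int = 3,
    delay_seconds: float = 1.0,
) -> list:
    """Call generate() until parse() yields a non-empty list or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            tests = parse(generate())
        except (ValueError, KeyError) as exc:
            # json.JSONDecodeError is a ValueError; KeyError covers a missing "response" field
            print(f"  attempt {attempt}: unusable output ({exc})")
            tests = []
        if tests:
            return tests
        if attempt < attempts:
            time.sleep(delay_seconds)
    return []
```

In main, the call would then read something like tests = generate_with_retries(lambda: generate_tests(endpoint), parse_tests). Because the temperature is above zero, a retry usually produces different text, which is what gives a second attempt a chance to succeed.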

Wrapping Up

AI won’t replace thoughtful test design, but it eliminates the grunt work. Postman’s built-in features get you basic coverage in minutes. A custom Ollama script gives you full control over prompt engineering and output format, runs entirely locally, and integrates cleanly into CI.

Start with your most critical endpoints. Generate tests, review them, fill in the business-logic gaps manually, and let the LLM handle the repetitive boundary and malformed-input cases it’s good at. Your API coverage will improve faster than writing every test by hand.