Writing API tests by hand is tedious. You stare at an endpoint, think up a few happy-path cases, maybe remember to test a 401, and move on. Meanwhile, the weird edge cases — the ones that actually break production — slip through.
LLMs change this. Feed them your API spec and they’ll generate dozens of test cases in seconds, including edge cases you wouldn’t think of. This tutorial covers two practical approaches: using Postman’s built-in AI features and writing a custom Python script with Ollama to generate tests from an OpenAPI spec.
If you’re new to the concept, start with our overview of AI test generation first.
Postman’s Built-In AI Test Generation
Postman added AI-powered test generation in late 2025, and it’s now one of the fastest ways to get coverage on an existing API.
Here’s how to use it:
- Import your OpenAPI spec: go to File → Import and drop in your openapi.yaml or paste a URL. Postman creates a collection with all your endpoints.
- Open any request and click “Generate Tests”: in the Tests tab, you’ll see an AI icon. Click it and Postman analyzes the request schema, parameters, and expected responses.
- Review and customize: Postman generates pm.test() blocks covering status codes, response schema validation, required fields, and type checks. You can edit these or regenerate with different prompts.
- Run the collection: use the Collection Runner or the Newman CLI to execute all generated tests at once.
What Postman’s AI does well:
- Schema validation tests from your response definitions
- Status code assertions for success and common error codes
- Header and content-type checks
- Basic boundary testing for numeric parameters
What it misses:
- Complex business logic validation
- Chained request dependencies (e.g., create → read → delete flows)
- Custom edge cases specific to your domain
For those gaps, you need something more flexible.
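Chained flows in particular usually end up hand-written. The following is a minimal sketch of a create → read → delete check. Everything in it is hypothetical: the /pets resource, the assumption that POST returns 201 with an "id" field, and the assumption that reading a deleted pet returns 404. It accepts any object with a requests.Session-style request() method, so you can pass a real session or a stub.

```python
"""Sketch of a chained create -> read -> delete API test (hypothetical /pets resource)."""
from typing import Any, Protocol


class HttpSession(Protocol):
    # Anything with a requests.Session-style .request() method fits here.
    def request(self, method: str, url: str, **kwargs: Any) -> Any: ...


def run_crud_flow(session: HttpSession, base_url: str, payload: dict) -> list[str]:
    """Create a resource, read it back, delete it, and confirm it is gone.

    Returns a list of failure messages; an empty list means the flow passed.
    Assumes POST returns 201 with a JSON body containing "id", and that a
    GET after DELETE returns 404. Adjust these to match your API.
    """
    failures = []
    created = session.request("POST", f"{base_url}/pets", json=payload, timeout=10)
    if created.status_code != 201:
        # Nothing to read or delete without an id, so stop here.
        return [f"create: expected 201, got {created.status_code}"]
    pet_id = created.json()["id"]

    fetched = session.request("GET", f"{base_url}/pets/{pet_id}", timeout=10)
    if fetched.status_code != 200:
        failures.append(f"read: expected 200, got {fetched.status_code}")
    elif fetched.json().get("name") != payload.get("name"):
        failures.append("read: returned name does not match created name")

    deleted = session.request("DELETE", f"{base_url}/pets/{pet_id}", timeout=10)
    if deleted.status_code not in (200, 204):
        failures.append(f"delete: expected 200/204, got {deleted.status_code}")

    gone = session.request("GET", f"{base_url}/pets/{pet_id}", timeout=10)
    if gone.status_code != 404:
        failures.append(f"read after delete: expected 404, got {gone.status_code}")
    return failures
```

Against a real API you would call something like run_crud_flow(requests.Session(), base_url, {"name": "Rex"}) and fail the build if the returned list is non-empty. Each step depends on the previous response, which is exactly what a per-request generator can’t express.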
Generating Test Cases from OpenAPI Specs with Ollama
Running an LLM locally with Ollama gives you full control. You can feed it your entire OpenAPI spec and get back structured test cases — no API keys, no rate limits, no data leaving your machine.
The idea is simple: parse your OpenAPI spec, extract endpoint details, send them to a local LLM, and get back test cases as structured JSON.
Prerequisites
- Ollama installed with a model pulled (e.g., ollama pull llama3)
- Python 3.10+ with the pyyaml and requests packages:
pip install pyyaml requests
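Before running the script, it can help to fail fast if Ollama isn’t running or the model hasn’t been pulled. This pre-flight check is an addition, not part of the script below. It assumes Ollama’s default local port (11434) and its /api/tags endpoint, which lists locally pulled models, and it uses the standard library’s urllib so it needs no extra dependencies.

```python
"""Pre-flight check: is Ollama running, and has the model been pulled?"""
import json
import urllib.request

# Ollama's default local address; /api/tags lists the models pulled locally.
TAGS_URL = "http://localhost:11434/api/tags"


def model_is_pulled(tags: dict, model: str) -> bool:
    """Return True if `model` appears in an /api/tags response.

    Ollama lists models with an explicit tag (e.g. "llama3:latest"),
    so a bare name such as "llama3" is compared as "llama3:latest".
    """
    wanted = model if ":" in model else f"{model}:latest"
    return any(entry.get("name") == wanted for entry in tags.get("models", []))


def check_ollama(model: str = "llama3") -> None:
    """Exit with a hint if the server is unreachable or the model is missing."""
    try:
        with urllib.request.urlopen(TAGS_URL, timeout=5) as resp:
            tags = json.load(resp)
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        raise SystemExit(f"Cannot reach Ollama at {TAGS_URL}: {exc}") from exc
    if not model_is_pulled(tags, model):
        raise SystemExit(f"Model {model!r} not found; run: ollama pull {model}")
```

Calling check_ollama(MODEL) at the start of main() would surface a missing model before any generation requests are sent.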
The Script
#!/usr/bin/env python3
"""Generate API test cases from an OpenAPI spec using a local Ollama LLM."""
import json
import sys

import requests
import yaml

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3"
HTTP_METHODS = ("get", "post", "put", "patch", "delete")


def load_spec(path: str) -> dict:
    """Load an OpenAPI spec from a YAML (or JSON) file."""
    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)


def extract_endpoints(spec: dict) -> list[dict]:
    """Flatten the spec's paths into one dict per path + HTTP method."""
    endpoints = []
    for path, methods in spec.get("paths", {}).items():
        for method, details in methods.items():
            # Skip path-level keys such as "parameters" or "summary".
            if method not in HTTP_METHODS:
                continue
            endpoints.append({
                "path": path,
                "method": method.upper(),
                "summary": details.get("summary", ""),
                "parameters": details.get("parameters", []),
                "requestBody": details.get("requestBody", {}),
                # YAML may parse status codes as ints; normalize them to strings.
                "responses": {
                    str(code): info.get("description", "")
                    for code, info in details.get("responses", {}).items()
                },
            })
    return endpoints


def generate_tests(endpoint: dict) -> str:
    """Ask the local model for test cases and return its raw text response."""
    prompt = f"""You are an API testing expert. Given this endpoint, generate test cases as a JSON array.
Each test case should have: "name", "method", "path", "headers" (object), "body" (object or null),
"expected_status" (int), and "category" (one of: happy_path, boundary, auth, malformed_input, edge_case).
Include at least:
- 2 happy path tests
- 2 boundary value tests
- 1 missing/invalid auth test
- 2 malformed input tests
- 1 edge case
Endpoint:
{json.dumps(endpoint, indent=2, default=str)}
Return ONLY a valid JSON array, no explanation."""
    resp = requests.post(OLLAMA_URL, json={
        "model": MODEL,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0.3},
    }, timeout=600)  # local generation can be slow, but don't hang forever
    resp.raise_for_status()
    return resp.json()["response"]


def parse_tests(raw: str) -> list[dict]:
    """Extract the JSON array from the LLM response; return [] if it can't be parsed."""
    start = raw.find("[")
    end = raw.rfind("]") + 1
    if start == -1 or end <= start:
        return []
    try:
        tests = json.loads(raw[start:end])
    except json.JSONDecodeError:
        return []
    return tests if isinstance(tests, list) else []


def main() -> None:
    if len(sys.argv) < 2:
        print("Usage: python generate_api_tests.py <openapi_spec.yaml>")
        sys.exit(1)

    spec = load_spec(sys.argv[1])
    all_tests = []
    for endpoint in extract_endpoints(spec):
        print(f"Generating tests for {endpoint['method']} {endpoint['path']}...")
        raw = generate_tests(endpoint)
        tests = parse_tests(raw)
        all_tests.extend(tests)
        print(f"  → {len(tests)} test cases generated")

    output = "generated_tests.json"
    with open(output, "w", encoding="utf-8") as f:
        json.dump(all_tests, f, indent=2)
    print(f"\nTotal: {len(all_tests)} test cases saved to {output}")


if __name__ == "__main__":
    main()
Running It
ollama pull llama3
python generate_api_tests.py petstore.yaml
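Output looks something like this (the number of test cases per endpoint depends on what the model returns, so it can vary between runs):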
Generating tests for GET /pets...
→ 8 test cases generated
Generating tests for POST /pets...
→ 8 test cases generated
Generating tests for GET /pets/{petId}...
→ 8 test cases generated
Total: 24 test cases saved to generated_tests.json
The resulting generated_tests.json file contains structured test cases you can feed into any test runner.
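Because the model’s output goes straight into a runner, a quick structural check is worth adding. In this sketch, validate_case and split_valid are hypothetical helpers, not part of the script; they check each case against the fields and categories the prompt in generate_tests() asks for and set aside anything that doesn’t match for manual review.

```python
"""Structural checks for LLM-generated test cases before they reach a runner."""

# Field names and categories taken from the prompt in generate_tests().
REQUIRED_FIELDS = {"name", "method", "path", "headers", "body", "expected_status", "category"}
CATEGORIES = {"happy_path", "boundary", "auth", "malformed_input", "edge_case"}
METHODS = {"GET", "POST", "PUT", "PATCH", "DELETE"}


def validate_case(case) -> list[str]:
    """Return a list of problems with one test case; an empty list means it looks usable."""
    if not isinstance(case, dict):
        return ["test case is not a JSON object"]
    problems = [f"missing field: {field}" for field in sorted(REQUIRED_FIELDS - set(case))]
    method = case.get("method")
    if not isinstance(method, str) or method.upper() not in METHODS:
        problems.append(f"unexpected method: {method!r}")
    status = case.get("expected_status")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(status, bool) or not isinstance(status, int) or not 100 <= status <= 599:
        problems.append(f"expected_status is not an HTTP status code: {status!r}")
    if case.get("category") not in CATEGORIES:
        problems.append(f"unknown category: {case.get('category')!r}")
    if not isinstance(case.get("headers", {}), dict):
        problems.append("headers is not an object")
    return problems


def split_valid(cases: list) -> tuple[list, list]:
    """Separate usable test cases from ones that need manual review."""
    valid, rejected = [], []
    for case in cases:
        if validate_case(case):
            rejected.append(case)
        else:
            valid.append(case)
    return valid, rejected
```

Writing only the valid cases to generated_tests.json keeps obviously broken entries out of CI. The rejected ones are worth a quick read, since they may point at prompt wording to tighten.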
Edge Case Discovery: What the LLM Catches
The real value isn’t the happy-path tests — it’s the edge cases. Here’s what a well-prompted LLM typically generates that humans skip:
Boundary values:
- Integer fields set to 0, -1, and MAX_INT
- Strings at exactly the max length, one over, and empty
- Arrays with 0 items, 1 item, and thousands of items
Auth failures:
- Missing Authorization header entirely
- Expired tokens
- Valid token but insufficient permissions
- Malformed Bearer format (e.g., Bearer with no token, or Bearer xyz.abc)
Malformed input:
- Wrong Content-Type header (sending text/plain to a JSON endpoint)
- Valid JSON structure but wrong field types (string where int expected)
- Extra unexpected fields
- Unicode and special characters in string fields
- Nested objects where flat values are expected
Edge cases:
- Concurrent requests to the same resource
- Requests with extremely large payloads
- Path traversal attempts in ID parameters
- SQL injection patterns in query parameters
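For the boundary category, a deterministic helper can complement the LLM, since those values follow directly from the schema’s constraints. The sketch below uses a hypothetical boundary_values function (not part of the script above) that reads minimum, maximum, and maxLength from an OpenAPI schema fragment. Treating MAX_INT as the int32 maximum unless the schema declares format: int64 is an assumption; exclusive bounds and arrays are left out to keep it short.

```python
"""Derive boundary-value inputs from an OpenAPI schema fragment."""


def boundary_values(schema: dict) -> list:
    """Return boundary inputs for an integer or string schema ([] otherwise).

    Integers: 0, -1, each declared bound and one step past it, and MAX_INT.
    Strings: empty, exactly maxLength, and maxLength + 1 characters.
    """
    values: list = []
    if schema.get("type") == "integer":
        values += [0, -1]
        if "minimum" in schema:
            values += [schema["minimum"], schema["minimum"] - 1]
        if "maximum" in schema:
            values += [schema["maximum"], schema["maximum"] + 1]
        # Assumption: treat MAX_INT as int32 unless the schema says int64.
        values.append(2**63 - 1 if schema.get("format") == "int64" else 2**31 - 1)
    elif schema.get("type") == "string":
        values.append("")
        if "maxLength" in schema:
            n = schema["maxLength"]
            values += ["a" * n, "a" * (n + 1)]
    # Remove duplicates (e.g. minimum 1 makes "minimum - 1" equal the fixed 0) while keeping order.
    return list(dict.fromkeys(values))
```

For a schema with minimum: 1 and maximum: 100 this returns [0, -1, 1, 100, 101, 2147483647]. Comparing that list with the boundary cases the model proposed is a quick way to spot ones it missed.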
For a deeper dive on handling these gracefully on the server side, see our guide on API error handling.
Integrating into CI
Generated tests are only useful if they run automatically. Here’s how to wire this into a CI pipeline:
# .github/workflows/api-tests.yml
name: AI-Generated API Tests
on:
  push:
    paths: ['openapi.yaml']
jobs:
  generate-and-run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install Python dependencies
        run: pip install pyyaml requests
      - name: Install Ollama
        run: curl -fsSL https://ollama.com/install.sh | sh
      - name: Pull model
        run: ollama pull llama3
      - name: Generate test cases
        run: python generate_api_tests.py openapi.yaml
      - name: Run tests against staging
        run: python run_tests.py --base-url "${{ vars.STAGING_URL }}" --input generated_tests.json
The trigger is key: regenerate tests whenever the OpenAPI spec changes. This catches regressions from spec changes and ensures new endpoints get test coverage immediately.
A lightweight run_tests.py runner just iterates over the JSON, fires each request with requests, and asserts the status code matches expected_status. You can extend it to validate response bodies, measure latency, or flag flaky tests.
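A minimal sketch of such a runner follows. It assumes the JSON shape requested by the prompt in generate_tests() and the --base-url and --input flags used in the workflow above. Paths are sent as-is, so any {petId}-style placeholders the model left unfilled will simply produce failing requests. requests is imported inside main() so that run_case() can be exercised with a stub session.

```python
"""Minimal runner: send each generated test case and compare status codes."""
import argparse
import json


def run_case(session, base_url: str, case: dict) -> tuple[bool, int]:
    """Send one test case; return (passed, actual_status_code)."""
    resp = session.request(
        case["method"],
        base_url.rstrip("/") + case["path"],  # path is used verbatim
        headers=case.get("headers") or {},
        json=case.get("body"),  # None means no request body
        timeout=10,
    )
    return resp.status_code == case["expected_status"], resp.status_code


def main(argv=None) -> int:
    """Run every case in the input file; return 1 if any failed, else 0."""
    parser = argparse.ArgumentParser(description="Run generated API tests.")
    parser.add_argument("--base-url", required=True)
    parser.add_argument("--input", default="generated_tests.json")
    args = parser.parse_args(argv)

    import requests  # imported here so run_case() can be tested with a stub

    with open(args.input, encoding="utf-8") as f:
        cases = json.load(f)

    failed = 0
    with requests.Session() as session:
        for case in cases:
            name = case.get("name", "<unnamed>")
            try:
                passed, status = run_case(session, args.base_url, case)
                detail = f"got {status}, expected {case['expected_status']}"
            except (KeyError, requests.RequestException) as exc:
                passed, detail = False, f"error: {exc!r}"
            print(f"{'PASS' if passed else 'FAIL'} {name}: {detail}")
            if not passed:
                failed += 1

    print(f"\n{len(cases) - failed}/{len(cases)} passed")
    return 1 if failed else 0
```

To use it as run_tests.py, save it under that name and end the file with the standard if __name__ == "__main__": sys.exit(main()) guard (plus import sys). Because main() returns a non-zero code when any case fails, the CI step fails too.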
Tips for Better Results
- Be specific in prompts: tell the LLM about your auth scheme, rate limits, and business rules. Generic prompts produce generic tests.
- Use a lower temperature (0.2–0.4): you want consistent, structured output, not creative writing.
- Validate the JSON: LLMs occasionally produce malformed output. The parse_tests function pulls the JSON array out of the raw response; make sure it copes with unparseable output, and add retries for production use.
- Combine approaches: use Postman’s AI for quick schema validation tests, then Ollama for deeper edge case generation.
- Review before trusting: AI-generated tests are a starting point. Review them for false assumptions about your business logic.
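The retry advice can be a small wrapper. In this sketch, generate_with_retries is a hypothetical helper, not part of the script. It takes the generate and parse functions as arguments (for example the script’s generate_tests and parse_tests), asks the model again when a response yields no test cases, and gives up after a fixed number of attempts.

```python
"""Retry LLM test generation when a response yields no usable test cases."""
import json
from typing import Callable


def generate_with_retries(
    endpoint: dict,
    generate: Callable[[dict], str],
    parse: Callable[[str], list],
    max_attempts: int = 3,
) -> list:
    """Return the first non-empty parsed result, or [] after max_attempts tries."""
    for attempt in range(1, max_attempts + 1):
        raw = generate(endpoint)
        try:
            tests = parse(raw)
        except json.JSONDecodeError:
            # A parser that raises on malformed JSON is treated like an empty result.
            tests = []
        if tests:
            return tests
        if attempt < max_attempts:
            print(f"  attempt {attempt} produced no test cases, retrying...")
    return []
```

In main(), tests = generate_with_retries(endpoint, generate_tests, parse_tests) would replace the separate generate and parse calls. With a temperature above zero, a repeated request can return different output, which is what gives a retry a chance to succeed.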
Following solid API design best practices makes AI-generated tests more accurate, since the LLM can reason better about well-structured specs.
Wrapping Up
AI won’t replace thoughtful test design, but it eliminates the grunt work. Postman’s built-in features get you basic coverage in minutes. A custom Ollama script gives you full control over prompt engineering and output format, runs entirely locally, and integrates cleanly into CI.
Start with your most critical endpoints. Generate tests, review them, fill in the business-logic gaps manually, and let the LLM handle the repetitive boundary and malformed-input cases it’s good at. Your API coverage will improve faster than writing every test by hand.