# Poolside Laguna API Guide: OpenRouter, Direct API, and Code Examples (2026)
Poolside's Laguna models are accessible through three API paths: OpenRouter (free), Amazon Bedrock, and Poolside's direct API. All three expose OpenAI-compatible chat completions endpoints, so any tool or library that works with the OpenAI SDK works with Laguna.
This guide covers authentication, request formats, streaming, error handling, and integration with popular coding tools. Both Laguna M.1 (225B total parameters, 23B active) and Laguna XS.2 (33B total, 3B active) are covered.
For background on the models, see our What is Poolside AI overview.
## Quick start: OpenRouter (free)
OpenRouter is the fastest way to start. Both models are free: M.1 for a limited time, XS.2 with no announced end date.
### Get an API key
- Go to openrouter.ai
- Create an account
- Navigate to API Keys and generate a new key
- Copy the key; you will not see it again
### First request
```bash
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{
    "model": "poolside/laguna-m.1",
    "messages": [
      {"role": "system", "content": "You are an expert software engineer."},
      {"role": "user", "content": "Write a Python async context manager for database connection pooling with proper cleanup."}
    ],
    "max_tokens": 2048,
    "temperature": 0.1
  }'
```
Replace `poolside/laguna-m.1` with `poolside/laguna-xs.2` for the smaller model.
### Python SDK
```python
import openai

client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-key",
)

# Laguna M.1: flagship model
response = client.chat.completions.create(
    model="poolside/laguna-m.1",
    messages=[
        {"role": "system", "content": "You are an expert software engineer. Write clean, production-ready code."},
        {"role": "user", "content": "Implement a rate limiter using the token bucket algorithm in Go."},
    ],
    max_tokens=2048,
    temperature=0.1,
)
print(response.choices[0].message.content)

# Laguna XS.2: lightweight model
response = client.chat.completions.create(
    model="poolside/laguna-xs.2",
    messages=[
        {"role": "user", "content": "Write a TypeScript utility type that makes all nested properties optional."},
    ],
    max_tokens=1024,
    temperature=0.1,
)
print(response.choices[0].message.content)
```
### TypeScript / Node.js
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

async function generateCode(prompt: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "poolside/laguna-m.1",
    messages: [
      { role: "system", content: "You are an expert software engineer." },
      { role: "user", content: prompt },
    ],
    max_tokens: 2048,
    temperature: 0.1,
  });
  return response.choices[0].message.content ?? "";
}

const code = await generateCode(
  "Write an Express.js middleware that validates JWT tokens and attaches the decoded payload to the request."
);
console.log(code);
```
## Streaming responses
For real-time output in CLI tools and chat interfaces, use streaming:
### Python streaming
```python
import openai

client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-key",
)

stream = client.chat.completions.create(
    model="poolside/laguna-m.1",
    messages=[
        {"role": "user", "content": "Write a Rust HTTP server using Axum with graceful shutdown handling."},
    ],
    max_tokens=2048,
    temperature=0.1,
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # Final newline
```
### TypeScript streaming
```typescript
// Reuses the OpenAI client configured in the earlier TypeScript example.
const stream = await client.chat.completions.create({
  model: "poolside/laguna-m.1",
  messages: [
    {
      role: "user",
      content: "Write a React hook that manages WebSocket connections with automatic reconnection.",
    },
  ],
  max_tokens: 2048,
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}
```
### curl streaming
```bash
# -N disables curl's output buffering so tokens print as they arrive
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -N \
  -d '{
    "model": "poolside/laguna-m.1",
    "messages": [
      {"role": "user", "content": "Write a Python script that monitors a directory for file changes and triggers a build."}
    ],
    "stream": true
  }'
```
## Function calling / tool use
Laguna models support function calling through the standard OpenAI tool-use format:
```python
import openai

client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-key",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read the contents of a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "The file path to read",
                    },
                },
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Write content to a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "The file path to write",
                    },
                    "content": {
                        "type": "string",
                        "description": "The content to write",
                    },
                },
                "required": ["path", "content"],
            },
        },
    },
]

response = client.chat.completions.create(
    model="poolside/laguna-m.1",
    messages=[
        {"role": "system", "content": "You are a coding assistant with access to file operations."},
        {"role": "user", "content": "Read the file src/utils.ts and add a debounce utility function to it."},
    ],
    tools=tools,
    tool_choice="auto",
)

# Handle tool calls
message = response.choices[0].message
if message.tool_calls:
    for tool_call in message.tool_calls:
        print(f"Tool: {tool_call.function.name}")
        print(f"Args: {tool_call.function.arguments}")
```
This makes Laguna models compatible with agent frameworks like LangChain, CrewAI, and the OpenAI Agents SDK.
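For example, here is a minimal LangChain sketch (assuming the `langchain-openai` package is installed; everything else mirrors the OpenRouter setup above):

```python
# Minimal sketch: pointing LangChain's ChatOpenAI wrapper at OpenRouter.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="poolside/laguna-m.1",
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-key",
    temperature=0.1,
)

result = llm.invoke("Write a Python function that parses ISO 8601 timestamps.")
print(result.content)
```

The same `llm` object can then be passed to LangChain chains or agents as usual.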
## Amazon Bedrock
For enterprise deployments where code must stay within your AWS account:
### Setup
```bash
# Ensure you have AWS credentials configured
aws configure

# Check that Laguna models are available in your region
aws bedrock list-foundation-models \
  --query "modelSummaries[?contains(modelId, 'poolside')]" \
  --region us-east-1
```
### Python with Bedrock
```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def generate_code(prompt: str, model_id: str = "poolside.laguna-m1-v1") -> str:
    response = bedrock.invoke_model(
        modelId=model_id,
        body=json.dumps({
            "messages": [
                {"role": "system", "content": "You are an expert software engineer."},
                {"role": "user", "content": prompt},
            ],
            "max_tokens": 2048,
            "temperature": 0.1,
        }),
    )
    result = json.loads(response["body"].read())
    return result["content"]

code = generate_code(
    "Write a DynamoDB data access layer in Python with proper error handling and retry logic."
)
print(code)
```
### Bedrock streaming
```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.invoke_model_with_response_stream(
    modelId="poolside.laguna-m1-v1",
    body=json.dumps({
        "messages": [
            {"role": "user", "content": "Write a CloudFormation template for a VPC with public and private subnets."},
        ],
        "max_tokens": 4096,
    }),
)

for event in response["body"]:
    chunk = json.loads(event["chunk"]["bytes"])
    if "content" in chunk:
        print(chunk["content"], end="", flush=True)
```
## Integration with coding tools
### Aider
Aider supports custom OpenAI-compatible endpoints. Configure it for Laguna:
```bash
# Using OpenRouter
export OPENROUTER_API_KEY="your-key"
aider --model openrouter/poolside/laguna-m.1

# Using a local vLLM server running XS.2
aider --model openai/laguna-xs.2 --openai-api-base http://localhost:8000/v1
```
For Aider-specific configuration, add to `~/.aider.conf.yml`:
```yaml
model: openrouter/poolside/laguna-m.1
openrouter-api-key: your-key
```
### Continue (VS Code / JetBrains)
Add Laguna to your Continue configuration:
```json
{
  "models": [
    {
      "title": "Laguna M.1 (OpenRouter)",
      "provider": "openai",
      "model": "poolside/laguna-m.1",
      "apiBase": "https://openrouter.ai/api/v1",
      "apiKey": "your-openrouter-key"
    },
    {
      "title": "Laguna XS.2 (Local)",
      "provider": "openai",
      "model": "poolside/laguna-xs.2",
      "apiBase": "http://localhost:8000/v1",
      "apiKey": "not-needed"
    }
  ]
}
```
### OpenCode
```bash
# OpenRouter
OPENAI_API_BASE=https://openrouter.ai/api/v1 \
OPENAI_API_KEY=your-openrouter-key \
opencode --model poolside/laguna-m.1

# Local
OPENAI_API_BASE=http://localhost:8000/v1 \
OPENAI_API_KEY=not-needed \
opencode --model laguna-xs.2
```
For more on OpenRouter setup and model routing, see our OpenRouter complete guide.
## Error handling
### Rate limits
OpenRouter has rate limits on free models. Handle them gracefully:
```python
import time

import openai

client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-key",
)

def generate_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="poolside/laguna-m.1",
                messages=messages,
                max_tokens=2048,
                temperature=0.1,
            )
            return response.choices[0].message.content
        except openai.RateLimitError:
            if attempt < max_retries - 1:
                wait = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s...
                print(f"Rate limited. Waiting {wait}s...")
                time.sleep(wait)
            else:
                raise
        except openai.APIError as e:
            print(f"API error: {e}")
            raise

result = generate_with_retry([
    {"role": "user", "content": "Write a connection pool manager in Java."},
])
```
### Model fallback
If M.1 is unavailable or slow, fall back to XS.2:
```python
def generate_with_fallback(messages):
    # Uses the same OpenRouter client configured above.
    models = ["poolside/laguna-m.1", "poolside/laguna-xs.2"]
    for model in models:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
                max_tokens=2048,
                temperature=0.1,
                timeout=30,
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"{model} failed: {e}")
            continue
    raise Exception("All models failed")
```
For more strategies on managing API costs and reliability, see our how to reduce LLM API costs guide.
## Optimizing requests for coding
### System prompts
Laguna models respond well to specific system prompts for coding:
```python
# General coding
system = "You are an expert software engineer. Write clean, production-ready code with proper error handling."

# Language-specific
system = "You are an expert Python developer. Follow PEP 8, use type hints, and include docstrings."

# Review mode
system = "You are a senior code reviewer. Identify bugs, security issues, and performance problems. Be specific and actionable."

# Test generation
system = "You are a test engineer. Write comprehensive tests that cover edge cases, error conditions, and integration points."
```
### Temperature settings
For coding tasks, keep temperature low. The table below summarizes typical ranges; the sketch after it shows one way to encode them as defaults.
| Task | Temperature | Reason |
|---|---|---|
| Code generation | 0.0 - 0.1 | Deterministic, correct output |
| Bug fixing | 0.0 | Precision matters |
| Code review | 0.1 - 0.2 | Slight variation in suggestions |
| Test generation | 0.1 - 0.2 | Some creativity in test cases |
| Brainstorming approaches | 0.3 - 0.5 | Explore different solutions |
| Refactoring suggestions | 0.1 - 0.3 | Balance between convention and creativity |
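If you prefer these defaults in code rather than in a table, a small lookup helper works. The task names here are illustrative, not any official API:

```python
# Illustrative temperature defaults, taken from the table above.
TEMPERATURE_BY_TASK = {
    "generate": 0.0,    # Code generation
    "bugfix": 0.0,      # Bug fixing
    "review": 0.15,     # Code review
    "tests": 0.15,      # Test generation
    "brainstorm": 0.4,  # Exploring approaches
    "refactor": 0.2,    # Refactoring suggestions
}

def temperature_for(task: str) -> float:
    # Conservative default for tasks not in the table.
    return TEMPERATURE_BY_TASK.get(task, 0.1)
```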
### Context management
Laguna models are coding-specific, so structure your context accordingly:
````python
# file_content and error_message are placeholders, loaded elsewhere in your tool.
messages = [
    {"role": "system", "content": "You are an expert TypeScript developer working on a Next.js application."},
    {"role": "user", "content": f"""Here is the current file:

```typescript
{file_content}
```

The error is:
{error_message}

Fix the bug and explain what caused it."""},
]
````
Include relevant code, error messages, and file paths. Laguna models are trained on code, so they handle code-heavy context well. Avoid including large amounts of prose; keep the context focused on code and technical details.
## Pricing summary
| Access method | M.1 pricing | XS.2 pricing |
|---|---|---|
| OpenRouter | Free (limited time) | Free |
| Amazon Bedrock | Pay per token (AWS pricing) | Pay per token (AWS pricing) |
| Direct API | Contact Poolside | Contact Poolside |
| Local (XS.2 only) | N/A | Free (Apache 2.0) |
The free OpenRouter access is the best starting point. If you need enterprise features (SLAs, data residency, compliance), use Bedrock. If you need full control and privacy, run XS.2 locally.
## FAQ
### Do I need an OpenRouter account to use Laguna for free?
Yes. OpenRouter requires a free account and API key. Registration takes under a minute. The API key is used for rate limiting and usage tracking, not billing; both Laguna models are free on OpenRouter.
### Is the OpenRouter free tier rate limited?
Yes. OpenRouter applies rate limits to free models. The exact limits vary and are not always published. If you hit rate limits, add exponential backoff retry logic (shown above) or upgrade to a paid OpenRouter plan for higher limits. For production workloads, consider Bedrock or the direct API.
### Can I use Laguna with LangChain?
Yes. LangChain supports any OpenAI-compatible endpoint. Configure it with the OpenRouter base URL and your API key (see the LangChain sketch in the function calling section above). Both chat completions and function calling work through LangChain's standard interfaces. The same applies to LlamaIndex, CrewAI, and other frameworks that support the OpenAI API format.
### What is the context window for Laguna models?
Context window limits depend on the deployment. OpenRouter and Bedrock may impose their own limits. Check the model card on OpenRouter for the current context window. For local XS.2 deployment, you control the context window through your inference engine configuration (e.g., `--max-model-len` in vLLM).
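For instance, with vLLM's offline Python API (a sketch; the Hugging Face repo name below is a placeholder, not a confirmed model ID):

```python
# Sketch: capping the context window when loading XS.2 in vLLM.
# "poolside/laguna-xs-2" is a placeholder repo name, not a confirmed ID.
from vllm import LLM

llm = LLM(
    model="poolside/laguna-xs-2",
    max_model_len=32768,  # Same effect as --max-model-len on the vLLM server CLI
)
```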
### How do I switch between M.1 and XS.2 in my code?
Change the model string. Both models use the same API format, same endpoints, same request structure. Replace `poolside/laguna-m.1` with `poolside/laguna-xs.2` (or vice versa) and everything else stays the same. This makes it easy to implement fallback logic or A/B testing between the two models.
### Is my code sent to Poolside when using OpenRouter?
When using OpenRouter, your requests go through OpenRouter's infrastructure to Poolside's servers. OpenRouter's privacy policy applies. For maximum privacy, either use Amazon Bedrock (code stays in your AWS account) or run XS.2 locally (code never leaves your machine). Review OpenRouter's data handling policies if you are working with sensitive code.