
Poolside Laguna M.1 Complete Guide — 225B Coding Model (2026)


Laguna M.1 is Poolside AI’s flagship coding model. It has 225 billion total parameters with 23 billion active per forward pass, using a Mixture-of-Experts (MoE) architecture. It is trained exclusively on code using RLCEF — Reinforcement Learning from Code Execution Feedback — which means the model learned from actually running the code it generates, not just reading static repositories.

Right now, M.1 is free on OpenRouter for a limited time. It is also available on Amazon Bedrock and through Poolside’s direct API. This is the model that powers Poolside’s own products — pool (their terminal agent) and Shimmer (their cloud dev environment).

Here is the complete breakdown: architecture, specifications, how it compares to other coding models, and how to start using it.

Architecture and specifications

Laguna M.1 uses a Mixture-of-Experts architecture. The model contains 225B total parameters distributed across multiple expert networks, but only 23B parameters activate for any given token. A routing mechanism selects which experts to engage based on the input, keeping inference costs proportional to the active parameter count rather than the total.

Spec                   Value
Total parameters       225B
Active parameters      23B
Architecture           Mixture-of-Experts (MoE)
Training method        RLCEF (Reinforcement Learning from Code Execution Feedback)
Training focus         Code only (not general purpose)
License                Proprietary
OpenRouter             Free (limited time)
Amazon Bedrock         Available
Direct API             Available
Weights download       Not available

The MoE design is the same approach used by models like Mixtral, DeepSeek V3, and Qwen’s MoE variants. The key difference is that M.1 is trained exclusively on code with execution feedback, while those models are general-purpose.

Why MoE matters for coding

MoE architectures let you pack more knowledge into a model without proportionally increasing inference cost. For a coding model, this means M.1 can have deep knowledge of many programming languages, frameworks, and patterns (stored across 225B parameters) while keeping response times fast (only 23B parameters compute each token).

The 23B active parameter count puts M.1’s inference cost in the same ballpark as models like Codestral (22B) or Mistral Medium, but with the knowledge capacity of a much larger model. This is particularly valuable for coding tasks that require broad knowledge — understanding how a React frontend interacts with a Python backend through a GraphQL API, for example.
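Poolside has not published M.1's routing details, but the general top-k MoE pattern works roughly like the sketch below: a small learned router scores every expert for each token, and only the top few experts actually run. The expert count, top-k value, and layer sizes here are illustrative assumptions, not M.1's real configuration.

import torch
import torch.nn.functional as F

# Toy numbers for illustration only -- not M.1's actual configuration.
NUM_EXPERTS = 16   # expert feed-forward networks in one MoE layer
TOP_K = 2          # experts that actually run for each token
HIDDEN = 512       # hidden dimension of the toy example

experts = torch.nn.ModuleList(
    [torch.nn.Linear(HIDDEN, HIDDEN) for _ in range(NUM_EXPERTS)]
)
router = torch.nn.Linear(HIDDEN, NUM_EXPERTS)  # learned routing scores

def moe_layer(x: torch.Tensor) -> torch.Tensor:
    """Send each token to its top-k experts and mix their outputs."""
    scores = router(x)                           # (tokens, NUM_EXPERTS)
    weights, idx = scores.topk(TOP_K, dim=-1)    # pick the best-scoring experts
    weights = F.softmax(weights, dim=-1)         # normalize the mixing weights
    rows = []
    for t in range(x.shape[0]):
        mix = sum(weights[t, k] * experts[int(idx[t, k])](x[t]) for k in range(TOP_K))
        rows.append(mix)
    return torch.stack(rows)

print(moe_layer(torch.randn(4, HIDDEN)).shape)   # torch.Size([4, 512])

Only TOP_K of the NUM_EXPERTS networks execute for any given token, which is why inference cost tracks the 23B active parameters rather than the 225B total.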

RLCEF training

M.1’s training pipeline is what sets it apart from other coding models. For background on Poolside’s approach, see our What is Poolside AI overview.

The short version: during training, M.1 generates code, that code is executed in sandboxed environments, and the execution results (pass/fail, output correctness, error messages) are used as reward signals. This creates a feedback loop where the model learns not just what code looks like syntactically, but what code does when it runs.
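Poolside has not published the RLCEF pipeline itself, but the loop described above (generate, execute in a sandbox, score the result) can be sketched roughly as follows. The reward values, file handling, and function names are hypothetical stand-ins, and a real pipeline would use a hardened sandbox rather than a bare subprocess.

import subprocess
import tempfile

def run_candidate(code: str, test: str, timeout: int = 10) -> subprocess.CompletedProcess:
    """Execute model-generated code plus a test in a separate process."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + test + "\n")
        path = f.name
    return subprocess.run(
        ["python", path], capture_output=True, text=True, timeout=timeout
    )

def execution_reward(result: subprocess.CompletedProcess) -> float:
    """Collapse the execution outcome into a scalar reward for the RL update."""
    if result.returncode == 0:
        return 1.0      # ran cleanly and the assertion passed
    if "SyntaxError" in result.stderr:
        return -1.0     # did not even parse
    return -0.5         # ran but failed; stderr can also be fed back as context

# One hypothetical feedback step: the candidate would come from the model.
candidate = "def add(a, b):\n    return a + b"
test = "assert add(2, 3) == 5"
print(execution_reward(run_candidate(candidate, test)))   # 1.0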

The practical effects of RLCEF training on M.1:

  • Higher first-pass correctness: Code generated by M.1 is more likely to run without errors on the first attempt. The model has been trained on millions of execution cycles and has learned common failure modes.
  • Better error handling: Because M.1 has seen what happens when error handling is missing or incorrect, it tends to generate more robust code with proper try/catch blocks, input validation, and edge case handling.
  • Stronger debugging: M.1 can look at an error message and stack trace and identify the root cause more reliably, because it has seen the relationship between code patterns and execution errors during training.
  • Test awareness: M.1 generates code that is more testable and can write tests that actually catch real bugs, because it has been trained on the relationship between code and test outcomes.

How M.1 compares to other coding models

vs. Claude Sonnet / Opus for coding

Claude models are general-purpose — they handle creative writing, analysis, math, and coding. M.1 is coding-only. For pure coding tasks, M.1’s RLCEF training gives it an edge in first-pass correctness and debugging. For tasks that mix coding with explanation, documentation, or architectural reasoning in natural language, Claude’s broader training may be advantageous. Claude also has larger context windows (200K) which helps with large codebase analysis.

vs. GPT-5 for coding

Similar tradeoff as Claude. GPT-5 is a frontier general-purpose model with strong coding capabilities. M.1 trades breadth for depth — it cannot write poetry, but it may generate more reliable code. GPT-5’s tool calling and structured output capabilities are more mature, which matters for agent-based workflows.

vs. DeepSeek V4 for coding

DeepSeek V4 is also an MoE model with strong coding performance, but it is trained as a general-purpose model. M.1’s coding-only training with RLCEF is a different approach. DeepSeek V4 has the advantage of being open-weight and having a massive community. M.1 has the advantage of specialized training.

vs. Devstral 2

Devstral 2 is Mistral’s coding-focused model. Like M.1, it targets software development specifically. The key difference is training methodology — Devstral uses standard supervised fine-tuning and RLHF, while M.1 uses RLCEF with actual code execution. For a detailed look at Devstral, see our Devstral 2 complete guide.

vs. Mistral Medium 3.5

Mistral Medium 3.5 is a strong general-purpose model with good coding capabilities. M.1 has a larger total parameter count (225B) and is trained specifically for code. For coding-heavy workloads, M.1 should outperform. For mixed workloads that include documentation, planning, and communication alongside coding, Mistral Medium 3.5 may be more versatile. See our Mistral Medium 3.5 guide for details.

vs. Laguna XS.2

XS.2 is Poolside’s smaller model — 33B total, 3B active, Apache 2.0. M.1 is significantly more capable for complex tasks: multi-file refactoring, architectural decisions, debugging subtle issues. XS.2 is better for quick completions, simple function generation, and local deployment. If you are choosing between them, try M.1 for free on OpenRouter first, then fall back to XS.2 for tasks where the extra capability is not needed.

Best use cases for M.1

M.1 excels at tasks where coding-specific training and RLCEF provide the most value:

Complex refactoring. Restructuring code across multiple files while maintaining correctness. M.1’s execution-aware training helps it understand the ripple effects of changes.

Debugging. Given an error message, stack trace, and relevant code, M.1 can identify root causes and suggest fixes. Its training on execution feedback means it has seen similar error patterns millions of times.

Code generation from specifications. Turning a description of desired behavior into working code. M.1’s first-pass correctness rate is high because it has been trained to generate code that actually runs.

Test generation. Writing tests that cover edge cases and actually catch bugs. M.1 understands the relationship between code and test outcomes from its RLCEF training.

Code review. Identifying potential bugs, performance issues, and security vulnerabilities. The model’s execution awareness helps it spot issues that static analysis might miss.

How to use Laguna M.1

OpenRouter (free, limited time)

The fastest way to try M.1. Create an OpenRouter account, get an API key, and point any OpenAI-compatible client at it:

import openai

# Point the standard OpenAI client at OpenRouter's endpoint
client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-key"
)

response = client.chat.completions.create(
    model="poolside/laguna-m.1",
    messages=[
        {"role": "system", "content": "You are an expert software engineer."},
        {"role": "user", "content": "Write a Python function that implements a thread-safe LRU cache with TTL support."}
    ]
)

print(response.choices[0].message.content)
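OpenRouter exposes the standard OpenAI-compatible interface, so streaming should work the same way it does for any other model on the platform; the snippet below reuses the client from above under that assumption.

# Stream the response token by token (reuses the client defined above).
stream = client.chat.completions.create(
    model="poolside/laguna-m.1",
    messages=[
        {"role": "user", "content": "Add type hints and a docstring to: def parse(s): return s.split(',')"}
    ],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)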

For a complete walkthrough of API access options, see our Poolside Laguna API guide.

Amazon Bedrock

For enterprise deployments where code must stay within your AWS environment:

import boto3
import json

# Create a Bedrock runtime client in a region where the model is enabled
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.invoke_model(
    modelId="poolside.laguna-m1-v1",
    body=json.dumps({
        "messages": [
            {"role": "user", "content": "Refactor this Express.js route handler to use proper error handling and input validation."}
        ],
        "max_tokens": 4096
    })
)

result = json.loads(response["body"].read())
print(result["content"])
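Bedrock also offers the model-agnostic Converse API, which normalizes request and response shapes across providers. Whether the Poolside model is available through it is not something this guide confirms, so treat the snippet below as a sketch under that assumption.

# Same request through the Bedrock Converse API (assuming the model supports it).
response = bedrock.converse(
    modelId="poolside.laguna-m1-v1",
    messages=[
        {"role": "user", "content": [{"text": "Write a Python function that retries an HTTP request with exponential backoff."}]}
    ],
    inferenceConfig={"maxTokens": 4096}
)

print(response["output"]["message"]["content"][0]["text"])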

With coding tools

M.1 works with any tool that supports custom OpenAI-compatible endpoints:

  • Aider: Set the model to openrouter/poolside/laguna-m.1 in your Aider config
  • Continue: Add an OpenRouter provider pointing to M.1
  • OpenCode: Configure the OpenRouter endpoint as a custom model
  • Custom scripts: Use the OpenAI Python SDK with the OpenRouter base URL

Limitations

M.1 is a coding model, not a general-purpose assistant. Key limitations:

  • No general knowledge tasks. Do not ask M.1 to summarize articles, write emails, or answer trivia. It is trained on code.
  • Proprietary weights. You cannot download M.1 and run it locally. You depend on Poolside’s API, OpenRouter, or Bedrock.
  • Limited time free access. The free OpenRouter tier will eventually end. Plan for paid access if you integrate M.1 into your workflow.
  • New model, limited track record. M.1 has not been through years of production use like GPT or Claude. Edge cases and failure modes are still being discovered by the community.
  • MoE inference requirements. At 225B total parameters, self-hosting M.1 (if weights were available) would require significant GPU infrastructure. This is an API-only model for practical purposes.

Performance considerations

M.1’s 23B active parameters mean inference speed is comparable to other models in the 20-25B range. The MoE routing adds minimal overhead. In practice:

  • Latency: Comparable to Codestral or Mistral Medium for similar-length outputs
  • Throughput: Standard for MoE models — the bottleneck is memory bandwidth for loading expert weights, not compute
  • Context handling: M.1 handles long contexts well for code-heavy inputs, though specific context window limits depend on the deployment (OpenRouter, Bedrock, or direct API may have different limits)

FAQ

Is Laguna M.1 free?

Yes, currently. M.1 is free on OpenRouter for a limited time. Poolside has not announced when paid pricing will start or what it will cost. Amazon Bedrock access follows standard AWS pay-per-token pricing. If you want to evaluate M.1, do it now while it is free.

Can I run Laguna M.1 locally?

No. M.1’s weights are not publicly available — it is a proprietary model. Even if they were, 225B total parameters would require multiple high-end GPUs. For local deployment, use Laguna XS.2 instead, which is Apache 2.0 and runs on consumer hardware.

How does M.1 compare to GPT-5 for coding?

M.1 is trained exclusively on code with RLCEF, while GPT-5 is a general-purpose model. For pure coding tasks — generation, debugging, refactoring, testing — M.1’s specialized training should give it an edge in first-pass correctness. GPT-5 is better for tasks that mix coding with natural language reasoning, planning, or explanation. The best approach is to try both on your specific tasks using the free OpenRouter access.

What programming languages does M.1 support?

M.1 is trained on code across all major programming languages. It handles Python, JavaScript, TypeScript, Java, C++, Go, Rust, Ruby, PHP, Swift, Kotlin, and many others. Performance varies by language — languages with more training data (Python, JavaScript, TypeScript) tend to get better results. The RLCEF training is particularly effective for languages with strong test frameworks and fast execution times.

Should I use M.1 or XS.2?

Use M.1 for complex tasks: multi-file refactoring, architectural decisions, debugging subtle issues, generating large code blocks. Use XS.2 for quick completions, simple function generation, and any workflow where you need local or self-hosted deployment. M.1 is more capable but requires API access. XS.2 is less powerful but fully open and runs anywhere. Many developers use both — M.1 for heavy lifting, XS.2 for fast iterations.

Does M.1 support function calling and tool use?

Yes. M.1 supports function calling through the standard OpenAI-compatible chat completions API. You can define tools in your request and M.1 will generate structured function calls. This makes it compatible with agent frameworks and coding tools that rely on tool use for file operations, terminal commands, and other actions.
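Assuming tool use goes through the standard OpenAI-style tools parameter as described, a request looks roughly like the sketch below, reusing the OpenRouter client from the earlier example. The run_tests tool and its schema are made up for illustration.

# Define a hypothetical tool and let the model decide whether to call it.
tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project's test suite and return the results.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Test file or directory to run."}
            },
            "required": ["path"]
        }
    }
}]

response = client.chat.completions.create(
    model="poolside/laguna-m.1",
    messages=[{"role": "user", "content": "Fix the failing test in tests/test_cache.py and verify it passes."}],
    tools=tools
)

message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, call.function.arguments)   # e.g. run_tests {"path": "tests/test_cache.py"}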