Poolside Laguna XS.2 Complete Guide: 33B Open-Weight Coding Model (2026)
Laguna XS.2 is Poolside AI's open-weight coding model. It has 33 billion total parameters with only 3 billion active per forward pass, using a Mixture-of-Experts architecture. It ships under Apache 2.0: download the weights, run it locally, fine-tune it, deploy it commercially, no restrictions. It is also free on OpenRouter if you prefer API access.
The "XS" in the name is not marketing modesty. With 3B active parameters, this model runs on a laptop. A MacBook with 16 GB unified memory handles it. An RTX 3060 handles it. Yet it is trained with Poolside's RLCEF (Reinforcement Learning from Code Execution Feedback) pipeline, the same approach used for their flagship 225B Laguna M.1. That training methodology is what makes XS.2 punch above its weight class.
Here is everything you need to know: specs, what it is good at, what it is not, how to run it, and how it compares to other small coding models.
Specifications
| Spec | Value |
|---|---|
| Total parameters | 33B |
| Active parameters | 3B |
| Architecture | Mixture-of-Experts (MoE) |
| Training method | RLCEF |
| Training focus | Code only |
| License | Apache 2.0 |
| Weights | Available on HuggingFace |
| OpenRouter | Free |
| Amazon Bedrock | Available |
| Local deployment | Yes (vLLM, llama.cpp, etc.) |
The 33B/3B split means the model file is larger than a typical 3B model (you need to store all 33B parameters on disk and in memory), but inference speed is determined by the 3B active count. Expect generation speeds comparable to other 3B models on the same hardware.
Why 3B active parameters matter
A 3B active parameter model is fast. On an M2 MacBook Pro with 16 GB RAM, expect 40-60 tokens per second. On an RTX 3060 (12 GB VRAM), similar speeds. On an RTX 4090, you are looking at 80+ tokens per second. This is fast enough for real-time code completion: the model responds before you finish reading the previous line.
The MoE architecture means XS.2 has more knowledge than a dense 3B model. The 33B total parameters store knowledge across multiple expert networks. The routing mechanism selects the most relevant experts for each token, so the model can draw on specialized knowledge for different programming languages and patterns without the inference cost of a 33B dense model.
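To make the routing idea concrete, here is a minimal top-k MoE layer sketch in PyTorch. It is purely illustrative: the hidden sizes, expert count, and top-k value are assumptions, not XS.2's published configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Minimal top-k Mixture-of-Experts layer, for illustration only.
    The sizes below are made up and are not XS.2's real configuration."""

    def __init__(self, d_model=512, d_ff=1024, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # one score per expert, per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                          # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)             # normalize the selected experts' weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                 # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(1) * expert(x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(10, 512)
print(layer(tokens).shape)  # torch.Size([10, 512])
```

Only the experts selected for each token actually run, which is why a 33B-total model can generate at roughly the speed of a 3B dense model while still needing all 33B parameters resident in memory.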
For context, here is how XS.2's active parameter count compares to other popular coding models:
| Model | Active params | Total params | Architecture | License |
|---|---|---|---|---|
| Laguna XS.2 | 3B | 33B | MoE | Apache 2.0 |
| Qwen 2.5 Coder 3B | 3B | 3B | Dense | Apache 2.0 |
| DeepSeek Coder 1.3B | 1.3B | 1.3B | Dense | MIT |
| Granite 4.1 3B | 3B | 3B | Dense | Apache 2.0 |
| Phi-3 Mini | 3.8B | 3.8B | Dense | MIT |
XS.2 has the same active parameter count as Qwen 2.5 Coder 3B and Granite 4.1 3B, but with 33B total parameters providing a larger knowledge base through the MoE architecture. The RLCEF training is the other differentiator: none of the models above are trained with code execution feedback.
What XS.2 is good at
XS.2 excels at the high-frequency, low-complexity coding tasks that make up most of a developer's day:
Code completion. Given a partial function, class, or file, XS.2 generates the rest. The RLCEF training means completions are more likely to be syntactically correct and functionally sound.
Function generation. Describe what a function should do, and XS.2 writes it. Works well for utility functions, data transformations, API handlers, and CRUD operations.
Bug fixes. Show XS.2 a piece of code and an error message, and it identifies the fix. Its execution-aware training helps it understand common error patterns.
Code explanation. XS.2 can explain what a piece of code does, though this is not its primary strength; it is a coding model, not a teaching model.
Test writing. Generate unit tests for existing functions. XS.2's RLCEF training gives it a good sense of what tests actually catch bugs versus tests that just increase coverage numbers.
What XS.2 is not good at
Be realistic about what a 3B active parameter model can do:
Complex multi-file refactoring. XS.2 can modify individual files, but orchestrating changes across a large codebase requires more capacity. Use Laguna M.1 for this.
Architectural decisions. Choosing between microservices and a monolith, designing database schemas for complex domains, or planning API structures: these tasks benefit from the larger M.1 model.
Long context reasoning. XS.2 handles shorter contexts well but may struggle with very long files or large amounts of context. For codebase-wide analysis, use a larger model.
Non-coding tasks. XS.2 is trained on code. Do not use it for writing documentation prose, emails, or anything that is not code.
Languages with limited training data. XS.2 performs best on popular languages (Python, JavaScript, TypeScript, Java, Go, Rust). Less common languages may get weaker results.
How to run XS.2 locally
XS.2 runs on consumer hardware. Here are the minimum requirements:
- Mac: Apple Silicon with 16 GB unified memory (M1/M2/M3/M4)
- NVIDIA GPU: 8 GB VRAM minimum (RTX 3060 or better); note that the INT4 weights are around 17 GB, so on 8-12 GB cards expect to offload part of the model to system RAM (llama.cpp supports this) at some speed cost
- CPU-only: Possible but slow; 16 GB RAM minimum, expect 5-10 tokens/second
For detailed setup instructions including vLLM, llama.cpp, and quantization options, see our how to run Poolside Laguna locally guide.
Quick start with vLLM:
```bash
pip install vllm

vllm serve poolside/laguna-xs.2 \
  --dtype auto \
  --max-model-len 8192
```
This starts an OpenAI-compatible API server on port 8000. Point any tool that supports custom endpoints at http://localhost:8000/v1.
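A quick way to confirm the server is up is to hit it with the OpenAI Python SDK. This is a minimal sketch: it assumes vLLM serves the model under the same `poolside/laguna-xs.2` ID it was launched with, and the API key is just a placeholder (a local server started without `--api-key` accepts anything).

```python
import openai

# Local vLLM server exposes an OpenAI-compatible API; the key is a placeholder.
client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="local")

response = client.chat.completions.create(
    model="poolside/laguna-xs.2",
    messages=[{"role": "user", "content": "Write a Python function that slugifies a string."}],
)
print(response.choices[0].message.content)
```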
Using XS.2 through OpenRouter
If you do not want to run locally, OpenRouter provides free API access:
```python
import openai

client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-key",
)

response = client.chat.completions.create(
    model="poolside/laguna-xs.2",
    messages=[
        {"role": "user", "content": "Write a Go function that reads a CSV file and returns a slice of structs with proper error handling."}
    ],
)

print(response.choices[0].message.content)
```
Integrating XS.2 with coding tools
XS.2 works with any tool that supports OpenAI-compatible APIs:
Aider: Add XS.2 as a custom model pointing to your local vLLM server or OpenRouter. Good for quick edits and small features.
Continue (VS Code): Configure an OpenRouter or local provider with the XS.2 model ID. Use it for inline completions and chat.
Ollama: If XS.2 becomes available in the Ollama library, it will be the simplest local setup. Check the best Ollama models for coding for current availability.
Custom scripts: Use the OpenAI Python SDK with either your local server or OpenRouter as the base URL.
Fine-tuning XS.2
The Apache 2.0 license means you can fine-tune XS.2 on your own data. This is where XS.2 gets interesting for teams:
- Codebase-specific fine-tuning: Train on your company's code to get completions that match your style, patterns, and internal libraries.
- Language specialization: If you work primarily in one language, fine-tuning on a large corpus of that language can improve performance.
- Domain specialization: Fine-tune on domain-specific code (embedded systems, data pipelines, game development) for better results in your niche.
Fine-tuning a 33B MoE model requires far more memory than fine-tuning a 3B dense model because all parameters need to be loaded and updated, even though only 3B are active during inference. The bf16 weights alone are roughly 66 GB before gradients and optimizer state, so full fine-tuning is a multi-GPU job; use LoRA/QLoRA for parameter-efficient fine-tuning on smaller GPUs.
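As a rough starting point, a LoRA setup with HuggingFace transformers and peft could look like the sketch below. Treat the details as assumptions: the repo ID, the target module names, and the hyperparameters are guesses, and MoE models often need router or expert layers handled specially, so check the model card before training.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "poolside/laguna-xs.2"  # assumed HuggingFace repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spread the 33B weights across available GPUs
)

# Adapt only the attention projections; these module names are a guess
# and may differ in this architecture.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the 33B total
```

From there a standard training loop over your code corpus applies; combined with 4-bit (QLoRA-style) loading of the base weights, this is the parameter-efficient path mentioned above for GPUs that cannot hold the full model in bf16.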
XS.2 vs. other small coding models
vs. Granite 4.1 3B
Granite 4.1 3B is a dense 3B model from IBM, also Apache 2.0. It has 128K context and strong coding performance. XS.2 has the same active parameter count but 33B total through MoE, giving it a larger knowledge base. Granite 4.1 3B may be simpler to deploy (no MoE routing overhead) and has IBM's enterprise support. For a detailed comparison, see our Granite 4.1 local setup guide.
vs. Qwen 2.5 Coder 3B
Qwen 2.5 Coder 3B is a dense 3B coding model with strong multilingual support. XS.2's MoE architecture gives it more knowledge capacity, and RLCEF training may improve first-pass correctness. Qwen 2.5 Coder has a larger community and more tooling support.
vs. DeepSeek Coder 1.3B
DeepSeek Coder 1.3B is smaller and faster but less capable. XS.2 is the better choice if your hardware can handle it. DeepSeek Coder 1.3B is better for extremely constrained environments.
vs. Laguna M.1
M.1 is Poolside's flagship: 225B total, 23B active. It is significantly more capable for complex tasks but requires API access (no local deployment). Use XS.2 for speed, privacy, and offline work. Use M.1 for complex refactoring, debugging, and architecture tasks.
Quantization options
XS.2 can be quantized to reduce memory usage:
- FP16: Full precision, ~66 GB for all parameters (not practical for most consumer hardware)
- INT8: ~33 GB, fits on high-end GPUs or Macs with 64 GB
- INT4 (GPTQ/AWQ): ~17 GB, fits on RTX 4090 or Mac with 32 GB
- GGUF Q4_K_M: ~19 GB, good balance of quality and size for llama.cpp
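The sizes above follow directly from the 33B parameter count; here is a quick back-of-the-envelope check (weights only, ignoring KV cache and activations, which add several more GB):

```python
# Rough weight-memory estimates for a 33B-parameter model at different precisions.
# Real files differ a little (embeddings, per-group scales, GGUF metadata).
total_params = 33e9

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    gigabytes = total_params * bits / 8 / 1e9
    print(f"{name}: ~{gigabytes:.1f} GB")

# Prints roughly: FP16 ~66.0 GB, INT8 ~33.0 GB, INT4 ~16.5 GB
```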
The MoE architecture means quantization affects all expert weights, not just the active ones. Quality degradation from aggressive quantization may be more noticeable than with dense models because the routing mechanism depends on precise weight values.
FAQ
Is Laguna XS.2 really free?
Yes. The weights are Apache 2.0: download them from HuggingFace and use them however you want, including commercially. OpenRouter access is also free with no announced end date. There are no usage caps on the Apache 2.0 weights. OpenRouter may have rate limits on the free tier.
What hardware do I need to run XS.2 locally?
Minimum: 16 GB unified memory (Apple Silicon) or 8 GB VRAM (NVIDIA GPU) with INT4 quantization. Recommended: 32 GB unified memory or 24 GB VRAM for comfortable operation with higher quantization quality. CPU-only is possible with 16+ GB RAM but expect slow inference (5-10 tokens/second). For detailed hardware guidance, see our local setup guide.
Can I fine-tune XS.2 on my company's code?
Yes. Apache 2.0 allows unrestricted fine-tuning and commercial deployment. Full fine-tuning of the 33B-parameter model is a multi-GPU job (the FP16 weights alone are ~66 GB, before gradients and optimizer state), but LoRA/QLoRA parameter-efficient fine-tuning is practical on a single GPU with 24 GB VRAM. The fine-tuned model remains yours under Apache 2.0.
How does XS.2 compare to GitHub Copilot?
Different tools for different needs. Copilot is an IDE-integrated service using proprietary models (GPT-4 variants). XS.2 is an open-weight model you can run anywhere. Copilot has better IDE integration out of the box. XS.2 gives you full control, privacy, and no subscription cost. You can use XS.2 through tools like Continue or Aider to get a Copilot-like experience with an open model.
Should I use XS.2 or M.1?
For quick completions, simple functions, and local/private deployment: XS.2. For complex multi-file tasks, debugging, and architecture work: M.1. Many developers use both: XS.2 locally for fast iterations, M.1 through the API for heavy lifting. Start with XS.2 and escalate to M.1 when you hit its limits.
Does XS.2 work offline?
Yes. Download the weights, run them locally with vLLM or llama.cpp, and you have a fully offline coding assistant. No internet connection required after the initial download. This makes XS.2 ideal for air-gapped environments, travel, or any situation where you cannot or do not want to send code to an external API.