What is InclusionAI? Ling Models and the Trillion-Parameter Coding Series (2026)
InclusionAI is a Chinese AI research organization building open-source Mixture-of-Experts (MoE) models with a sharp focus on coding, agentic workflows, and inference efficiency. Their flagship model, Ling 2.6, packs 1 trillion total parameters. Their lightweight variant, Ling 2.6 Flash, runs with just 7.4 billion active parameters — small enough for consumer hardware. Everything is open-source on HuggingFace and GitHub.
While most AI labs chase general-purpose benchmarks, InclusionAI has carved out a specific niche: building models that are optimized for writing code, executing multi-step agentic tasks, and doing it all with minimal token overhead. The Ling model family spans from the tiny Ling-Lite (2.75B active) to the massive Ling 2.6 (1T total), giving developers options at every scale.
Here is everything you need to know about InclusionAI, the Ling model family, and where it fits in the open-source AI landscape.
Who is InclusionAI?
InclusionAI is a Chinese AI research organization that has been building large language models with a focus on practical developer tooling. Unlike consumer-facing AI companies that prioritize chatbot experiences, InclusionAI targets the developer and engineering audience directly. Their models are designed to be deployed in coding pipelines, integrated into agentic systems, and run efficiently on a range of hardware.
The organization publishes all model weights on HuggingFace and maintains their codebase on GitHub at inclusionAI/Ling. This full open-source approach — weights, code, and training framework — sets them apart from labs that release weights but keep training infrastructure proprietary.
InclusionAI also developed AReaL, their reinforcement learning framework specifically designed for improving LLM reasoning capabilities. AReaL is used to train the reasoning-focused variants in the Ling family, particularly Ring 1T, their thinking model.
The Ling model family
The Ling series uses Mixture-of-Experts architecture across the board. MoE models have a large total parameter count but only activate a fraction of those parameters for each token, keeping inference costs manageable while maintaining the knowledge capacity of a much larger model.
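To make the total-versus-active distinction concrete, here is a back-of-the-envelope sketch. It uses the common approximation of roughly 2 FLOPs per active parameter per generated token; the helper function and the numbers plugged in (Ling 2.6 Flash's figures from the lineup below) are illustrative, not InclusionAI-published cost formulas.

```python
# Back-of-the-envelope MoE cost sketch (approximation, not an official figure).
# Rule of thumb: ~2 FLOPs per *active* parameter per generated token,
# while weight memory is driven by *total* parameters.

def moe_cost_estimate(total_params: float, active_params: float,
                      bytes_per_param: int = 2) -> tuple[float, float]:
    """Hypothetical helper: rough per-token compute and 16-bit weight memory."""
    flops_per_token = 2 * active_params
    weight_memory_gb = total_params * bytes_per_param / 1e9
    return flops_per_token, weight_memory_gb

# Ling 2.6 Flash figures from the lineup below: 104B total, 7.4B active.
flops, mem_gb = moe_cost_estimate(total_params=104e9, active_params=7.4e9)
print(f"~{flops / 1e9:.0f} GFLOPs per token, ~{mem_gb:.0f} GB of weights at 16-bit")
# A dense 104B model would need roughly 14x more compute per token.
```

In practice, quantization and expert offloading shrink the resident weight footprint well below the 16-bit figure, but the basic trade-off holds: compute tracks active parameters, memory tracks total parameters.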
Here is the complete lineup:
Ling-Lite
- Total parameters: 16.8B
- Active parameters: 2.75B
- Target: Edge devices, lightweight local inference
- License: Open source (HuggingFace)
Ling-Lite is the smallest model in the family. With only 2.75B active parameters, it runs on almost anything — laptops, Raspberry Pi-class devices, phones. It is not going to match frontier models on complex reasoning, but for code completion, simple generation tasks, and basic agentic operations, it is remarkably capable for its size.
Ling-Plus
- Total parameters: 290B
- Active parameters: 28.8B
- Target: Server-grade inference, production workloads
- License: Open source (HuggingFace)
Ling-Plus sits in the middle of the family. At 28.8B active parameters, it needs a decent GPU setup — think A100 or multiple consumer GPUs — but delivers strong performance across coding benchmarks. This is the model you would deploy for a team-wide coding assistant or a production API endpoint.
Ling 2.6
- Total parameters: 1T (1 trillion)
- Active parameters: Not publicly specified (and not inferable from the Lite/Plus active-to-total ratios)
- Target: Frontier coding performance, research
- Architecture: MoE, coding-optimized
- License: Open source (HuggingFace)
Ling 2.6 is the flagship. One trillion total parameters makes it one of the largest open-source models available. It is specifically optimized for coding tasks and agentic workflows, with targeted improvements in token efficiency and multi-step reasoning. Running it requires serious infrastructure — multi-GPU clusters or cloud GPU providers — but the performance justifies the cost for demanding workloads.
Ling 2.6 Flash
- Total parameters: 104B
- Active parameters: 7.4B
- Target: Local inference, consumer hardware
- Architecture: MoE, coding-optimized
- License: Open source (HuggingFace)
Flash is the local-friendly variant of Ling 2.6. It inherits the coding optimizations and architectural improvements of the full model but compresses them into a package that runs with just 7.4B active parameters. A Mac with 16 GB unified memory or a GPU with 12+ GB VRAM can handle it. This is the model most individual developers will actually use day-to-day.
Ring 1T
- Total parameters: 1T
- Target: Complex reasoning, thinking tasks
- Architecture: MoE with extended reasoning chains
- License: Open source (HuggingFace)
Ring 1T is the thinking/reasoning variant. Built on the same 1T parameter base as Ling 2.6, it is trained with AReaL (InclusionAI’s reinforcement learning framework) to handle multi-step reasoning, mathematical proofs, complex debugging, and tasks that require extended chains of thought. Think of it as InclusionAI’s answer to models like DeepSeek R1 or QwQ.
Why MoE architecture matters for coding
Mixture-of-Experts is not just a parameter-count trick. For coding tasks specifically, MoE offers real advantages.
Code is diverse. A Python web framework, a Rust systems library, and a SQL query optimizer require fundamentally different knowledge. In a dense model, every parameter activates for every token, meaning the model’s capacity is spread thin across all domains. In an MoE model, different expert networks specialize in different domains. When you are writing Python, the Python-specialized experts activate. When you switch to SQL, different experts take over.
This specialization is why Ling Flash can deliver strong coding performance with only 7.4B active parameters. The 104B total parameter count means the model has deep knowledge across many programming languages and frameworks, but the sparse routing ensures you only pay the inference cost of a 7B model.
For agentic workflows — where a model needs to plan, execute, evaluate, and iterate — MoE is particularly effective. Different steps in an agentic pipeline may require different types of expertise (planning vs. code generation vs. error analysis), and MoE models can route to the appropriate experts for each step.
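A minimal sketch of the routing idea, written in plain NumPy. This is a toy illustration of top-k expert gating in general; the dimensions, expert count, and router here are made up for the example and say nothing about Ling's actual architecture.

```python
import numpy as np

# Toy top-k MoE routing: each token is sent to only k of the n_experts
# feed-forward blocks, so per-token compute stays small even when the
# total expert count (and total parameter count) is large.

rng = np.random.default_rng(0)
d_model, n_experts, k = 64, 8, 2

router_w = rng.normal(size=(d_model, n_experts))                # learned router weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x):
    """Route one token vector x through its top-k experts (toy example)."""
    logits = x @ router_w
    top = np.argsort(logits)[-k:]                               # pick the k best experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()     # softmax over chosen experts
    # Only k expert matmuls run; the other n_experts - k are never touched.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.normal(size=d_model)
print(moe_layer(token).shape)  # (64,) -- computed with 2 of 8 experts
```

In a coding-tuned MoE, the expectation is that routing learns to send, say, SQL tokens and Rust tokens to different experts, which is the specialization described above.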
AReaL: the reinforcement learning framework
AReaL (Ant Reasoning RL) is InclusionAI's in-house framework for training reasoning capabilities into their models. It is the technology behind Ring 1T and the reasoning improvements in the broader Ling family.
The framework uses a combination of process reward models and outcome-based verification to train models on multi-step reasoning tasks. Rather than just rewarding correct final answers, AReaL rewards correct intermediate steps, which produces models that show their work and reason more reliably.
For coding specifically, AReaL enables models to break down complex programming tasks into logical steps: understanding the requirements, planning the architecture, writing the code, testing it mentally, and refining the output. This structured approach to code generation produces more reliable results than models that try to generate entire solutions in a single pass.
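The difference between rewarding only final answers and also rewarding intermediate steps can be sketched in a few lines. This is a generic illustration of blending process and outcome rewards, not AReaL's actual code; both scoring functions below are stubs and the weighting is arbitrary.

```python
# Conceptual sketch of process + outcome rewards for a multi-step solution.
# Generic illustration only -- not AReaL's actual API; the two scorers are stubs.

def score_step(step: str) -> float:
    """Stand-in for a process reward model judging one intermediate step."""
    return 1.0 if "=" in step else 0.5            # placeholder heuristic

def verify_outcome(final_answer: str, expected: str) -> float:
    """Stand-in for outcome verification (e.g., running the tests)."""
    return 1.0 if final_answer.strip() == expected.strip() else 0.0

def total_reward(steps: list[str], final_answer: str, expected: str,
                 process_weight: float = 0.3) -> float:
    """Blend per-step rewards with the verified final outcome."""
    process = sum(score_step(s) for s in steps) / max(len(steps), 1)
    outcome = verify_outcome(final_answer, expected)
    return process_weight * process + (1 - process_weight) * outcome

steps = ["parse the requirements", "plan: loop over items", "code: total = sum(items)"]
print(total_reward(steps, final_answer="total = sum(items)", expected="total = sum(items)"))
```

The exact weighting, and how step-level scores feed the policy update, are framework details; the blend above is only meant to show the concept of rewarding the work, not just the answer.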
AReaL is also open-source, published alongside the Ling models on GitHub. This means researchers and developers can use the same framework to fine-tune Ling models or train their own reasoning-enhanced models.
Inference efficiency optimizations
InclusionAI has invested heavily in making their models efficient to run, not just accurate. Several specific optimizations stand out:
Token overhead reduction. Ling models are trained to produce concise outputs. In coding tasks, this means less boilerplate, fewer unnecessary comments, and more direct code generation. The practical impact is lower token costs when using API-based access and faster response times for local inference.
Agentic capability optimization. The models are specifically tuned for multi-turn agentic interactions — tool calling, function execution, iterative refinement. This is not an afterthought bolted onto a chat model. The agentic capabilities are baked into the training process.
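To make "multi-turn agentic interaction" concrete, here is a minimal tool-calling loop against an OpenAI-compatible endpoint, such as a locally served vLLM instance with tool calling enabled. The base URL, model id, and the single run_tests tool are placeholders for illustration, not InclusionAI-published examples.

```python
from openai import OpenAI

# Minimal agentic loop against an OpenAI-compatible server (e.g., a local vLLM
# deployment with tool calling enabled). The base_url and model id below are
# placeholders -- adjust them to your own setup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")
MODEL = "inclusionAI/Ling-flash"  # hypothetical model id for illustration

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project's test suite and return the output.",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}]

def run_tests() -> str:
    """Stub tool; a real agent would shell out to pytest or similar."""
    return "2 passed, 1 failed: test_parse_empty_input"

messages = [{"role": "user", "content": "Fix the failing test in parser.py."}]

for _ in range(5):  # bounded loop: plan -> act -> observe -> refine
    reply = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
    msg = reply.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:          # no more tool use -> final answer
        print(msg.content)
        break
    for call in msg.tool_calls:     # execute each requested tool and report back
        result = run_tests() if call.function.name == "run_tests" else "unknown tool"
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```

The shape of the loop is the part that matters: the model plans, requests a tool, observes the result, and iterates until it produces a final answer.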
MoE routing efficiency. The expert routing in Ling models is optimized to minimize the overhead of the routing decision itself. In some MoE implementations, the router adds significant latency. InclusionAI’s implementation keeps routing overhead minimal, which matters especially for the smaller models where routing cost is a larger fraction of total inference time.
How InclusionAI compares to other Chinese AI labs
The Chinese AI ecosystem has produced several strong open-source model families. Here is where InclusionAI fits:
vs. DeepSeek. DeepSeek V3/V4 are the most well-known Chinese open-source models. DeepSeek uses MoE architecture too, but targets general-purpose performance. InclusionAI’s coding-specific optimization gives Ling an edge on programming benchmarks, while DeepSeek tends to be stronger on general knowledge and reasoning breadth.
vs. Qwen (Alibaba). Qwen 3.x is a massive model family covering everything from tiny to frontier scale. Qwen is more general-purpose, with strong multilingual capabilities. InclusionAI is more narrowly focused on coding and agentic tasks, which means better performance in that specific domain but less versatility.
vs. Kimi (Moonshot AI). Kimi K2 focuses on long-context and agentic capabilities. There is overlap with InclusionAI’s agentic focus, but Kimi’s strength is in context length (up to 1M tokens) while InclusionAI’s strength is in coding-specific optimization and inference efficiency.
vs. GLM (Zhipu AI). GLM 5.1 targets agentic engineering with strong tool-calling capabilities. InclusionAI and Zhipu share the agentic focus, but InclusionAI’s MoE architecture gives it better inference efficiency at scale.
For developers specifically interested in coding models, InclusionAI’s Ling family is one of the strongest options in the Chinese open-source ecosystem. The combination of coding optimization, MoE efficiency, and full open-source availability makes it a compelling choice.
Where to get InclusionAI Ling models
All Ling models are available through multiple channels:
- HuggingFace: Full model weights for all variants (Lite, Plus, 2.6, Flash, Ring 1T)
- GitHub: Source code, training framework (AReaL), and documentation at inclusionAI/Ling
- Local inference: Compatible with vLLM, llama.cpp (GGUF conversions), and other standard inference frameworks
- API access: Available through various API providers and self-hosted endpoints
The open-source licensing means you can deploy these models anywhere — on-premises, in your own cloud infrastructure, or on your local machine. There are no usage restrictions or API key requirements for local deployment.
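For a quick local smoke test with vLLM, a script like the sketch below is all it takes. The repository id is illustrative (check the inclusionAI organization page on HuggingFace for the exact names), and trust_remote_code is assumed to be required for the custom MoE architecture.

```python
from vllm import LLM, SamplingParams

# Minimal local-inference sketch with vLLM. The model id is illustrative --
# check the inclusionAI organization on HuggingFace for the exact repository name.
llm = LLM(model="inclusionAI/Ling-lite", trust_remote_code=True)

params = SamplingParams(temperature=0.2, max_tokens=256)
prompts = ["Write a Python function that parses an ISO 8601 date string."]

for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```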
If you are looking for a coding-optimized model to run locally, Ling Flash is the obvious starting point. If you need more power and have the infrastructure, Ling-Plus or the full Ling 2.6 deliver frontier-level coding performance. Check our guide on the best Ollama models for coding in 2026 to see how Ling Flash compares to other local options.
For context on how InclusionAI fits into the broader trend of countries building their own AI capabilities, see our overview of sovereign AI models in 2026. And if you are trying to decide which AI coding tool to use, our guide on how to choose an AI coding agent in 2026 covers the full landscape.
FAQ
Is InclusionAI Ling free to use?
Yes. All Ling models are open-source and available on HuggingFace with no usage restrictions. You can download the weights and run them locally without any API keys or subscriptions. For API access through third-party providers, standard provider pricing applies.
What is the difference between Ling 2.6 and Ling 2.6 Flash?
Ling 2.6 is the full trillion-parameter flagship model designed for maximum coding performance. Flash is the lightweight variant: 104B total parameters with only 7.4B active, designed to run on consumer hardware. Flash inherits the coding optimizations of the full model but trades some capability for dramatically lower hardware requirements.
Can I run InclusionAI Ling models locally?
Yes. Ling-Lite (2.75B active) runs on almost any hardware. Ling Flash (7.4B active) runs on a Mac with 16 GB RAM or a GPU with 12+ GB VRAM. Ling-Plus (28.8B active) needs server-grade GPUs. The full Ling 2.6 (1T total) requires multi-GPU clusters. All models are compatible with standard inference frameworks like vLLM and llama.cpp.
What is Ring 1T?
Ring 1T is InclusionAI’s reasoning/thinking model variant. It is built on the same 1T parameter MoE architecture as Ling 2.6 but trained with AReaL (InclusionAI’s reinforcement learning framework) to handle complex multi-step reasoning, mathematical proofs, and extended chains of thought. It is the equivalent of models like DeepSeek R1 in the InclusionAI ecosystem.
What is AReaL?
AReaL (Ant Reasoning RL) is InclusionAI's open-source framework for training reasoning capabilities into language models. It uses process reward models and outcome-based verification to improve multi-step reasoning. AReaL is used to train Ring 1T and improve reasoning across the Ling family. The framework is available on GitHub alongside the model code.
How does InclusionAI compare to DeepSeek?
Both use MoE architecture and are open-source. DeepSeek targets general-purpose performance across all tasks. InclusionAI focuses specifically on coding and agentic workflows. For programming tasks, Ling models tend to be more efficient due to their coding-specific optimizations. For general knowledge, reasoning breadth, and non-coding tasks, DeepSeek is typically stronger.
What programming languages does Ling support?
Ling models are trained on a broad range of programming languages including Python, JavaScript, TypeScript, Java, C++, Rust, Go, SQL, and many others. The coding-specific training means strong performance across mainstream languages, with particular strength in Python and JavaScript/TypeScript due to training data distribution.