Falcon is a family of open-source language models built by the Technology Innovation Institute (TII) in Abu Dhabi, UAE. It was one of the first non-US models to compete with GPT-3.5 on benchmarks and has evolved into a multi-model family covering text, code, vision, and reasoning.
The Falcon model family
| Model | Parameters | Type | Best for | License |
|---|---|---|---|---|
| Falcon 2 11B | 11B | Text | General purpose, 11 languages, trained on 5T tokens | Apache 2.0 |
| Falcon H1R 7B | 7B | Hybrid (SSM + attention) | Reasoning, math, coding | Apache 2.0 |
| Falcon Perception | 600M | Vision | Object detection, segmentation | Apache 2.0 |
| Falcon OCR | 300M | Vision | Text extraction from images | Apache 2.0 |
| Falcon 40B | 40B | Text | High-quality generation | Apache 2.0 |
| Falcon 180B | 180B | Text | Frontier quality (needs GPU cluster) | Custom |
Falcon H1R 7B: the new star
The latest Falcon release is H1R-7B, a hybrid model that combines State Space Model (SSM) layers with traditional attention. At just 7B parameters, it outperforms models of up to 47B parameters on math reasoning: it scored 88.1% on AIME-24, beating Microsoft Phi-4 Reasoning Plus 14B, Alibaba Qwen3 32B, and NVIDIA Nemotron H 47B. It processes up to 1,500 tokens per second per GPU at batch size 64.
Why it matters: Most small models (7-9B) are mediocre at reasoning. Falcon H1R-7B proves that architecture innovation (hybrid SSM + attention) can beat raw parameter count. It's a direct competitor to Qwen3 8B and Yi-Coder 9B.
Falcon vs other open models
| Model | Params | Reasoning | Coding | Multilingual | License |
|---|---|---|---|---|---|
| Falcon H1R 7B | 7B | 88.1% AIME-24 | Good | Good | Apache 2.0 |
| Falcon 2 11B | 11B | Good | Good | Strong | Apache 2.0 |
| Qwen3 8B | 8B | Good | Good | Strong | Apache 2.0 |
| Yi-Coder 9B | 9B | Decent | Strong | Good | Apache 2.0 |
| DeepSeek R1 14B | 14B | Best | Good | Good | MIT |
| Gemma 4 9B | 9B | Good | Good | Good | Custom |
Falcon H1R-7B's hybrid architecture gives it a reasoning edge over other 7-9B models. For pure coding, Yi-Coder 9B is still better. For deep reasoning, DeepSeek R1 14B wins but needs more RAM.
The UAE AI ecosystem
Falcon is part of a broader UAE investment in AI sovereignty:
- Falcon (TII): general-purpose open models
- Jais (G42/MBZUAI): Arabic-specialized models
- G42: AI infrastructure and cloud
- MBZUAI: AI research university
Across these organizations, the UAE has invested billions in building an independent AI ecosystem. For developers, this means more high-quality open models with permissive licenses.
How to run Falcon locally
# Install Ollama (macOS, via Homebrew)
brew install ollama
# Falcon 2 (11B, general purpose)
ollama pull falcon2
# Falcon 40B (needs 32GB+ RAM)
ollama pull falcon:40b
# Test
ollama run falcon2 "Explain microservices architecture"
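Once a model is pulled, it can also be queried from a script rather than the CLI. The following is a minimal sketch that assumes the Ollama server is running locally on its default port (11434) and uses its `/api/generate` endpoint; the helper names `build_generate_request` and `generate` are illustrative, not part of any Ollama client library.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port, 11434.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, the server replies with a single JSON object
    # whose "response" field holds the full completion.
    return body["response"]
```

With the server running and `falcon2` pulled, `print(generate("falcon2", "Explain microservices architecture"))` should mirror the `ollama run` command above. The generous timeout is a guess to allow for the first request, which has to load the model into memory.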
Hardware requirements
| Model | RAM needed | Performance |
|---|---|---|
| Falcon H1R 7B | 6 GB | ~25 tok/s on M2 |
| Falcon 2 11B | 8 GB | ~20 tok/s on M2 |
| Falcon 40B | 32 GB | ~10 tok/s on M3 Pro |
| Falcon 180B | 128 GB+ | Needs GPU cluster |
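As a rough sanity check on the RAM column, the memory taken by the weights alone can be estimated from the parameter count. This sketch assumes 4-bit quantized weights, which is an assumption about how the local builds are packaged rather than something stated above; the function name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float = 4.0) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # Billions of parameters times bytes per parameter gives billions of bytes.
    return params_billions * bytes_per_weight


models = [
    ("Falcon H1R 7B", 7),
    ("Falcon 2 11B", 11),
    ("Falcon 40B", 40),
    ("Falcon 180B", 180),
]

for name, params in models:
    # Assumes 4-bit quantization; 16-bit weights would need four times as much.
    print(f"{name}: ~{weight_memory_gb(params):.1f} GB of weights at 4-bit")

# Prints:
# Falcon H1R 7B: ~3.5 GB of weights at 4-bit
# Falcon 2 11B: ~5.5 GB of weights at 4-bit
# Falcon 40B: ~20.0 GB of weights at 4-bit
# Falcon 180B: ~90.0 GB of weights at 4-bit
```

The figures in the table are higher than these estimates because the runtime also needs room for the context cache, activations, and the operating system.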
With coding tools
# Aider
aider --model ollama/falcon2
# Continue.dev - add to config.json
# { "models": [{ "provider": "ollama", "model": "falcon2" }] }
Who should use Falcon
- Multilingual projects: Falcon 2 was trained on diverse multilingual data across 11 languages
- Reasoning tasks on budget hardware: on AIME-24, the 7B Falcon H1R beats models of up to 47B parameters, including Microsoft Phi-4 Reasoning Plus 14B, Alibaba Qwen3 32B, and NVIDIA Nemotron H 47B
- UAE/Middle East deployment: local ecosystem, cultural alignment
- Apache 2.0 needed: most Falcon models allow commercial use under Apache 2.0 (Falcon 180B uses a custom license)
For coding specifically, Yi-Coder 9B or Qwen3 8B are better choices. Falcon's strength is reasoning and multilingual capability.
The Falcon H1 architecture explained
What makes Falcon H1R special is its hybrid architecture. Traditional transformers use attention mechanisms that scale quadratically with sequence length: doubling the context roughly quadruples the attention compute. Falcon H1R combines:
Transformer attention layers: precise token-level reasoning, good at understanding relationships between specific words/tokens.
Mamba (State Space Model) layers: efficient sequential processing with linear scaling. The Mamba component carries a fixed-size state, so its memory per token stays constant regardless of sequence length.
The result:
- 256K context window: 32x larger than Falcon 2's 8K
- 1,500 tokens/second per GPU at batch size 64: nearly 2x the throughput of Qwen3-8B
- 88.1% on AIME-24: a math benchmark where it beats models with roughly 7x more parameters
- Linear scaling in the Mamba layers: for those layers, the 200,000th token costs about the same to process as the 1st
This hybrid approach is similar to what Qwen 3.6 Plus does at a much larger scale (hybrid linear attention + MoE). Falcon H1R proves the concept works at 7B parameters too.
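The scaling argument above can be made concrete with a little arithmetic. This is a deliberately simplified sketch: it counts only how work grows with context length and ignores constant factors, hidden size, and the actual mix of attention and Mamba layers in Falcon H1R, none of which are specified here. The function names are illustrative.

```python
def attention_pair_count(context_tokens: int) -> int:
    """Token pairs scored by full self-attention over the context (n * n)."""
    return context_tokens * context_tokens


def ssm_step_count(context_tokens: int) -> int:
    """Recurrent state updates in an SSM layer: one per token (n)."""
    return context_tokens


short_ctx = 8_192     # 8K tokens, the Falcon 2 context length cited above
long_ctx = 262_144    # 256K tokens, the Falcon H1R context length cited above

context_ratio = long_ctx // short_ctx
attention_ratio = attention_pair_count(long_ctx) // attention_pair_count(short_ctx)
ssm_ratio = ssm_step_count(long_ctx) // ssm_step_count(short_ctx)

print(f"context grows {context_ratio}x")          # prints: context grows 32x
print(f"attention work grows {attention_ratio}x")  # prints: attention work grows 1024x
print(f"SSM work grows {ssm_ratio}x")              # prints: SSM work grows 32x
```

Going from 8K to 256K multiplies the context by 32, the attention work by 32 squared (1,024), and the SSM work by only 32. That gap is why moving part of the sequence processing into Mamba layers pays off most at long context lengths.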
Falcon's evolution
| Version | Year | Parameters | Key achievement |
|---|---|---|---|
| Falcon 7B/40B | 2023 | 7B/40B | First UAE open model, topped the Hugging Face Open LLM Leaderboard |
| Falcon 180B | 2023 | 180B | Largest open model at the time |
| Falcon 2 11B | 2024 | 11B | 5T tokens, 11 languages, VLM variant |
| Falcon Mamba 7B | 2024 | 7B | First pure Mamba model from TII |
| Falcon H1R 7B | 2026 | 7B | Hybrid architecture, beats 47B models |
| Falcon Perception | 2026 | 600M | Vision model for object detection |
| Falcon OCR | 2026 | 300M | Text extraction from images |
TII has consistently pushed boundaries: first to release a 180B open model, first to release a production Mamba model, and now first to demonstrate hybrid SSM+attention beating models 7x larger.
FAQ
Which Falcon model should I use for coding?
For coding tasks, Falcon H1R-7B is the best choice in the Falcon family due to its strong reasoning capabilities. However, for pure coding quality, Yi-Coder 9B or Qwen3 8B are better specialized alternatives. Falcon's real strength is reasoning and multilingual tasks rather than code generation specifically.
How does Falcon H1R-7B beat models 7x its size?
The hybrid SSM + attention architecture is the key. Traditional transformers scale quadratically with sequence length, but Falcon H1R combines Mamba (State Space Model) layers for efficient sequential processing with transformer attention layers for precise reasoning. This architectural innovation lets it achieve 88.1% on the AIME-24 math benchmark, beating models of up to 47B parameters.
Is Falcon related to Jais?
Both are UAE-funded AI models but from different organizations. Falcon is built by the Technology Innovation Institute (TII) in Abu Dhabi and focuses on general-purpose multilingual AI. Jais is built by G42/MBZUAI and specializes in Arabic language. They're complementary parts of the UAE's broader AI sovereignty strategy.