What is Falcon? TII's Open-Source AI Model from the UAE


Falcon is a family of open-source language models built by the Technology Innovation Institute (TII) in Abu Dhabi, UAE. It was one of the first non-US models to compete with GPT-3.5 on benchmarks and has evolved into a multi-model family covering text, code, vision, and reasoning.

The Falcon model family

| Model | Parameters | Type | Best for | License |
|---|---|---|---|---|
| Falcon 2 11B | 11B | Text | General purpose, 11 languages, trained on 5T tokens | Apache 2.0 |
| Falcon H1R 7B | 7B | Hybrid (SSM + attention) | Reasoning, math, coding | Apache 2.0 |
| Falcon Perception | 600M | Vision | Object detection, segmentation | Apache 2.0 |
| Falcon OCR | 300M | Vision | Text extraction from images | Apache 2.0 |
| Falcon 40B | 40B | Text | High-quality generation | Apache 2.0 |
| Falcon 180B | 180B | Text | Frontier quality (needs GPU cluster) | Custom |

Falcon H1R 7B: the new star

The latest Falcon release is H1R-7B, a hybrid model that combines State Space Model (SSM) layers with traditional attention. At just 7B parameters, it outperforms models of up to 47B parameters: it scored 88.1% on AIME-24 (math), beating Microsoft Phi-4 Reasoning Plus 14B, Alibaba Qwen3 32B, and NVIDIA Nemotron H 47B. It processes up to 1,500 tokens per second per GPU (at batch size 64).

Why it matters: Most small models (7-9B) are weak at multi-step reasoning. Falcon H1R-7B shows that architectural innovation (hybrid SSM + attention) can beat raw parameter count. It's a direct competitor to Qwen3 8B and Yi-Coder 9B.

Falcon vs other open models

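| Model | Params | Reasoning | Coding | Multilingual | License |
|---|---|---|---|---|---|
| Falcon H1R 7B | 7B | ✅ 88.1% AIME-24 | Good | Good | Apache 2.0 |
| Falcon 2 11B | 11B | Good | Good | ✅ Strong | Apache 2.0 |
| Qwen3 8B | 8B | Good | Good | ✅ Strong | Apache 2.0 |
| Yi-Coder 9B | 9B | Decent | ✅ Strong | Good | Apache 2.0 |
| DeepSeek R1 14B | 14B | ✅ Best | Good | Good | MIT |
| Gemma 4 9B | 9B | Good | Good | Good | Custom |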

Falcon H1R-7B’s hybrid architecture gives it a reasoning edge over other 7-9B models. For pure coding, Yi-Coder 9B is still better. For deep reasoning, DeepSeek R1 14B wins but needs more RAM.

The UAE AI ecosystem

Falcon is part of a broader UAE investment in AI sovereignty:

  • Falcon (TII): general-purpose open models
  • Jais (G42/MBZUAI): Arabic-specialized models
  • G42: AI infrastructure and cloud
  • MBZUAI: AI research university

Together, these organizations reflect a UAE investment of billions in building an independent AI ecosystem. For developers, this means more high-quality open models with permissive licenses.

How to run Falcon locally

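```shell
# Install Ollama (macOS, via Homebrew) and start its background server
brew install ollama
brew services start ollama

# Falcon 2 (11B, general purpose)
ollama pull falcon2

# Falcon 40B (needs 32GB+ RAM)
ollama pull falcon:40b

# Test with a one-off prompt
ollama run falcon2 "Explain microservices architecture"
```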
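Besides `ollama run`, Ollama serves a local HTTP API (port 11434 by default), which is what editor integrations connect to. Below is a minimal sketch of calling its `/api/generate` endpoint from the shell. The helper names `make_payload` and `ask_falcon` are hypothetical, and the payload builder does no JSON escaping, so it assumes prompts without double quotes or backslashes.

```shell
# Build the JSON body for Ollama's /api/generate endpoint.
# Assumption: the prompt contains no double quotes or backslashes (no escaping here).
make_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# Send the prompt to a local Ollama server and print the raw JSON reply;
# with "stream":false the answer arrives as one object with a "response" field.
ask_falcon() {
  curl -s http://localhost:11434/api/generate -d "$(make_payload falcon2 "$1")"
}

# Usage (requires `ollama serve`, the brew service, or the desktop app to be running):
#   ask_falcon "Explain microservices architecture"
```

Setting `"stream":false` trades incremental output for a single, easy-to-parse reply, which suits scripts better than interactive use.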

Hardware requirements

| Model | RAM needed | Performance |
|---|---|---|
| Falcon H1R 7B | 6 GB | ~25 tok/s on M2 |
| Falcon 2 11B | 8 GB | ~20 tok/s on M2 |
| Falcon 40B | 32 GB | ~10 tok/s on M3 Pro |
| Falcon 180B | 128 GB+ | Needs GPU cluster |
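Most of the RAM column follows from simple arithmetic: weight memory is roughly parameters times bits per weight, divided by 8. The sketch below assumes ~4-bit quantized weights (an assumption; the actual quantization depends on the tag you pull) and ignores KV cache and runtime overhead, which is why the table's figures are higher than its output. The function name `weights_gb` is hypothetical.

```shell
# Rough size of the model weights alone, in GB:
#   params (billions) * bits per weight / 8
# Real RAM use is higher: KV cache, activations and the runtime add overhead.
weights_gb() {
  awk -v p="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", p * bits / 8 }'
}

# Assumption: ~4-bit quantized weights for every model.
for params in 7 11 40 180; do
  echo "${params}B @ 4-bit: $(weights_gb "$params" 4) GB"
done
```

At 4 bits, Falcon 180B's weights alone come to about 90 GB, and at 16 bits (`weights_gb 180 16`) about 360 GB, which is why the table points to a GPU cluster.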

With coding tools

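```shell
# Aider: point it at the local Ollama server, then pick the model
export OLLAMA_API_BASE=http://127.0.0.1:11434
aider --model ollama_chat/falcon2

# Continue.dev: add to config.json
# { "models": [{ "title": "Falcon 2", "provider": "ollama", "model": "falcon2" }] }
```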

Who should use Falcon

  • Multilingual projects: Falcon 2 was trained on diverse multilingual data across 11 languages
  • Reasoning tasks on budget hardware: Falcon H1R-7B beats models of up to 47B parameters, including Microsoft Phi-4 Reasoning Plus 14B, Alibaba Qwen3 32B, and NVIDIA Nemotron H 47B
  • UAE/Middle East deployment: local ecosystem, cultural alignment
  • Apache 2.0 needed: most Falcon models ship under Apache 2.0, a permissive license that allows commercial use (Falcon 180B uses a custom license)

For coding specifically, Yi-Coder 9B or Qwen3 8B are better choices. Falcon’s strength is reasoning and multilingual capability.

The Falcon H1 architecture explained

What makes Falcon H1R special is its hybrid architecture. Traditional transformers use attention mechanisms whose cost scales quadratically with sequence length: doubling the context roughly quadruples the attention compute. Falcon H1R combines:

Transformer attention layers: precise token-level reasoning, good at understanding relationships between specific words/tokens.

Mamba (State Space Model) layers: efficient sequential processing with linear scaling. The Mamba component carries a fixed-size state from token to token, so its memory per token stays constant regardless of sequence length.
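To see why a state-space layer's memory does not grow with length, here is a toy scalar linear recurrence in awk. It is a deliberate simplification, not Mamba: real Mamba layers use many state channels and input-dependent ("selective") parameters, and the constants a, b and c below are made up. The point is only that each token updates one fixed-size state `h` instead of looking back over all earlier tokens. The function name `ssm_scan` is hypothetical.

```shell
# Toy scalar state-space recurrence (NOT Mamba itself):
#   h_t = a * h_{t-1} + b * x_t
#   y_t = c * h_t
# a, b, c are made-up constants for illustration; input is one number per line.
ssm_scan() {
  awk -v a=0.5 -v b=1 -v c=2 '
    {
      h = a * h + b * $1   # update the single state value (h starts at 0)
      printf "%g\n", c * h # emit one output per input token
    }
  '
}

printf '%s\n' 1 0 0 1 | ssm_scan
```

However long the input stream, awk keeps only the single number `h` between lines; a full-attention layer would instead need every previous token's keys and values.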

The result:

  • 256K context window: 32x larger than standard Falcon 2's 8K
  • 1,500 tokens/second per GPU at batch size 64: nearly 2x the throughput of Qwen3-8B
  • 88.1% on AIME-24: a math benchmark where it beats models with up to ~7x more parameters
  • Linear scaling in the Mamba layers: their fixed-size state means the 200,000th token costs those layers about as much as the 1st (the attention layers still grow with context)
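The scaling claims can be checked with back-of-the-envelope arithmetic. This sketch counts pairwise attention scores (n squared) against SSM state updates (n) and ignores constant factors, KV-cache details, and the fact that Falcon H1R still contains attention layers, so the real model's cost falls somewhere between the two columns. Taking 256K as 262,144 tokens is an assumption; the function names are hypothetical.

```shell
# Toy cost model in arbitrary units (constant factors ignored):
#   full attention compares every token with every other token -> n*n scores
#   an SSM scan does one fixed-size state update per token     -> n steps
attn_cost() { echo $(( $1 * $1 )); }
ssm_cost()  { echo $(( $1 )); }

for n in 8192 16384 262144; do
  echo "n=$n attention=$(attn_cost "$n") ssm=$(ssm_cost "$n")"
done

# Growth factors when going from 8K to 16K, and from 8K to 256K tokens
echo "8K->16K:  attention x$(( $(attn_cost 16384) / $(attn_cost 8192) )), ssm x$(( $(ssm_cost 16384) / $(ssm_cost 8192) ))"
echo "8K->256K: attention x$(( $(attn_cost 262144) / $(attn_cost 8192) )), ssm x$(( $(ssm_cost 262144) / $(ssm_cost 8192) ))"
```

Going from Falcon 2's 8K window to 256K is 32x more tokens: 32x more work for a pure SSM scan, but 1,024x more pairwise scores for full attention.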

This hybrid approach is similar to what Qwen 3.6 Plus does at a much larger scale (hybrid linear attention + MoE). Falcon H1R proves the concept works at 7B parameters too.

Falcon’s evolution

| Version | Year | Parameters | Key achievement |
|---|---|---|---|
| Falcon 7B/40B | 2023 | 7B/40B | First UAE open model, topped Hugging Face leaderboard |
| Falcon 180B | 2023 | 180B | Largest open model at the time |
| Falcon 2 11B | 2024 | 11B | 5T tokens, 11 languages, VLM variant |
| Falcon Mamba 7B | 2024 | 7B | First pure Mamba model from TII |
| Falcon H1R 7B | 2026 | 7B | Hybrid architecture, beats 47B models |
| Falcon Perception | 2026 | 600M | Vision model for object detection |
| Falcon OCR | 2026 | 300M | Text extraction from images |

TII has consistently pushed boundaries: first to release a 180B open model, first to release a production Mamba model, and now first to demonstrate hybrid SSM+attention beating models 7x larger.

FAQ

Which Falcon model should I use for coding?

For coding tasks, Falcon H1R-7B is the best choice in the Falcon family due to its strong reasoning capabilities. However, for pure coding quality, Yi-Coder 9B or Qwen3 8B are better specialized alternatives. Falcon’s real strength is reasoning and multilingual tasks rather than code generation specifically.

How does Falcon H1R-7B beat models 7x its size?

The hybrid SSM + attention architecture is the key. Traditional transformers scale quadratically with sequence length, but Falcon H1R combines Mamba (State Space Model) layers for efficient sequential processing with transformer attention layers for precise reasoning. This architectural innovation lets it achieve 88.1% on AIME-24 math benchmarks, beating models up to 47B parameters.

How is Falcon different from Jais?

Both are UAE-funded AI models but come from different organizations. Falcon is built by the Technology Innovation Institute (TII) in Abu Dhabi and focuses on general-purpose multilingual AI. Jais is built by G42/MBZUAI and specializes in Arabic. They're complementary parts of the UAE's broader AI sovereignty strategy.

Related: What is Jais? · Falcon vs Jais · What is Yi? · Best Ollama Models for Coding · Best Open Source Coding Models · Ollama Complete Guide · How to Run Falcon Locally