Microsoft just released Fara-7B, a 7-billion-parameter AI model that can use a computer autonomously. It sees screenshots, decides what to click, types text, navigates websites, and completes multi-step tasks like booking flights, shopping, or filling out forms. And it's released under the permissive MIT license.
What Fara-7B does
Fara-7B is a Computer Use Agent (CUA). Unlike chatbots that generate text, Fara-7B generates actions. You give it a goal (“book a flight from NYC to London for next Tuesday”), and it:
- Takes a screenshot of the current browser state
- Reasons about what to do next (chain-of-thought)
- Outputs an action: click at coordinates (x, y), type text, scroll, or navigate to a URL
- Repeats until the task is complete or it needs human confirmation
This is the same concept as Anthropic's Computer Use and OpenAI's Operator, but in a model small enough to run on your own hardware.
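The observe, reason, act loop described above can be sketched as a small bash control loop. This is a minimal sketch under stated assumptions, not how the fara repo implements it: `take_screenshot` and `next_action` are hypothetical stubs (the second replays a scripted list instead of calling the model), and the action names `visit_url`, `click`, `type`, `terminate`, and `pause_for_user` are illustrative placeholders, not Fara-7B's actual action format.

```bash
#!/usr/bin/env bash
# Minimal sketch of a computer-use agent loop. Everything model-related is
# STUBBED: take_screenshot and next_action are hypothetical placeholders, and
# the action names are illustrative, not Fara-7B's real action format.

SCRIPTED_ACTIONS=("visit_url https://www.bing.com" "type flights NYC to London" "click 412 230" "terminate")

take_screenshot() {
  # Placeholder: a real harness would capture the browser viewport here.
  echo "screenshot_$1.png"
}

next_action() {
  # Placeholder for the model call: a real agent would send the screenshot
  # ($1) plus the action history and parse the model's thought + action.
  echo "${SCRIPTED_ACTIONS[$2]}"
}

run_agent() {
  local max_steps=${1:-10} step shot action
  for ((step = 0; step < max_steps; step++)); do
    shot=$(take_screenshot "$step")          # 1. observe
    action=$(next_action "$shot" "$step")    # 2-3. reason and emit an action
    echo "step $step: $action"
    case "$action" in                        # 4. act, or stop
      terminate*)      echo "task complete"; return 0 ;;
      pause_for_user*) echo "critical point: waiting for a human"; return 2 ;;
      "")              echo "no action returned"; return 1 ;;
      *)               : ;;  # a real harness would execute the action here
    esac
  done
  echo "step limit reached"
  return 1
}
```

Calling `run_agent 10` prints one line per step; the exit status (0 finished, 2 paused at a critical point, 1 failed or hit the step limit) is what a caller would use to decide whether to hand control to a human.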
Key specs
| Spec | Fara-7B |
|---|---|
| Parameters | 7 billion |
| Base model | Qwen2.5-VL-7B (multimodal) |
| Input | Screenshots + text history |
| Output | Thought + action (click, type, scroll, navigate) |
| Context length | 128K tokens |
| License | MIT (fully open, commercial use allowed) |
| Training | 64 H100s, 2.5 days |
| Developer | Microsoft Research |
How it compares to larger models
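Fara-7B punches well above its weight class. Scores are task success rates; SoM agents wrap a general-purpose model in Set-of-Mark prompting, and a dash in the Params column means the parameter count is not publicly disclosed: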
| Model | Params | WebVoyager | Online-Mind2Web | DeepShop |
|---|---|---|---|---|
| SoM Agent (GPT-5) | — | 90.6% | 57.7% | 49.1% |
| SoM Agent (o3) | — | 79.3% | 55.4% | 49.7% |
| SoM Agent (GPT-4o) | — | 65.1% | 34.6% | 16.7% |
| Fara-7B | 7B | 73.5% | 34.1% | 26.2% |
| UI-TARS-1.5-7B | 7B | 66.4% | 31.3% | 11.6% |
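Fara-7B beats the GPT-4o SoM agent on WebVoyager (73.5% vs 65.1%) and DeepShop (26.2% vs 16.7%) and roughly matches it on Online-Mind2Web (34.1% vs 34.6%), while running on a single GPU. It also outperforms UI-TARS-1.5-7B, the other 7B model in the table, on all three benchmarks. The o3 and GPT-5 SoM agents remain ahead everywhere: by about 6 to 17 points on WebVoyager and by 21 to 24 points on Online-Mind2Web and DeepShop.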
What makes it different
1. Runs locally and privately. Unlike cloud-based agents (Operator, Computer Use), Fara-7B can run entirely on your own hardware, so screenshots, browsing history, and anything you type are processed locally instead of being sent to a model provider.
2. MIT license, minimal restrictions. Use it commercially, modify it, embed it in products, or fine-tune it on your data; the license only asks that you keep the copyright and license notice. No usage caps, no API costs, no terms-of-service changes.
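3. Screenshot-native. Fara-7B doesn't need DOM access or accessibility trees; it works from raw screenshots, so the approach isn't tied to the structure of a web page. That said, the model was trained on web tasks and the reference setup drives a browser through Playwright, so applying it to desktop apps or mobile emulators goes beyond its documented use cases.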
4. Built-in safety: critical points. The model is trained to stop and ask for human confirmation before:
- Entering personal information
- Completing purchases
- Sending emails
- Submitting applications
- Signing into accounts
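This "human-in-the-loop" design is meant to prevent runaway automation. It is a trained behavior rather than a hard-coded guard, so it can fail, which is why sandboxing still matters (see the FAQ below).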
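The model decides when to stop, but the harness still has to ask the human. A minimal sketch of that hand-off, assuming a hypothetical `confirm_critical_point` helper (not part of the fara CLI) that shows the agent's reason and resumes only on an explicit `yes`:

```bash
#!/usr/bin/env bash
# Hypothetical harness-side confirmation gate for critical points.
# The prompt goes to stderr so stdout stays clean for scripting; any answer
# other than an exact "yes" (including end of input) counts as a refusal.
confirm_critical_point() {
  local reason=$1 answer
  printf 'Agent paused before: %s\nType "yes" to continue: ' "$reason" >&2
  read -r answer || return 1
  [ "$answer" = "yes" ]
}
```

Defaulting to refusal means a stray Enter key, a "y", or a closed input stream can never approve a purchase or a sign-in on the user's behalf.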
What it can do
Primary use cases from Microsoft’s documentation:
- Shopping: Find products, compare prices, add to cart
- Travel booking: Search flights, select options, fill passenger details
- Restaurant reservations: Find restaurants, check availability, book tables
- Account workflows: Update settings, manage subscriptions, fill forms
- Information seeking: Research topics across multiple websites
What it can’t do (yet)
- Complex reasoning tasks: it's optimized for web navigation, not general-purpose reasoning
- Non-English websites: it targets English, and performance degrades in other languages
- High-stakes decisions: not recommended for medical, legal, or financial actions
- Long multi-session tasks: works best for tasks that can be completed in one session
The FaraGen data pipeline
What makes Fara-7B special isn’t just the model — it’s how the training data was created. Microsoft built FaraGen, a synthetic data generation system that:
- Proposes diverse tasks from frequently used websites
- Generates multiple solution attempts using larger models
- Verifies successful trajectories using multiple verifier agents
- Produces verified training trajectories at ~$1 each
This approach lets Microsoft scale training data without expensive human annotation, and future versions can improve by retraining on the more diverse tasks FaraGen generates.
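The verification step can be pictured as a filter over candidate trajectories. The article doesn't say how FaraGen combines the verdicts of its verifier agents, so this sketch assumes a unanimous vote; `accept_trajectory`, `filter_trajectories`, the `candidates/*.json` layout, and the verifier commands are all hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of a multi-verifier filter, ASSUMING a unanimous-vote rule.
# Each verifier is any command (or shell function) that receives a
# trajectory file path and exits 0 to accept it.
accept_trajectory() {
  local trajectory=$1 verifier
  shift
  for verifier in "$@"; do
    if ! "$verifier" "$trajectory"; then
      return 1   # a single rejection drops the trajectory
    fi
  done
  return 0
}

# Copy every trajectory that all verifiers accept from in_dir to out_dir
# (hypothetical layout: one JSON file per trajectory).
filter_trajectories() {
  local in_dir=$1 out_dir=$2 f
  shift 2
  mkdir -p "$out_dir"
  for f in "$in_dir"/*.json; do
    [ -e "$f" ] || continue   # no matches: the glob stays literal
    if accept_trajectory "$f" "$@"; then
      cp "$f" "$out_dir/"
    fi
  done
}
```

For example, `filter_trajectories candidates accepted verify_goal verify_rubric`, where `verify_goal` and `verify_rubric` stand in for whatever verifier agents you wrap as commands. A unanimous rule trades yield for precision: fewer trajectories survive, but each one is less likely to teach the model a wrong action.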
How to get started
```bash
# Clone the repository
git clone https://github.com/microsoft/fara.git
cd fara

# Set up the environment
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
playwright install

# Host the model (requires ~16GB VRAM); this stays in the foreground
vllm serve "microsoft/Fara-7B" --port 5000 --dtype auto

# Run a task from a second terminal (activate .venv there first)
fara-cli --task "what's the weather in new york now"
```
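`vllm serve` can take a while to load the weights, so it helps to wait until the server answers before launching `fara-cli`. A minimal sketch, assuming the server from the step above on port 5000; it polls `/v1/models`, the model-listing endpoint of vLLM's OpenAI-compatible server, and `wait_for_model_server` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Wait until vLLM's OpenAI-compatible server answers on /v1/models.
# Defaults assume the `vllm serve ... --port 5000` command above; the
# timeout is approximate (one probe plus a 1s sleep per iteration).
wait_for_model_server() {
  local base_url=${1:-http://localhost:5000} timeout_s=${2:-300} waited=0
  # -s: quiet, -f: fail on HTTP errors, --max-time: don't hang on one probe
  until curl -sf --max-time 5 "$base_url/v1/models" >/dev/null; do
    if (( waited >= timeout_s )); then
      echo "model server at $base_url not ready after ${timeout_s}s" >&2
      return 1
    fi
    sleep 1
    (( waited += 1 ))
  done
  echo "model server at $base_url is ready"
}
```

Chained as `wait_for_model_server http://localhost:5000 600 && fara-cli --task "what's the weather in new york now"`, the task only starts once the model is actually being served.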
For a complete local setup guide, see How to Run Fara-7B Locally.
Why this matters
Computer Use Agents are the next frontier of AI automation. Instead of building custom integrations for every service, a CUA can use any website the same way a human does. Fara-7B makes this accessible to everyone — not just companies paying for GPT-5 or Claude API access.
For developers, this opens up:
- Browser automation without brittle selectors or Puppeteer scripts
- Testing — let the agent explore your app and find issues
- Personal assistants that actually do things, not just suggest them
- Accessibility tools — automate repetitive web tasks for users with disabilities
Because it's MIT-licensed and runs on consumer hardware, expect developers to build a wide range of applications on top of it.
FAQ
Can Fara-7B replace Selenium/Puppeteer for web scraping?
For structured scraping, traditional tools are still faster and more reliable. Fara-7B shines for tasks that require understanding context, making decisions, and handling dynamic UIs — things that break brittle CSS selectors.
How much VRAM does it need?
About 16GB for bf16 inference (fits on an RTX 4090 or A6000). With quantization (Q4), it can run on 8GB VRAM. See our local setup guide for quantized options.
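The 16GB figure follows from a back-of-the-envelope weight count. This rough estimate takes the nominal 7 × 10⁹ parameters at face value (the actual checkpoint, including the vision encoder, may be somewhat larger) and counts only the weights:

$$
\begin{aligned}
M_{\text{bf16}} &\approx 7\times10^{9}\ \text{params} \times 2\ \text{bytes/param} = 14\times10^{9}\ \text{bytes} \approx 14\ \text{GB} \\
M_{\text{Q4}} &\approx 7\times10^{9}\ \text{params} \times 0.5\ \text{bytes/param} = 3.5\times10^{9}\ \text{bytes} \approx 3.5\ \text{GB}
\end{aligned}
$$

The remaining headroom goes to the KV cache for long screenshot-plus-history contexts, activations, and runtime overhead, which is why the practical figures are about 16GB and 8GB. Note that vLLM reserves most of the GPU's memory up front by default (`--gpu-memory-utilization` defaults to 0.9), so observed usage on a 24GB card will look higher than this estimate.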
Is it safe to let it browse the internet unsupervised?
Microsoft recommends sandboxed environments, human-in-the-loop for sensitive actions, and URL allow-lists. The model has built-in critical points where it stops for confirmation, but you should still run it in a controlled environment.
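A URL allow-list can be enforced in the harness before any navigation runs. A minimal sketch in bash; `url_allowed`, `url_host`, and `ALLOWED_DOMAINS` are hypothetical names, not part of the fara repo, and the host parsing is deliberately simple (no IDN or IP-literal handling):

```bash
#!/usr/bin/env bash
# Hypothetical harness-side URL allow-list (not part of the fara repo).
# A URL passes only if its host equals an allowed domain or is a subdomain
# of one; matching on the parsed host avoids substring tricks.
ALLOWED_DOMAINS=("bing.com" "wikipedia.org")

url_host() {
  local rest=${1#*://}      # drop the scheme ("https://"), if any
  rest=${rest%%[/?#]*}      # drop path, query string, and fragment
  rest=${rest##*@}          # drop "user:pass@" credentials
  rest=${rest%%:*}          # drop the port
  printf '%s\n' "$rest" | tr '[:upper:]' '[:lower:]'
}

url_allowed() {
  local host domain
  host=$(url_host "$1")
  [ -n "$host" ] || return 1
  for domain in "${ALLOWED_DOMAINS[@]}"; do
    if [[ $host == "$domain" || $host == *".$domain" ]]; then
      return 0
    fi
  done
  return 1
}
```

A harness would run something like `url_allowed "$url" || { echo "blocked: $url" >&2; exit 1; }` before executing any navigate action. Comparing the parsed host, rather than searching the whole URL for a domain string, is what stops `https://bing.com.evil.example/` or `https://evil.example/?next=bing.com` from passing as `bing.com`.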
How does it compare to Anthropic Computer Use?
See our detailed comparison — the short version: Fara-7B is free, local, and open-source but less capable than Claude’s Computer Use on complex tasks.
Related articles
- How to Run Fara-7B Locally
- Fara-7B vs Anthropic Computer Use vs OpenAI Operator
- Claude Computer Use Guide
- Best AI Coding Agents 2026
- How to Run AI Models Locally