🤖 AI Tools
· 5 min read

What is Microsoft Fara-7B? The First Open-Source Computer Use Agent


Microsoft just released Fara-7B — a 7 billion parameter AI model that can autonomously use a computer. It sees screenshots, decides what to click, types text, navigates websites, and completes multi-step tasks like booking flights, shopping, or filling out forms. And it’s fully open-source under the MIT license.

What Fara-7B does

Fara-7B is a Computer Use Agent (CUA). Unlike chatbots that generate text, Fara-7B generates actions. You give it a goal (“book a flight from NYC to London for next Tuesday”), and it:

  1. Takes a screenshot of the current browser state
  2. Reasons about what to do next (chain-of-thought)
  3. Outputs an action: click at coordinates (x, y), type text, scroll, navigate to URL
  4. Repeats until the task is complete or it needs human confirmation

This is the same concept as Anthropic’s Computer Use and OpenAI’s Operator — but in a model small enough to run on your laptop.

Key specs

Fara-7B
Parameters7 billion
Base modelQwen 2.5-VL-7B (multimodal)
InputScreenshots + text history
OutputThought + action (click, type, scroll, navigate)
Context length128K tokens
LicenseMIT (fully open, commercial use allowed)
Training64 H100s, 2.5 days
DeveloperMicrosoft Research

How it compares to larger models

Fara-7B punches well above its weight class:

ModelParamsWebVoyagerOnline-Mind2WebDeepShop
SoM Agent (GPT-5)90.6%57.7%49.1%
SoM Agent (o3)79.3%55.4%49.7%
SoM Agent (GPT-4o)65.1%34.6%16.7%
Fara-7B7B73.5%34.1%26.2%
UI-TARS-1.5-7B7B66.4%31.3%11.6%

Fara-7B beats GPT-4o on WebVoyager (73.5% vs 65.1%) despite being a fraction of the size. It’s competitive with o3 on some benchmarks while running on a single GPU.

What makes it different

1. Runs locally and privately Unlike cloud-based agents (Operator, Computer Use), Fara-7B runs entirely on your hardware. Your browsing data, passwords, and actions never leave your machine.

2. MIT license — no restrictions Use it commercially, modify it, embed it in products, fine-tune it on your data. No usage caps, no API costs, no terms of service changes.

3. Screenshot-native Fara-7B doesn’t need DOM access or accessibility trees. It works from raw screenshots — meaning it can operate on any application, not just web browsers. Desktop apps, mobile emulators, anything with a screen.

4. Built-in safety: critical points The model is trained to stop and ask for human confirmation before:

  • Entering personal information
  • Completing purchases
  • Sending emails
  • Submitting applications
  • Signing into accounts

This “human-in-the-loop” design prevents runaway automation.

What it can do

Primary use cases from Microsoft’s documentation:

  • Shopping: Find products, compare prices, add to cart
  • Travel booking: Search flights, select options, fill passenger details
  • Restaurant reservations: Find restaurants, check availability, book tables
  • Account workflows: Update settings, manage subscriptions, fill forms
  • Information seeking: Research topics across multiple websites

What it can’t do (yet)

  • Complex reasoning tasks — It’s optimized for web navigation, not general intelligence
  • Non-English websites — English only; other languages have degraded performance
  • High-stakes decisions — Not recommended for medical, legal, or financial actions
  • Long multi-session tasks — Works best for tasks completable in one session

The FaraGen data pipeline

What makes Fara-7B special isn’t just the model — it’s how the training data was created. Microsoft built FaraGen, a synthetic data generation system that:

  1. Proposes diverse tasks from frequently-used websites
  2. Generates multiple solution attempts using larger models
  3. Verifies successful trajectories using multiple verifier agents
  4. Produces verified training trajectories at ~$1 each

This approach means Microsoft can scale training data without expensive human annotation — and the model improves as FaraGen generates more diverse tasks.

How to get started

# Clone the repository
git clone https://github.com/microsoft/fara.git
cd fara

# Setup environment
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
playwright install

# Host the model (requires ~16GB VRAM)
vllm serve "microsoft/Fara-7B" --port 5000 --dtype auto

# Run a task
fara-cli --task "what's the weather in new york now"

For a complete local setup guide, see How to Run Fara-7B Locally.

Why this matters

Computer Use Agents are the next frontier of AI automation. Instead of building custom integrations for every service, a CUA can use any website the same way a human does. Fara-7B makes this accessible to everyone — not just companies paying for GPT-5 or Claude API access.

For developers, this opens up:

  • Browser automation without brittle selectors or Puppeteer scripts
  • Testing — let the agent explore your app and find issues
  • Personal assistants that actually do things, not just suggest them
  • Accessibility tools — automate repetitive web tasks for users with disabilities

The fact that it’s MIT-licensed and runs on consumer hardware means we’ll see an explosion of applications built on top of it.

FAQ

Can Fara-7B replace Selenium/Puppeteer for web scraping?

For structured scraping, traditional tools are still faster and more reliable. Fara-7B shines for tasks that require understanding context, making decisions, and handling dynamic UIs — things that break brittle CSS selectors.

How much VRAM does it need?

About 16GB for bf16 inference (fits on an RTX 4090 or A6000). With quantization (Q4), it can run on 8GB VRAM. See our local setup guide for quantized options.

Is it safe to let it browse the internet unsupervised?

Microsoft recommends sandboxed environments, human-in-the-loop for sensitive actions, and URL allow-lists. The model has built-in critical points where it stops for confirmation, but you should still run it in a controlled environment.

How does it compare to Anthropic Computer Use?

See our detailed comparison — the short version: Fara-7B is free, local, and open-source but less capable than Claude’s Computer Use on complex tasks.