May 19, 2026 · 5 min read

What is Microsoft Fara-7B? The First Open-Source Computer Use Agent

Microsoft just released Fara-7B — a 7 billion parameter AI model that can autonomously use a computer. It sees screenshots, decides what to click, types text, navigates websites, and completes multi-step tasks like booking flights, shopping, or filling out forms. And it’s fully open-source under the MIT license.

What Fara-7B does

Fara-7B is a Computer Use Agent (CUA). Unlike chatbots that generate text, Fara-7B generates actions. You give it a goal (“book a flight from NYC to London for next Tuesday”), and it:

Takes a screenshot of the current browser state
Reasons about what to do next (chain-of-thought)
Outputs an action: click at coordinates (x, y), type text, scroll, navigate to URL
Repeats until the task is complete or it needs human confirmation

This is the same concept as Anthropic’s Computer Use and OpenAI’s Operator — but in a model small enough to run on your laptop.

Key specs

	Fara-7B
Parameters	7 billion
Base model	Qwen 2.5-VL-7B (multimodal)
Input	Screenshots + text history
Output	Thought + action (click, type, scroll, navigate)
Context length	128K tokens
License	MIT (fully open, commercial use allowed)
Training	64 H100s, 2.5 days
Developer	Microsoft Research

How it compares to larger models

Fara-7B punches well above its weight class:

Model	Params	WebVoyager	Online-Mind2Web	DeepShop
SoM Agent (GPT-5)	—	90.6%	57.7%	49.1%
SoM Agent (o3)	—	79.3%	55.4%	49.7%
SoM Agent (GPT-4o)	—	65.1%	34.6%	16.7%
Fara-7B	7B	73.5%	34.1%	26.2%
UI-TARS-1.5-7B	7B	66.4%	31.3%	11.6%

Fara-7B beats GPT-4o on WebVoyager (73.5% vs 65.1%) despite being a fraction of the size. It’s competitive with o3 on some benchmarks while running on a single GPU.

What makes it different

1. Runs locally and privately Unlike cloud-based agents (Operator, Computer Use), Fara-7B runs entirely on your hardware. Your browsing data, passwords, and actions never leave your machine.

2. MIT license — no restrictions Use it commercially, modify it, embed it in products, fine-tune it on your data. No usage caps, no API costs, no terms of service changes.

3. Screenshot-native Fara-7B doesn’t need DOM access or accessibility trees. It works from raw screenshots — meaning it can operate on any application, not just web browsers. Desktop apps, mobile emulators, anything with a screen.

4. Built-in safety: critical points The model is trained to stop and ask for human confirmation before:

Entering personal information
Completing purchases
Sending emails
Submitting applications
Signing into accounts

This “human-in-the-loop” design prevents runaway automation.

What it can do

Primary use cases from Microsoft’s documentation:

Shopping: Find products, compare prices, add to cart
Travel booking: Search flights, select options, fill passenger details
Restaurant reservations: Find restaurants, check availability, book tables
Account workflows: Update settings, manage subscriptions, fill forms
Information seeking: Research topics across multiple websites

What it can’t do (yet)

Complex reasoning tasks — It’s optimized for web navigation, not general intelligence
Non-English websites — English only; other languages have degraded performance
High-stakes decisions — Not recommended for medical, legal, or financial actions
Long multi-session tasks — Works best for tasks completable in one session

The FaraGen data pipeline

What makes Fara-7B special isn’t just the model — it’s how the training data was created. Microsoft built FaraGen, a synthetic data generation system that:

Proposes diverse tasks from frequently-used websites
Generates multiple solution attempts using larger models
Verifies successful trajectories using multiple verifier agents
Produces verified training trajectories at ~$1 each

This approach means Microsoft can scale training data without expensive human annotation — and the model improves as FaraGen generates more diverse tasks.

How to get started

# Clone the repository
git clone https://github.com/microsoft/fara.git
cd fara

# Setup environment
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
playwright install

# Host the model (requires ~16GB VRAM)
vllm serve "microsoft/Fara-7B" --port 5000 --dtype auto

# Run a task
fara-cli --task "what's the weather in new york now"

For a complete local setup guide, see How to Run Fara-7B Locally.

Why this matters

Computer Use Agents are the next frontier of AI automation. Instead of building custom integrations for every service, a CUA can use any website the same way a human does. Fara-7B makes this accessible to everyone — not just companies paying for GPT-5 or Claude API access.

For developers, this opens up:

Browser automation without brittle selectors or Puppeteer scripts
Testing — let the agent explore your app and find issues
Personal assistants that actually do things, not just suggest them
Accessibility tools — automate repetitive web tasks for users with disabilities

The fact that it’s MIT-licensed and runs on consumer hardware means we’ll see an explosion of applications built on top of it.

FAQ

Can Fara-7B replace Selenium/Puppeteer for web scraping?

For structured scraping, traditional tools are still faster and more reliable. Fara-7B shines for tasks that require understanding context, making decisions, and handling dynamic UIs — things that break brittle CSS selectors.

How much VRAM does it need?

About 16GB for bf16 inference (fits on an RTX 4090 or A6000). With quantization (Q4), it can run on 8GB VRAM. See our local setup guide for quantized options.

Is it safe to let it browse the internet unsupervised?

Microsoft recommends sandboxed environments, human-in-the-loop for sensitive actions, and URL allow-lists. The model has built-in critical points where it stops for confirmation, but you should still run it in a controlled environment.

How does it compare to Anthropic Computer Use?

See our detailed comparison — the short version: Fara-7B is free, local, and open-source but less capable than Claude’s Computer Use on complex tasks.

How to Run Fara-7B Locally
Fara-7B vs Anthropic Computer Use vs OpenAI Operator
Claude Computer Use Guide
Best AI Coding Agents 2026
How to Run AI Models Locally

What is Microsoft Fara-7B? The First Open-Source Computer Use Agent

What Fara-7B does

Key specs

How it compares to larger models

What makes it different

What it can do

What it can’t do (yet)

The FaraGen data pipeline

How to get started

Why this matters

FAQ

Can Fara-7B replace Selenium/Puppeteer for web scraping?

How much VRAM does it need?

Is it safe to let it browse the internet unsupervised?

How does it compare to Anthropic Computer Use?

Related articles

📬 AI Dev Weekly

You might also like

Fara-7B vs Anthropic Computer Use vs OpenAI Operator — Which AI Agent Should You Use?

How to Run Microsoft Fara-7B Locally — Complete Setup Guide

Best Open-Source OCR Models 2026 (Compared)

Aion 1.0: Microsoft's On-Device AI Models for Windows (2026)