🌐 Browser-Use

Make websites accessible for AI agents — in TypeScript

A TypeScript-first library for building AI-powered web agents that can autonomously browse, interact with, and extract data from the web using LLMs and Playwright.



TypeScript port of the popular Python browser-use library — with a native Node.js experience, full type safety, and first-class support for all major LLM providers.

✨ Features

  • 🤖 Autonomous Browser Control — AI-driven navigation, clicking, typing, form filling, scrolling, and tab management
  • 🧠 10+ LLM Providers — OpenAI, Anthropic, Google Gemini, Azure, AWS Bedrock, Groq, Ollama, DeepSeek, OpenRouter, Mistral, Cerebras, and custom providers
  • 👁️ Vision Support — Screenshot-based understanding for visual web interactions
  • 🔧 45+ Built-in Actions — Navigation, element interaction, scrolling, forms, tabs, content extraction, file I/O, and more
  • 🧩 Custom Actions — Extensible registry with Zod schema validation, domain restrictions, and page filters
  • 🔌 MCP Server — Model Context Protocol support for Claude Desktop and MCP-compatible clients
  • ⌨️ CLI Tool — Interactive and one-shot modes for quick browser tasks
  • 🔒 Security First — Sensitive data masking, domain restrictions, and Chromium sandboxing
  • 📊 Observability — Event system, telemetry, performance tracing, and session recording (GIF)
  • 🐳 Docker Ready — Configurable for containerized and CI/CD environments

🚀 Quick Start

Installation

npm install browser-use
# Playwright browsers are installed automatically via postinstall

Set Up Your API Key

export OPENAI_API_KEY=sk-your-api-key
# or ANTHROPIC_API_KEY, GOOGLE_API_KEY, etc.

Run Your First Agent

import { Agent } from 'browser-use';
import { ChatOpenAI } from 'browser-use/llm/openai';

const agent = new Agent({
  task: 'Go to google.com and search for "TypeScript tutorials"',
  llm: new ChatOpenAI({
    model: 'gpt-4o',
    apiKey: process.env.OPENAI_API_KEY,
  }),
});

const history = await agent.run();
console.log('Result:', history.final_result());
console.log('Success:', history.is_successful());
Save this as example.ts and run it:

npx tsx example.ts

Use the CLI

# Interactive mode
npx browser-use

# One-shot task
npx browser-use "Go to example.com and extract the page title"

# With specific model
npx browser-use --model claude-sonnet-4-20250514 -p "Search for AI news"

# Headless mode
npx browser-use --headless -p "Check the weather"

# MCP server mode
npx browser-use --mcp

🏗️ Architecture

┌─────────────────────────────────────────────────────┐
│                    Browser-Use                       │
├─────────────────────────────────────────────────────┤
│  Agent ← MessageManager ← LLM Providers            │
│    ↓                                                 │
│  Controller → Action Registry → BrowserSession      │
│                                      ↓               │
│                                  DomService          │
└─────────────────────────────────────────────────────┘
| Component | Description |
| --- | --- |
| Agent | Central orchestrator — runs the observe → think → act loop |
| Controller | Manages action registration and execution via Registry |
| BrowserSession | Playwright wrapper — browser lifecycle, tab management, screenshots |
| DomService | Extracts interactive elements with indexed mapping for LLM consumption |
| MessageManager | Manages LLM conversation history with token optimization |
| LLM Providers | Unified BaseChatModel interface across 10+ providers |
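
The same architecture expressed in code, using only constructors that appear elsewhere in this README (the task string and profile options are placeholders):

import { Agent, Controller, BrowserProfile, BrowserSession } from 'browser-use';
import { ChatOpenAI } from 'browser-use/llm/openai';

// The LLM provider feeds the Agent; the Agent drives the Controller and BrowserSession
const llm = new ChatOpenAI({ model: 'gpt-4o', apiKey: process.env.OPENAI_API_KEY });
const controller = new Controller(); // holds the action Registry
const session = new BrowserSession({
  browser_profile: new BrowserProfile({ headless: true }),
});

const agent = new Agent({ task: 'Your task here', llm, controller, browser_session: session });
const history = await agent.run();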

How It Works

  1. Agent receives a natural language task
  2. DomService extracts the current page state (interactive elements + optional screenshot)
  3. LLM analyzes the state and returns actions to take
  4. Controller validates and executes actions through the Registry
  5. Results feed back to the LLM for the next step
  6. Loop continues until a done action is returned or max_steps is reached (see the sketch below)
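
A conceptual TypeScript sketch of that loop — the function and type names below are illustrative only, not the library's internal API:

// Illustrative sketch of the observe → think → act loop, not the actual implementation
type PageState = { elements: string[]; screenshot?: Buffer };
type Action = { name: string; params: Record<string, unknown> };

async function agentLoop(
  task: string,
  maxSteps: number,
  observe: () => Promise<PageState>,        // step 2: DomService extracts page state
  think: (task: string, state: PageState, history: string[]) => Promise<Action[]>, // step 3: LLM
  act: (action: Action) => Promise<string>, // step 4: Controller + Registry
): Promise<string | undefined> {
  const history: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const state = await observe();
    const actions = await think(task, state, history);
    for (const action of actions) {
      const result = await act(action);
      history.push(result);                      // step 5: results feed the next step
      if (action.name === 'done') return result; // step 6: stop on the done action
    }
  }
  return undefined; // max_steps reached without done
}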

🔌 LLM Providers

| Provider | Import | Vision | Notes |
| --- | --- | --- | --- |
| OpenAI | browser-use/llm/openai | | Default provider, reasoning models (o1/o3/o4) |
| Anthropic | browser-use/llm/anthropic | | Prompt caching support |
| Google Gemini | browser-use/llm/google | | Extended thinking support |
| Azure OpenAI | browser-use/llm/azure | | Enterprise deployment |
| AWS Bedrock | browser-use/llm/aws | | Claude via AWS |
| Groq | browser-use/llm/groq | | Fastest inference |
| Ollama | browser-use/llm/ollama | | Local/self-hosted models |
| DeepSeek | browser-use/llm/deepseek | | Cost-effective |
| OpenRouter | browser-use/llm/openrouter | Varies | Multi-model routing |
| Mistral | browser-use/llm/mistral | Varies | Mistral models |
| Cerebras | browser-use/llm/cerebras | | Fast inference |
Provider examples
// OpenAI
import { ChatOpenAI } from 'browser-use/llm/openai';
const llm = new ChatOpenAI({
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY,
});

// Anthropic
import { ChatAnthropic } from 'browser-use/llm/anthropic';
const llm = new ChatAnthropic({
  model: 'claude-sonnet-4-20250514',
  apiKey: process.env.ANTHROPIC_API_KEY,
});

// Google Gemini
import { ChatGoogle } from 'browser-use/llm/google';
const llm = new ChatGoogle('gemini-2.5-flash');

// Ollama (local)
import { ChatOllama } from 'browser-use/llm/ollama';
const llm = new ChatOllama('llama3', 'http://localhost:11434');

// OpenAI Reasoning Models
const llm = new ChatOpenAI({ model: 'o3-mini', reasoningEffort: 'medium' });

🎯 Code Examples

Data Extraction

const agent = new Agent({
  task: `Go to amazon.com, search for "wireless keyboard",
         extract the name, price, and rating of the first 5 products as JSON`,
  llm,
  use_vision: true,
});

const history = await agent.run(30);
console.log(history.final_result());

Form Filling with Sensitive Data

const agent = new Agent({
  task: 'Login to the dashboard',
  llm,
  sensitive_data: {
    '*.example.com': {
      username: process.env.SITE_USERNAME!,
      password: process.env.SITE_PASSWORD!,
    },
  },
  browser_session: new BrowserSession({
    browser_profile: new BrowserProfile({
      allowed_domains: ['*.example.com'],
    }),
  }),
});

Custom Actions

import { Controller, ActionResult } from 'browser-use';
import { z } from 'zod';
import fs from 'node:fs';

const controller = new Controller();

controller.registry.action('Save screenshot to file', {
  param_model: z.object({
    filename: z.string().describe('Output filename'),
  }),
})(async function save_screenshot(params, ctx) {
  const screenshot = await ctx.page.screenshot();
  fs.writeFileSync(`./screenshots/${params.filename}`, screenshot);
  return new ActionResult({
    extracted_content: `Screenshot saved as ${params.filename}`,
  });
});

const agent = new Agent({ task: '...', llm, controller });

Vision Mode & Session Recording

const agent = new Agent({
  task: 'Navigate to hacker news and summarize the top stories',
  llm,
  use_vision: true,
  vision_detail_level: 'high', // 'auto' | 'low' | 'high'
  generate_gif: './session.gif',
});

Multi-Tab Workflows

const agent = new Agent({
  task: `Compare "Sony WH-1000XM5" prices:
    1. Open amazon.com and search for the product
    2. Open bestbuy.com in a new tab and search
    3. Provide a comparison summary`,
  llm,
  use_vision: true,
});

Event System

const agent = new Agent({ task: '...', llm });

agent.eventbus.on('CreateAgentStepEvent', (event) => {
  console.log('Step completed:', event.step_id);
});

await agent.run();

⚙️ Configuration

Agent Options

const agent = new Agent({
  task: 'Your task',
  llm,
  use_vision: true, // Enable screenshot analysis
  max_actions_per_step: 5, // Actions per LLM call
  max_failures: 3, // Max retries on failure
  generate_gif: './recording.gif', // Session recording
  validate_output: true, // Strict output validation
  use_thinking: true, // Extended thinking prompts
  llm_timeout: 60, // LLM call timeout (seconds)
  step_timeout: 180, // Step timeout (seconds)
  extend_system_message: 'Be concise', // Custom prompt additions
});

const history = await agent.run(50); // Max 50 steps

Browser Profile

import { BrowserProfile, BrowserSession } from 'browser-use';

const profile = new BrowserProfile({
  headless: true,
  viewport: { width: 1920, height: 1080 },
  user_data_dir: './my-profile', // Persistent sessions
  allowed_domains: ['*.example.com'], // Domain restrictions
  highlight_elements: true, // Visual debugging
  proxy: { server: 'http://proxy:8080' },
});

const session = new BrowserSession({ browser_profile: profile });
const agent = new Agent({ task: '...', llm, browser_session: session });

Environment Variables

| Variable | Description |
| --- | --- |
| OPENAI_API_KEY | OpenAI API key |
| ANTHROPIC_API_KEY | Anthropic API key |
| GOOGLE_API_KEY | Google API key |
| BROWSER_USE_HEADLESS | Run browser headlessly (true/false) |
| BROWSER_USE_LOGGING_LEVEL | Log level: debug, info, warning, error |
| BROWSER_USE_ALLOWED_DOMAINS | Comma-separated domain allowlist |
| ANONYMIZED_TELEMETRY | Enable/disable anonymous telemetry |
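
For example, to run headless with debug logging and a domain allowlist (the domains below are placeholders):

export BROWSER_USE_HEADLESS=true
export BROWSER_USE_LOGGING_LEVEL=debug
export BROWSER_USE_ALLOWED_DOMAINS=example.com,docs.example.com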

See Configuration Guide for the full list.

🔌 MCP Server (Claude Desktop)

Browser-Use can run as an MCP server, exposing browser automation as tools for Claude Desktop:

npx browser-use --mcp

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "browser-use": {
      "command": "npx",
      "args": ["browser-use", "--mcp"],
      "env": {
        "OPENAI_API_KEY": "your-api-key"
      }
    }
  }
}

Available MCP tools: browser_run_task, browser_navigate, browser_click, browser_type, browser_scroll, browser_get_state, browser_extract, browser_screenshot, browser_close.
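
These tools can also be called from any MCP client. Below is a minimal sketch using the official @modelcontextprotocol/sdk client; note that the argument shape of each tool (e.g. the url field) is an assumption here — check the MCP Server Guide for the real schemas:

import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Spawn `npx browser-use --mcp` as a stdio MCP server
const transport = new StdioClientTransport({
  command: 'npx',
  args: ['browser-use', '--mcp'],
  env: { OPENAI_API_KEY: process.env.OPENAI_API_KEY ?? '' },
});

const client = new Client({ name: 'example-client', version: '1.0.0' });
await client.connect(transport);

// Tool name taken from the list above; the argument shape is assumed for illustration
const result = await client.callTool({
  name: 'browser_navigate',
  arguments: { url: 'https://example.com' },
});
console.log(result);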

See MCP Server Guide for more details.

🔒 Security

  • Sensitive Data Masking — Credentials are automatically masked in logs and LLM context
  • Domain Restrictions — Lock browser navigation to trusted domains
  • Domain-scoped Secrets — Credentials are only injected on matching domains
  • Hard Safety Gate — sensitive_data requires allowed_domains by default
  • Chromium Sandbox — Enabled by default for production security
const agent = new Agent({
  task: 'Login and fetch invoices',
  llm,
  sensitive_data: {
    '*.example.com': {
      username: process.env.USERNAME!,
      password: process.env.PASSWORD!,
    },
  },
  browser_session: new BrowserSession({
    browser_profile: new BrowserProfile({
      allowed_domains: ['*.example.com'],
    }),
  }),
});

See Security Guide for production deployment best practices.

📚 Documentation

| Document | Description |
| --- | --- |
| Quick Start | Get started in 5 minutes |
| Architecture | System design and component overview |
| API Reference | Complete API documentation |
| Configuration | All configuration options |
| LLM Providers | Provider setup and comparison |
| Actions | Built-in and custom actions |
| MCP Server | MCP integration guide |
| Security | Security best practices |
| Examples | More code examples |
| Contributing | Contribution guidelines |

🛠️ Development

# Install dependencies
pnpm install

# Build
pnpm build

# Run tests
pnpm test

# Lint & format
pnpm lint
pnpm prettier

# Type checking
pnpm typecheck

# Run an example
pnpm exec tsx examples/simple-search.ts

Requirements

  • Node.js >= 18.0.0
  • LLM API Key — At least one supported provider
  • Playwright — Installed automatically as a dependency

📄 License

MIT © Web LLM
