Spider

Website | Guides | API Docs | Chat

A web crawler and scraper written in Rust.

  • Concurrent crawling with streaming
  • HTTP, Chrome (CDP), or WebDriver rendering
  • Caching, proxies, and distributed crawling

Features

Core

  • Concurrent & streaming crawls
  • Decentralized crawling for horizontal scaling
  • Caching (memory, disk, or hybrid)
  • Proxy support with rotation
  • Cron job scheduling
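
Most of these behaviors are tuned through the same Website builder used in the examples below. A minimal sketch, assuming the builder methods shown here (robots.txt handling, request delay, user agent, and proxy list) match the crate's configuration API:

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud")
        // Honor robots.txt rules on the target site.
        .with_respect_robots_txt(true)
        // Delay between requests, in milliseconds.
        .with_delay(250)
        // Identify the crawler with a custom user agent.
        .with_user_agent(Some("myapp/0.1.0"))
        // Route requests through a rotating proxy list.
        .with_proxies(Some(vec!["http://localhost:8888".into()]))
        .build()
        .unwrap();

    website.crawl().await;
}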

Browser Automation

  • Chrome DevTools Protocol (CDP) for local Chrome
  • WebDriver support for Selenium Grid, remote browsers, and cross-browser testing
  • AI-powered automation workflows
  • Web challenge solving (deterministic + AI built-in)

Data Processing

Security & Control

AI Agent

  • spider_agent - Concurrent-safe multimodal agent for web automation and research
  • Multiple LLM providers (OpenAI, OpenAI-compatible APIs)
  • Multiple search providers (Serper, Brave, Bing, Tavily)
  • HTML extraction and research synthesis

Quick Start

Add spider to your project:

[dependencies]
spider = "2"
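
A minimal crawl then prints every URL that was visited, using `crawl` and `get_links` as a sketch of the basic API:

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud");

    // Crawl the site, then read back the discovered links.
    website.crawl().await;

    for link in website.get_links() {
        println!("- {:?}", link.as_ref());
    }
}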

Streaming Pages

Process pages as they're crawled with real-time subscriptions:

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud");
    let mut rx = website.subscribe(0).unwrap();

    tokio::spawn(async move {
        while let Ok(page) = rx.recv().await {
            println!("- {}", page.get_url());
        }
    });

    website.crawl().await;
    website.unsubscribe();
}
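
If you prefer to collect results after the crawl finishes instead of streaming them, the crate also has a scrape path that keeps each page's HTML in memory. A hedged sketch, assuming `scrape`, `get_pages`, and `get_html` behave as documented in the crate:

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud");

    // Like crawl(), but stores the response body of each page.
    website.scrape().await;

    if let Some(pages) = website.get_pages() {
        for page in pages.iter() {
            println!("{} ({} bytes)", page.get_url(), page.get_html().len());
        }
    }
}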

Chrome (CDP)

Render JavaScript-heavy pages with stealth mode and request interception:

[dependencies]
spider = { version = "2", features = ["chrome"] }

use spider::features::chrome_common::RequestInterceptConfiguration;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud")
        .with_chrome_intercept(RequestInterceptConfiguration::new(true))
        .with_stealth(true)
        .build()
        .unwrap();

    website.crawl().await;
}

WebDriver (Selenium Grid)

Connect to remote browsers, Selenium Grid, or any W3C WebDriver-compatible service:

[dependencies]
spider = { version = "2", features = ["webdriver"] }

use spider::features::webdriver_common::{WebDriverConfig, WebDriverBrowser};
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud")
        .with_webdriver(
            WebDriverConfig::new()
                .with_server_url("http://localhost:4444")
                .with_browser(WebDriverBrowser::Chrome)
                .with_headless(true)
        )
        .build()
        .unwrap();

    website.crawl().await;
}

Spider Cloud: Reliable Crawling at Scale

Production crawling means dealing with bot protection, CAPTCHAs, rate limits, and blocked requests. Spider Cloud integration adds a reliability layer that handles all of this automatically — no code changes required beyond adding your API key.

New to Spider Cloud? Sign up at spider.cloud to get your API key. New accounts receive free credits so you can try it out before committing.

Enable the feature:

[dependencies]
spider = { version = "2", features = ["spider_cloud"] }

How It Works

When you provide a Spider Cloud API key, your crawler gains access to:

  • Managed proxy rotation — requests route through proxy.spider.cloud with automatic IP rotation, geo-targeting, and residential proxies
  • Anti-bot bypass — Cloudflare, Akamai, Imperva, Distil Networks, and generic CAPTCHA challenges are handled transparently
  • Automatic fallback — if a direct request fails (403, 429, 503, 5xx), the request is retried through Spider Cloud's unblocking infrastructure
  • Content-aware detection — Smart mode inspects response bodies for challenge pages, empty responses, and bot detection markers before you ever see them

Integration Modes

Choose the mode that fits your workload:

| Mode | Strategy | Best For |
| --- | --- | --- |
| Proxy (default) | Route all traffic through Spider Cloud proxy | General crawling with proxy rotation |
| Smart (recommended) | Proxy by default, auto-fallback to unblocker on bot detection | Production workloads; best balance of speed and reliability |
| Fallback | Direct fetch first, fall back to API on failure | Cost-efficient crawling where most sites work without help |
| Unblocker | All requests through the unblocker API | Sites with aggressive bot protection |
| Api | All requests through the crawl API | Simple scraping, one page at a time |
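
Any mode in the table can be selected on the config. For example, a cost-efficient crawl that only falls back to the API when a direct fetch fails (a sketch; only the `Smart` variant appears verbatim in this README, so `SpiderCloudMode::Fallback` is assumed to mirror the table):

use spider::configuration::{SpiderCloudConfig, SpiderCloudMode};
use spider::website::Website;

#[tokio::main]
async fn main() {
    // Direct fetch first, fall back to the Spider Cloud API on failure.
    let config = SpiderCloudConfig::new("your-api-key")
        .with_mode(SpiderCloudMode::Fallback);

    let mut website = Website::new("https://example.com")
        .with_spider_cloud_config(config)
        .build()
        .unwrap();

    website.crawl().await;
}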

Smart mode is the recommended choice for production. It detects and handles:

  • HTTP 403, 429, 503, and Cloudflare 520-530 errors
  • Cloudflare browser verification challenges
  • CAPTCHA and "verify you are human" pages
  • Distil Networks, Imperva, and Akamai Bot Manager
  • Empty response bodies on HTML pages

Quick Setup

One line to enable proxy routing:

use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com")
        .with_spider_cloud("your-api-key")  // Proxy mode (default)
        .build()
        .unwrap();

    website.crawl().await;
}

Smart Mode (Recommended)

For production, use Smart mode to get automatic fallback when pages are protected:

use spider::configuration::{SpiderCloudConfig, SpiderCloudMode};
use spider::website::Website;

#[tokio::main]
async fn main() {
    let config = SpiderCloudConfig::new("your-api-key")
        .with_mode(SpiderCloudMode::Smart);

    let mut website = Website::new("https://protected-site.com")
        .with_spider_cloud_config(config)
        .build()
        .unwrap();

    website.crawl().await;
}

What happens under the hood in Smart mode:

  1. Request goes through proxy.spider.cloud (fast, low cost)
  2. If the response is a 403/429/503, a challenge page, or an empty body → automatic retry through the /unblocker API
  3. The unblocked content is returned transparently — your code sees a normal page

CLI Usage

spider --url https://example.com \
  --spider-cloud-key "your-api-key" \
  --spider-cloud-mode smart

Extra Parameters

Pass additional options to the Spider Cloud API for fine-grained control:

use spider::configuration::{SpiderCloudConfig, SpiderCloudMode};
use spider::website::Website;

#[tokio::main]
async fn main() {
    // Extra options forwarded to the Spider Cloud API.
    let mut params = hashbrown::HashMap::new();
    params.insert("stealth".into(), serde_json::json!(true));
    params.insert("fingerprint".into(), serde_json::json!(true));

    let config = SpiderCloudConfig::new("your-api-key")
        .with_mode(SpiderCloudMode::Smart)
        .with_extra_params(params);

    let mut website = Website::new("https://example.com")
        .with_spider_cloud_config(config)
        .build()
        .unwrap();

    website.crawl().await;
}

Get started at spider.cloud — new signups receive free credits to test the full integration.

Get Spider

| Method | Best For |
| --- | --- |
| Spider Cloud | Production workloads, no setup required |
| spider | Rust applications |
| spider_agent | AI-powered web automation and research |
| spider_cli | Command-line usage |
| spider-nodejs | Node.js projects |
| spider-py | Python projects |

Resources

License

MIT

Contributing

See CONTRIBUTING.