Website | Guides | API Docs | Chat
A web crawler and scraper written in Rust.
- Concurrent crawling with streaming
- HTTP, Chrome (CDP), or WebDriver rendering
- Caching, proxies, and distributed crawling
- Concurrent & streaming crawls
- Decentralized crawling for horizontal scaling
- Caching (memory, disk, or hybrid)
- Proxy support with rotation
- Cron job scheduling
- Chrome DevTools Protocol (CDP) for local Chrome
- WebDriver support for Selenium Grid, remote browsers, and cross-browser testing
- AI-powered automation workflows
- Web challenge solving (deterministic + AI built-in)
- HTML transformations
- CSS/XPath scraping with spider_utils
- Smart mode for JS-rendered content detection
- Anti-bot mitigation
- Ad blocking
- Firewall
- Blacklisting, whitelisting, and depth budgeting
- Spider Cloud integration for proxy rotation and anti-bot bypass (`spider_cloud` feature)
- spider_agent - Concurrent-safe multimodal agent for web automation and research
- Multiple LLM providers (OpenAI, OpenAI-compatible APIs)
- Multiple search providers (Serper, Brave, Bing, Tavily)
- HTML extraction and research synthesis
Add spider to your project:
```toml
[dependencies]
spider = "2"
```

Process pages as they're crawled with real-time subscriptions:

```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud");
    let mut rx = website.subscribe(0).unwrap();

    tokio::spawn(async move {
        while let Ok(page) = rx.recv().await {
            println!("- {}", page.get_url());
        }
    });

    website.crawl().await;
    website.unsubscribe();
}
```

Render JavaScript-heavy pages with stealth mode and request interception:
```toml
[dependencies]
spider = { version = "2", features = ["chrome"] }
```

```rust
use spider::features::chrome_common::RequestInterceptConfiguration;
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud")
        .with_chrome_intercept(RequestInterceptConfiguration::new(true))
        .with_stealth(true)
        .build()
        .unwrap();

    website.crawl().await;
}
```

Connect to remote browsers, Selenium Grid, or any W3C WebDriver-compatible service:
```toml
[dependencies]
spider = { version = "2", features = ["webdriver"] }
```

```rust
use spider::features::webdriver_common::{WebDriverConfig, WebDriverBrowser};
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud")
        .with_webdriver(
            WebDriverConfig::new()
                .with_server_url("http://localhost:4444")
                .with_browser(WebDriverBrowser::Chrome)
                .with_headless(true),
        )
        .build()
        .unwrap();

    website.crawl().await;
}
```

Production crawling means dealing with bot protection, CAPTCHAs, rate limits, and blocked requests. Spider Cloud integration adds a reliability layer that handles all of this automatically — no code changes required beyond adding your API key.
New to Spider Cloud? Sign up at spider.cloud to get your API key. New accounts receive free credits so you can try it out before committing.
Enable the feature:
```toml
[dependencies]
spider = { version = "2", features = ["spider_cloud"] }
```

When you provide a Spider Cloud API key, your crawler gains access to:
- Managed proxy rotation — requests route through `proxy.spider.cloud` with automatic IP rotation, geo-targeting, and residential proxies
- Anti-bot bypass — Cloudflare, Akamai, Imperva, Distil Networks, and generic CAPTCHA challenges are handled transparently
- Automatic fallback — if a direct request fails (403, 429, 503, 5xx), the request is retried through Spider Cloud's unblocking infrastructure
- Content-aware detection — Smart mode inspects response bodies for challenge pages, empty responses, and bot detection markers before you ever see them
Choose the mode that fits your workload:
| Mode | Strategy | Best For |
|---|---|---|
| Proxy (default) | Route all traffic through Spider Cloud proxy | General crawling with proxy rotation |
| Smart (recommended) | Proxy by default, auto-fallback to unblocker on bot detection | Production workloads — best balance of speed and reliability |
| Fallback | Direct fetch first, fall back to API on failure | Cost-efficient crawling where most sites work without help |
| Unblocker | All requests through the unblocker API | Sites with aggressive bot protection |
| Api | All requests through the crawl API | Simple scraping, one page at a time |
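Switching modes is a one-line change on the config. For example, a cost-conscious crawl that tries direct fetches first could use Fallback mode; this sketch mirrors the Smart-mode example below, and the `SpiderCloudMode::Fallback` variant name is assumed from the table above:

```rust
use spider::configuration::{SpiderCloudConfig, SpiderCloudMode};
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    // Direct fetch first; route through Spider Cloud only when a request fails.
    let config = SpiderCloudConfig::new("your-api-key")
        .with_mode(SpiderCloudMode::Fallback); // variant name assumed from the mode table

    let mut website = Website::new("https://example.com")
        .with_spider_cloud_config(config)
        .build()
        .unwrap();

    website.crawl().await;
}
```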
Smart mode is the recommended choice for production. It detects and handles:
- HTTP 403, 429, 503, and Cloudflare 520-530 errors
- Cloudflare browser verification challenges
- CAPTCHA and "verify you are human" pages
- Distil Networks, Imperva, and Akamai Bot Manager
- Empty response bodies on HTML pages
One line to enable proxy routing:
```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com")
        .with_spider_cloud("your-api-key") // Proxy mode (default)
        .build()
        .unwrap();

    website.crawl().await;
}
```

For production, use Smart mode to get automatic fallback when pages are protected:
```rust
use spider::configuration::{SpiderCloudConfig, SpiderCloudMode};
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let config = SpiderCloudConfig::new("your-api-key")
        .with_mode(SpiderCloudMode::Smart);

    let mut website = Website::new("https://protected-site.com")
        .with_spider_cloud_config(config)
        .build()
        .unwrap();

    website.crawl().await;
}
```

What happens under the hood in Smart mode:
- Request goes through `proxy.spider.cloud` (fast, low cost)
- If the response is a 403/429/503, a challenge page, or an empty body → automatic retry through the `/unblocker` API
- The unblocked content is returned transparently — your code sees a normal page
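A simplified way to picture that decision, written as standalone Rust rather than spider's actual internals (the function here is illustrative only):

```rust
/// Illustrative sketch of the Smart-mode decision described above.
/// Not spider's internal API; it only mirrors the documented behavior.
fn needs_unblocker(status: u16, body: &str) -> bool {
    // Blocking or rate-limiting status codes, including Cloudflare 520-530.
    let blocked_status = matches!(status, 403 | 429 | 503) || (520..=530).contains(&status);
    // Markers commonly found on challenge pages.
    let challenge_page = body.contains("cf-browser-verification")
        || body.to_lowercase().contains("verify you are human");
    // Empty bodies on HTML pages are treated as bot detection.
    let empty_body = body.trim().is_empty();

    blocked_status || challenge_page || empty_body
}

fn main() {
    // A blocked direct fetch is retried through the unblocker.
    assert!(needs_unblocker(403, "<html>cf-browser-verification</html>"));
    // A normal response is returned as-is.
    assert!(!needs_unblocker(200, "<html><body>ok</body></html>"));
}
```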
The same key and mode are available from the command line with spider_cli:

```bash
spider --url https://example.com \
  --spider-cloud-key "your-api-key" \
  --spider-cloud-mode smart
```

Pass additional options to the Spider Cloud API for fine-grained control:
```rust
use spider::configuration::{SpiderCloudConfig, SpiderCloudMode};

let mut params = hashbrown::HashMap::new();
params.insert("stealth".into(), serde_json::json!(true));
params.insert("fingerprint".into(), serde_json::json!(true));

let config = SpiderCloudConfig::new("your-api-key")
    .with_mode(SpiderCloudMode::Smart)
    .with_extra_params(params);
```

Get started at spider.cloud — new signups receive free credits to test the full integration.
| Method | Best For |
|---|---|
| Spider Cloud | Production workloads, no setup required |
| spider | Rust applications |
| spider_agent | AI-powered web automation and research |
| spider_cli | Command-line usage |
| spider-nodejs | Node.js projects |
| spider-py | Python projects |
- Examples - Code samples for common use cases
- Benchmarks - Performance comparisons
- Changelog - Version history
See CONTRIBUTING.