Bots, LLMs, and Automated Access
This guide covers best practices for bots, AI agents, LLMs, and other automated tools interacting with Internet Archive services and APIs.
AI Coding Assistants
For AI coding assistants like Claude Code, Cursor, Windsurf, and similar tools, we provide ready-to-use skills and documentation:
internetarchive/internet-archive-skills
This repository contains skills that enable AI assistants to properly interact with archive.org APIs, including uploading, downloading, searching, and metadata operations.
User-Agent Requirements
All automated requests to archive.org must include a descriptive User-Agent header that identifies:
The tool or bot name
Version number
For AI agents: the model being used (e.g., claude-sonnet-4-20250514)
This helps the Internet Archive track usage patterns, troubleshoot issues, and maintain service quality.
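The three parts listed above can be assembled with a small helper. This is a minimal sketch, not part of the internetarchive library: the function name `build_user_agent` and its parameters are hypothetical, and the output format simply mirrors the `MyBot/1.0.0 (claude-sonnet-4-20250514)` pattern used in the examples in this guide.

```python
from typing import Optional


def build_user_agent(tool: str, version: str, model: Optional[str] = None) -> str:
    """Return 'tool/version', followed by '(model)' when a model is given.

    Hypothetical helper for illustration; not an internetarchive API.
    """
    if not tool or not version:
        raise ValueError("tool and version are both required")
    agent = f"{tool}/{version}"
    if model:
        # AI agents append the model identifier in parentheses.
        agent += f" ({model})"
    return agent


print(build_user_agent("MyBot", "1.0.0", "claude-sonnet-4-20250514"))
```

The resulting string can be used as the suffix or header value in any of the methods described in this guide.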
ia CLI (v5.7.2+)
Use the --user-agent-suffix option:
ia --user-agent-suffix "MyBot/1.0.0 (claude-sonnet-4-20250514)" download my-item
Or set it permanently in ~/.config/internetarchive/ia.ini:
[general]
user_agent_suffix = MyBot/1.0.0 (claude-sonnet-4-20250514)
Python Library
Pass the suffix when creating a session:
from internetarchive import get_session

session = get_session(config={
    'general': {'user_agent_suffix': 'MyBot/1.0.0 (claude-sonnet-4-20250514)'}
})

# Use session for all operations
item = session.get_item('example-item')
Or configure it in your ia.ini file as shown above.
Direct HTTP Requests
When making direct API calls with curl, requests, or other HTTP clients:
curl -H "User-Agent: MyBot/1.0.0 (claude-sonnet-4-20250514)" \
    https://archive.org/metadata/example-item
import requests

headers = {
    'User-Agent': 'MyBot/1.0.0 (claude-sonnet-4-20250514)'
}
response = requests.get('https://archive.org/metadata/example-item', headers=headers)
Resulting User-Agent
When using the ia CLI or Python library, your suffix is appended to the default User-Agent:
internetarchive/5.7.2 (Linux x86_64; N; en; ACCESS_KEY) Python/3.11.0 MyBot/1.0.0 (claude-sonnet-4-20250514)
Rate Limiting
Automated tools should respect rate limits:
Add delays between requests for bulk operations
Honor 429 Too Many Requests responses and Retry-After headers
Use --checksum flags to avoid re-downloading/re-uploading unchanged files
Consider using GNU Parallel with -j to limit concurrent requests
# Limit to 4 concurrent downloads with 1 second delay
cat items.txt | parallel -j4 --delay 1 'ia download {}'
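For tools that make direct HTTP requests, honoring Retry-After means turning the header value into a wait time. The HTTP specification allows that value to be either a number of seconds or an HTTP date, so a client needs to handle both. The sketch below uses only the Python standard library; the function name `retry_after_seconds` and the 1-second fallback are assumptions made for illustration, not documented archive.org behavior.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str],
                        now: Optional[datetime] = None,
                        default: float = 1.0) -> float:
    """Convert a Retry-After header value into seconds to wait.

    Hypothetical helper; `default` is an assumed fallback used when the
    header is missing or cannot be parsed.
    """
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        # Form 1: delay in whole seconds, e.g. "120".
        return float(value)
    try:
        # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means no further waiting is needed.
    return max(0.0, (when - now).total_seconds())
```

A caller would typically sleep for `retry_after_seconds(response.headers.get("Retry-After"))` after receiving a 429 response, then retry the request.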
Best Practices
Identify your bot clearly - Use descriptive User-Agent strings
Authenticate when possible - Configure ia with your credentials
Cache responses - Avoid repeated requests for the same data
Use bulk endpoints - Prefer batch operations over many individual requests
Test in test_collection - Validate uploads before committing to permanent collections
Handle errors gracefully - Implement retries with exponential backoff
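As an illustration of the last point, retries with exponential backoff can be sketched as follows. This is one possible approach under stated assumptions: the names `backoff_delay` and `call_with_retries`, the 1-second base, the 60-second cap, and the use of random jitter are all illustrative choices, not values recommended by the Internet Archive.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def call_with_retries(func: Callable[[], T],
                      max_attempts: int = 5,
                      base: float = 1.0,
                      cap: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep,
                      jitter: bool = True) -> T:
    """Call `func`, retrying on failure with exponentially growing delays.

    Hypothetical helper. It catches every Exception for brevity; real code
    should retry only on errors known to be transient.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            delay = backoff_delay(attempt, base, cap)
            if jitter:
                # Randomize the wait so many clients do not retry in lockstep.
                delay = random.uniform(0, delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the waiting behavior can be replaced in tests; in normal use the default `time.sleep` applies.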