Scraping quickstart
[Graphic: sample request parameters (target, domain, url, headless, query, parse, num_pages, device_type, locale) and output fields (price_upper: float; manufacturer, color, seller_id, customer_reviews, brand: string)]
1. About Smartproxy
5. Overview and integrations
• Pricing
• Authentication methods
• API playground
• Requests usage
6. Resources
• GitHub
• Postman collections
7. Conclusion
1. About Smartproxy
Smartproxy’s data collection infrastructure helps you effortlessly extract
web data from even the most challenging targets. Our products come with
award-winning 24/7 support, an intuitive self-service dashboard, and flexible
pricing plans.
2. Introduction to Scraping APIs
Our Scraping APIs are designed to simplify real-time data collection at
scale. They lift the burden of managing proxies, running headless browsers,
and overcoming bot detection systems. With a single API call, you can get
structured data from the biggest search engines, social media platforms, and
eCommerce stores, or raw HTML from any website anywhere in the world.
These APIs are highly scalable and charge only for successful requests,
making your expenses predictable. If needed, you can even integrate them in
place of a proxy server with very few adjustments.
Proxies vs Scraping APIs
Smartproxy’s product family consists of proxies and web scraping APIs. The
table below compares their main features:
[Table: main features of proxy servers and Scraping APIs compared: automated website unblocking, CAPTCHA solving, JavaScript rendering, and data parsing]
Parameters
Web Scraping API accepts the following parameters. Most of them are
optional. The only required parameter is url.
target | string | Should always be set to universal for Web Scraping API.
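As a rough illustration of the rule above, here is a minimal Python sketch that assembles a Web Scraping API request body. It assumes the real-time method's JSON payload uses the parameter names from this table; the helper `build_universal_payload` is hypothetical and not part of any Smartproxy SDK.

```python
import json


def build_universal_payload(url: str, **optional) -> dict:
    """Build a hypothetical Web Scraping API payload.

    `url` is the only required parameter and `target` is always "universal".
    Extra keyword arguments are passed through as optional parameters, but
    they cannot override `target` or `url`.
    """
    if not url:
        raise ValueError("url is required for Web Scraping API")
    payload = dict(optional)
    payload.update(target="universal", url=url)
    return payload


if __name__ == "__main__":
    body = build_universal_payload("https://example.com")
    print(json.dumps(body))  # {"target": "universal", "url": "https://example.com"}
```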
{
  "results": [
    {
      "content": "<html> page content here</html>",
      "status_code": integer,
      "url": "string",
      "task_id": "string",
      "created_at": "string",
      "updated_at": "string"
    }
  ]
}
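A short sketch of reading the response shape above. The sample values are made up, and `extract_html` is a hypothetical helper; it relies only on the `results`, `url`, `status_code`, and `content` keys shown in the output example.

```python
import json

# Made-up sample shaped like the output example above.
RAW = (
    '{"results": [{"content": "<html>page content here</html>", "status_code": 200, '
    '"url": "https://example.com", "task_id": "1", "created_at": "", "updated_at": ""}]}'
)


def extract_html(response: dict) -> list:
    """Return (url, status_code, html) for each entry in `results`."""
    return [
        (result.get("url"), result.get("status_code"), result.get("content"))
        for result in response.get("results", [])
    ]


if __name__ == "__main__":
    for url, status, html in extract_html(json.loads(RAW)):
        print(url, status, html[:40])
```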
Integration methods
All our APIs support two integration methods: real-time and proxy-like. Both
return data over an open connection, meaning that you send a request and
wait for the response.
• Real-time
This is the main integration method. It lets you send POST requests to
the API endpoint with parameters in a JSON payload. This way, you can
specify data sources (such as Google Search) instead of providing the
full URL.
• Proxy-like
This method lets you integrate the APIs as a proxy server. It’s useful when
your infrastructure is based on the proxy format, or you’re transitioning
from proxies. The method requires passing a full URL with parameters in
the request headers.
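The two methods can be sketched with Python's standard `urllib.request` module. Several details are assumptions rather than facts from this guide: the endpoint URL, Basic authentication with your API username and password, and the proxy address format, which you should copy from your dashboard. The helper names are hypothetical, and the real-time request is only built here, not sent.

```python
import base64
import json
import urllib.request

# ASSUMPTION: real-time endpoint; confirm it in your dashboard or API docs.
REALTIME_ENDPOINT = "https://scraper-api.smartproxy.com/v2/scrape"


def basic_auth_header(username: str, password: str) -> str:
    """Encode API credentials for HTTP Basic authentication (assumed scheme)."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"


def build_realtime_request(payload: dict, username: str, password: str) -> urllib.request.Request:
    """Real-time method: POST the parameters to the endpoint as a JSON payload."""
    return urllib.request.Request(
        REALTIME_ENDPOINT,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(username, password),
        },
        method="POST",
    )


def build_proxy_like_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Proxy-like method: send an ordinary request for the full target URL
    through the API as if it were a proxy server.

    `proxy_url` (e.g. "http://USER:PASS@HOST:PORT") is a placeholder; copy the
    real value from your dashboard. Extra scraping parameters would travel in
    request headers, whose names this sketch does not cover.
    """
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    req = build_realtime_request({"target": "google_search", "query": "adidas"}, "USER", "PASS")
    print(req.get_method(), req.full_url)
    # Sending waits on the open connection until the API replies:
    # with urllib.request.urlopen(req, timeout=180) as resp:
    #     print(json.loads(resp.read()))
```

Both methods block while the connection is open, so pick a timeout that suits your slowest targets.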
The two tables below show the possible response codes you may encounter
while using the APIs.
HTTP status codes:
Response | Description
204 - No content | Job not completed yet. Wait a few seconds before trying again.
403 - Forbidden | Your account does not have access to this resource. Make sure your Google target is supported by us.
404 - Not found | Your target was not found. Re-check your targeted URL.

Parse status codes:
Response | Description
12000 - Success | Server has replied and given the requested response.
12004 - Response not full | Some fields were not parsed and are missing.
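A sketch of how a client might act on the tables above: retry while the API returns 204 and surface the other codes. `fetch` is a hypothetical callable that performs one request and returns the HTTP status and decoded body; treating every non-200 status other than 204 as an error is our assumption.

```python
import time
from typing import Callable, Tuple

# Descriptions taken from the response code tables above.
HTTP_CODES = {
    204: "Job not completed yet. Wait a few seconds before trying again.",
    403: "Your account does not have access to this resource.",
    404: "Your target was not found. Re-check your targeted URL.",
}
PARSE_CODES = {
    12000: "Success",
    12004: "Response not full: some fields were not parsed and are missing.",
}


def parse_status_note(code: int) -> str:
    """Describe a parse status code such as 12000 or 12004."""
    return PARSE_CODES.get(code, "Unknown parse status")


def fetch_with_retry(
    fetch: Callable[[], Tuple[int, dict]],
    attempts: int = 5,
    wait_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `fetch` until it returns something other than 204.

    `fetch` is a hypothetical callable returning (http_status, body).
    Non-200 statuses other than 204 raise RuntimeError (an assumption).
    """
    for attempt in range(attempts):
        status, body = fetch()
        if status == 204:
            if attempt < attempts - 1:
                sleep(wait_seconds)  # job still running: wait, then retry
            continue
        if status != 200:
            raise RuntimeError(f"{status}: {HTTP_CODES.get(status, 'unexpected status')}")
        return body
    raise TimeoutError(f"job not completed after {attempts} attempts")
```

Injecting `sleep` keeps the retry loop testable without real waiting.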
SERP Scraping API
[Diagram: a request with target google_search and query adidas is sent to the API via the real-time or proxy-like method, which returns JSON: {"results": [...]}]
Main features
• Localized results with country, state, city, and zip code targeting
• Parsing capabilities for various Google data types like search results, ads,
and Shopping
Main targets
• Google Ads*
• Google Hotels
• Google Images
• Google Suggest
• Google Trends
* parsable
Parameters
SERP Scraping API accepts the following parameters. Most of them are
optional. The only required parameters are target plus either url (if you’re
entering a link directly) or query.
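A small validation sketch for the rule above, assuming the payload is a flat JSON object. `build_serp_payload` is hypothetical; the `google_search` target and `adidas` query come from this section's example.

```python
from typing import Optional


def build_serp_payload(
    target: str,
    url: Optional[str] = None,
    query: Optional[str] = None,
    **optional,
) -> dict:
    """Hypothetical helper enforcing the rule above: target plus url or query."""
    if not target:
        raise ValueError("target is required")
    if not (url or query):
        raise ValueError("provide url (for a direct link) or query")
    payload = dict(optional)  # any other optional parameters, passed through
    payload["target"] = target
    if url:
        payload["url"] = url
    if query:
        payload["query"] = query
    return payload


if __name__ == "__main__":
    print(build_serp_payload("google_search", query="adidas"))
```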
{
  "results": [
    {
      "content": "<html> page content here</html>",
      "status_code": 200,
      "url": "string",
      "task_id": "string",
      "created_at": "string",
      "updated_at": "string"
    }
  ]
}
eCommerce Scraping API
eCommerce Scraping API lets you scrape Amazon
and Wayfair by entering a URL or sending the
query as a parameter. It returns data in HTML or, in
the case of Amazon, parsed JSON.
[Diagram: a request with target amazon and query iPhone 13 is sent to the API via the real-time or proxy-like method, which returns JSON: {"results": [...]}]
Main features
• Option to enter a search query or item code as a parameter for easier use
• Parsing capabilities for various Amazon data types like search results,
product pages, and reviews
Main targets
• Amazon sellers*
* parsable
Parameters
parse | boolean | ‘True’ returns parsed output in JSON format. Leave blank for HTML. Not all data sources can be parsed.
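Following the `parse` description, a hypothetical helper can add the flag only when JSON is wanted and leave it out otherwise. The `amazon` target and `iPhone 13` query come from this section's example; the helper names are illustrative.

```python
def build_amazon_payload(query: str, parse: bool = False) -> dict:
    """Hypothetical helper: request parsed JSON only when `parse` is True.

    Leaving `parse` out of the payload corresponds to "leave blank for HTML".
    """
    payload = {"target": "amazon", "query": query}
    if parse:
        payload["parse"] = True
    return payload


def is_parsed(result: dict) -> bool:
    """Parsed results hold a JSON object in `content`; HTML results hold a string."""
    return isinstance(result.get("content"), dict)


if __name__ == "__main__":
    print(build_amazon_payload("iPhone 13", parse=True))
```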
Amazon Pricing
{
  "results": [
    {
      "content": {
        "url": "string",
        "asin": "string",
        "page": integer,
        "title": "string",
        "pricing": [
          {
            "price": float,
            "seller": "string",
            "currency": "string",
            "delivery": "string",
            "condition": "string",
            "seller_id": "string",
            "seller_link": "string",
            "rating_count": integer,
            "price_shipping": float,
            "delivery_options": []
          }
        ],
        "asin_in_url": "string",
        "review_count": integer,
        "parse_status_code": 12000
      }
    }
  ]
}
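To show how the parsed Amazon Pricing output might be consumed, here is a sketch that picks the cheapest offer (price plus shipping) and checks that every result reports the 12000 parse status. The function names are hypothetical, and counting shipping toward the total is our own choice.

```python
from typing import Optional

PARSE_SUCCESS = 12000  # "Success" in the parse status table


def cheapest_offer(response: dict) -> Optional[dict]:
    """Return the offer with the lowest price + shipping across all results.

    Offers without a numeric price (possible when parsing was partial,
    code 12004) are skipped. Returns None if no offer qualifies.
    """
    offers = []
    for result in response.get("results", []):
        content = result.get("content")
        if not isinstance(content, dict):  # unparsed HTML comes back as a string
            continue
        for offer in content.get("pricing", []):
            if isinstance(offer.get("price"), (int, float)):
                offers.append(offer)
    return min(offers, key=lambda o: o["price"] + (o.get("price_shipping") or 0), default=None)


def fully_parsed(response: dict) -> bool:
    """True if every result is parsed JSON with parse_status_code 12000."""
    return all(
        isinstance(r.get("content"), dict)
        and r["content"].get("parse_status_code") == PARSE_SUCCESS
        for r in response.get("results", [])
    )
```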
Social Media Scraping API
Social Media Scraping API lets you scrape
Instagram and TikTok by entering a URL or sending
the query as a parameter. It returns data in HTML
or parsed JSON.
[Diagram: a request with an Instagram target and search URL is sent to the API via the real-time or proxy-like method, which returns JSON: {"results": [...]}]
Main features
Output example for TikTok:
{
  "data": {
    "content": {
      "nickname": "string",
      "verified": boolean,
      "avatarThumb": "string",
      "openFavorite": boolean,
      "ttSeller": boolean,
      "postInfo": {
        "id": "string",
        "description": "string",
        "postedAtTimestamp": integer,
        "postedAt": "string",
        "author": "string",
        "music": {
          "id": "string",
          "title": "string",
          "playUrl": "string",
          "coverLarge": "string",
          "coverMedium": "string",
          "coverThumb": "string",
          "authorName": "string",
          "original": boolean,
          "duration": integer,
          "scheduleSearchTime": integer
        },
        "shareCount": integer,
        "commentCount": integer,
        "playCount": integer,
        "accountLikes": integer
      }
    },
    "errors": [],
    "status_code": integer
  },
  "task_id": "string",
  "url": "string"
}
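A sketch that reads the engagement counters from the TikTok output above. `tiktok_engagement` is hypothetical, and the interaction rate it computes, (shares + comments) / plays, is an illustrative metric of our own, not a field returned by the API.

```python
def tiktok_engagement(response: dict) -> dict:
    """Summarize the counters under data.content.postInfo in the TikTok output."""
    post = response.get("data", {}).get("content", {}).get("postInfo", {})
    plays = post.get("playCount") or 0
    shares = post.get("shareCount") or 0
    comments = post.get("commentCount") or 0
    return {
        "id": post.get("id"),
        "plays": plays,
        "shares": shares,
        "comments": comments,
        # Illustrative metric of our own; not a field returned by the API.
        "interaction_rate": (shares + comments) / plays if plays else 0.0,
    }


if __name__ == "__main__":
    sample = {"data": {"content": {"postInfo": {"id": "1", "playCount": 200,
                                                 "shareCount": 10, "commentCount": 30}}}}
    print(tiktok_engagement(sample))
```

Missing keys fall back to zero, so a partially parsed response still yields a summary instead of raising.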
Output example for Instagram:
{
  "data": {
    "content": {
      "user": {
        "biography": "string",
        "bio_links": [
          {
            "title": "string",
            "lynx_url": "string",
            "url": "string",
            "link_type": "string"
          }
        ],
        "biography_with_entities": {
          "raw_text": "string",
          "entities": []
        },
        "blocked_by_viewer": boolean,
        "restricted_by_viewer": boolean,
        "country_block": boolean,
        "external_url": "string",
        "external_url_linkshimmed": "string",
        "edge_followed_by": {
          "count": integer
        },
        "fbid": "string",
        "followed_by_viewer": boolean,
        "edge_follow": {
          "count": 1111
        },
        "follows_viewer": boolean,
        "full_name": "string",
        "group_metadata": "string",
        "has_ar_effects": boolean,
        "has_clips": boolean,
        "has_guides": boolean,
        "has_channel": boolean,
        "has_blocked_viewer": boolean,
        "highlight_reel_count": integer,
        "has_requested_viewer": boolean,
        "hide_like_and_view_counts": boolean,
        "id": "string",
        "is_business_account": boolean,
        "is_professional_account": boolean,
5. Overview and integrations
Each scraping API has its own section in the dashboard. There, you can
manage your subscription, set up the API, and track usage statistics.
Pricing
Authentication methods
6. Resources
GitHub
Postman collections
7. Conclusion
We hope that you’ve found this guide helpful. We’d love to talk to you about
how Smartproxy’s web scraping APIs can support your organization. You can
book a call with us.