
12.Using Proxies With Python Selenium _ ScrapeOps

This document provides a guide on how to integrate proxies into Python Selenium for web scraping. It covers the integration of simple HTTP proxies, authenticated proxies using Selenium Wire, and the use of proxy APIs, detailing the necessary code snippets for both Chrome and Firefox browsers. Additionally, it emphasizes the importance of using proxy port integration over API endpoints for better functionality with headless browsers.

Uploaded by

Khánh Cao Minh
Copyright
© © All Rights Reserved

3/31/24, 9:33 PM Using Proxies With Python Selenium | ScrapeOps

Using Proxies With Python Selenium


Selenium is a powerful browser automation library that allows you to build bots and scrapers that can
load and interact with web pages in the browser. As a result, Selenium is very popular amongst the
Python web scraping community.

In this guide for The Python Selenium Web Scraping Playbook, we will look at how to integrate proxies
into our Python Selenium based web scraper.

There are a number of different types of proxies, each of which needs to be integrated differently with
Selenium, so we will walk through how to integrate each type:

Using Proxies With Selenium
Using Authenticated Proxies With Selenium
Integrating Proxy APIs

Using Proxies With Selenium


The first and simplest type of proxy to integrate with Python Selenium is a simple HTTP proxy (in the
form of an IP address and port) that doesn't require authentication. For example:


"11.456.448.110:8080"

Depending on which browser you are using, the integration method is slightly different.

Integrating Proxy With Selenium Chrome Browser


To integrate this proxy IP into a Selenium scraper that uses a Chrome browser, we just need to set the
--proxy-server argument in our WebDriver options:

from selenium import webdriver

## Example Proxy
PROXY = "11.456.448.110:8080"

## Create WebDriver Options to Add Proxy
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument(f'--proxy-server={PROXY}')

## Selenium 4 takes the options object via the `options` keyword
chrome = webdriver.Chrome(options=chrome_options)

## Make Request Using Proxy
chrome.get("http://httpbin.org/ip")

Now when we run the script we can see that Selenium is using the defined proxy IP:

{
"origin": "11.456.448.110:8080"
}
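
As a minimal sketch of the step above, the helper below assembles the --proxy-server value from a "host:port" string and rejects obviously malformed input before a browser is launched. The function name chrome_proxy_argument and its scheme parameter are illustrative assumptions, not part of Selenium; only the --proxy-server flag itself comes from the guide.

```python
def chrome_proxy_argument(proxy: str, scheme: str = "http") -> str:
    """Build the --proxy-server argument from a 'host:port' string.

    Hypothetical helper for illustration; it only checks the shape of the
    input and does not test whether the proxy is reachable.
    """
    host, sep, port = proxy.rpartition(":")
    if not sep or not host or not port.isdecimal():
        raise ValueError(f"expected 'host:port', got {proxy!r}")
    if not 0 < int(port) < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"--proxy-server={scheme}://{host}:{port}"


# Same placeholder proxy as in the guide
print(chrome_proxy_argument("11.456.448.110:8080"))

# Usage with Selenium (not run here):
#   chrome_options = webdriver.ChromeOptions()
#   chrome_options.add_argument(chrome_proxy_argument(PROXY))
```

Failing early on a malformed proxy string tends to be easier to debug than a browser that silently cannot load any page.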

Integrating Proxy With Selenium Firefox Browser


To integrate this proxy IP into a Selenium scraper that uses a Firefox browser, we need to use the Proxy
and ProxyType classes from the Selenium WebDriver library:

from selenium import webdriver
from selenium.webdriver.common.proxy import Proxy, ProxyType
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service

## Define Proxy
proxy = Proxy({
    'proxyType': ProxyType.MANUAL,
    'httpProxy': "11.456.448.110:8080",
    'noProxy': ''
})

## Create Driver (Selenium 4: the proxy is attached to the options object
## and the geckodriver path is passed through a Service)
firefox_options = Options()
firefox_options.proxy = proxy
firefox_driver = webdriver.Firefox(options=firefox_options,
                                   service=Service(r"/root/geckodriver"))

## Make Request Using Proxy
firefox_driver.get("http://httpbin.org/ip")

Now when we run the script we can see that Selenium is using the defined proxy IP:

{
"origin": "11.456.448.110:8080"
}
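
The Firefox example sets only httpProxy, which covers plain HTTP requests. HTTPS traffic is configured through a separate sslProxy key in Selenium's proxy settings. The sketch below builds a settings dictionary that points both keys at the same proxy; the helper name manual_proxy_settings is an assumption made for illustration.

```python
def manual_proxy_settings(proxy: str, bypass: str = "") -> dict:
    """Return manual proxy settings covering both HTTP and HTTPS traffic.

    Hypothetical helper: the keys match the ones Selenium's Proxy class
    reads from a raw dictionary, the function itself is illustrative.
    """
    return {
        "httpProxy": proxy,  # used for http:// requests
        "sslProxy": proxy,   # used for https:// requests
        "noProxy": bypass,   # hosts that should bypass the proxy
    }


settings = manual_proxy_settings("11.456.448.110:8080")
print(settings)

# Usage with Selenium (not run here):
#   from selenium.webdriver.common.proxy import Proxy, ProxyType
#   proxy = Proxy({"proxyType": ProxyType.MANUAL, **settings})
```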

HTTP PROXY AUTHENTICATION

This method works fine when you don't need to add an authentication username and password to
the proxy. We will look at how to use authenticated proxies in the next section.

Using Authenticated Proxies With Selenium


The above method doesn't work if you need to use proxies that require username and password
authentication.

It is very common for commercial proxy providers to sell access to their proxy pools by giving you a single
proxy endpoint that you send your requests to, authenticating your account with a username and
password:

"http://USERNAME:PASSWORD@proxy-server:8080"
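
Passwords that contain characters such as @ or : can break an endpoint written in this format, so it can help to percent-encode the credentials before assembling the URL. The following is a minimal sketch using only the standard library; the function name authenticated_proxy_url and the sample password are assumptions for illustration.

```python
from urllib.parse import quote, urlsplit


def authenticated_proxy_url(username: str, password: str,
                            host: str, port: int, scheme: str = "http") -> str:
    """Assemble scheme://user:pass@host:port with percent-encoded credentials."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


# Hypothetical password containing reserved characters
url = authenticated_proxy_url("USERNAME", "p@ss:word", "proxy-server", 8080)
print(url)

# The encoded URL still parses into the expected host and port
parts = urlsplit(url)
print(parts.hostname, parts.port)
```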

There are a couple of ways to solve this, but one of the easiest is to use the Selenium Wire extension, which
makes it very easy to use proxies with Selenium.

First, you need to install Selenium Wire using pip:

pip install selenium-wire


Then update your scraper to use the seleniumwire webdriver instead of the default selenium
webdriver:

from selenium.webdriver.chrome.service import Service
from seleniumwire import webdriver
from webdriver_manager.chrome import ChromeDriverManager

## Define Your Proxy Endpoints
proxy_options = {
    'proxy': {
        'http': 'http://USERNAME:PASSWORD@proxy-server:8080',
        'https': 'http://USERNAME:PASSWORD@proxy-server:8080',
        'no_proxy': 'localhost,127.0.0.1'  # comma-separated hosts to bypass
    }
}

## Set Up Selenium Chrome driver (Selenium 4: driver path goes through a Service)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()),
                          seleniumwire_options=proxy_options)

## Send Request Using Proxy
driver.get('http://httpbin.org/ip')

Now when we run the script we can see that Selenium is using a proxy IP:

{
"origin": "201.88.548.330:8080"
}
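
Rather than checking the httpbin output by eye, the response body can be parsed and compared against your own IP address. The sketch below assumes the page text is the JSON shown above; the helper names and the sample addresses (taken from the reserved documentation ranges) are hypothetical.

```python
import json


def extract_origins(body_text: str) -> list:
    """Return the address(es) reported in an httpbin /ip response body."""
    payload = json.loads(body_text)
    # httpbin may report several comma-separated addresses
    return [part.strip() for part in payload["origin"].split(",")]


def is_using_proxy(body_text: str, real_ip: str) -> bool:
    """True when none of the reported addresses is our own IP."""
    return real_ip not in extract_origins(body_text)


# Sample body with a documentation-range address, not a real proxy
sample_body = '{"origin": "203.0.113.7"}'
print(is_using_proxy(sample_body, real_ip="198.51.100.20"))

# Usage with Selenium (not run here):
#   from selenium.webdriver.common.by import By
#   body_text = driver.find_element(By.TAG_NAME, "body").text
```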

Selenium Wire has a lot of other powerful functionality, so if you would like to learn more then check out
our full Selenium Wire guide here.

Integrating Proxy APIs


Over the last few years there has been a huge surge in proxy providers that offer smart proxy solutions
that handle all the proxy rotation, header selection, ban detection and retries on their end. These smart
APIs typically provide their proxy services in an API endpoint format.

However, these proxy API endpoints don't integrate well with headless browsers when the website
uses relative links, because Selenium will attach the relative URL to the proxy API endpoint rather than
to the website's root URL. This results in some pages not loading correctly.
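
The relative-link problem can be reproduced without a browser by resolving a link the way a browser would. In the sketch below, the API endpoint https://proxy-api.example.com/v1/ and its api_key and url parameters are hypothetical stand-ins for a provider's API endpoint format.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical proxy API endpoint; real providers use their own hosts and parameters
API_ENDPOINT = "https://proxy-api.example.com/v1/"
TARGET = "http://quotes.toscrape.com/"

# The page is requested through the API by passing the target as a parameter
api_url = f"{API_ENDPOINT}?{urlencode({'api_key': 'APIKEY', 'url': TARGET})}"

# A relative link found on the page
relative_link = "/page/2/"

# The browser resolves the link against the URL it actually loaded
via_api = urljoin(api_url, relative_link)
direct = urljoin(TARGET, relative_link)

print(via_api)  # resolves against the proxy API host
print(direct)   # resolves against the website itself
```

When the page is loaded through the API endpoint, the link resolves to the proxy provider's host, which is why a proxy port integration avoids the problem: the browser still sees the website's own URL.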


For this reason, when integrating proxies into your Selenium scrapers it is recommended that you use the
provider's proxy port integration rather than the API endpoint integration when one is available (not all
providers offer a proxy port integration).

For example, in the case of the ScrapeOps Proxy Aggregator we offer a proxy port integration for
situations like this.

The proxy port integration is a light front-end for the API. It has all the same functionality and
performance as sending requests to the API endpoint, but allows you to integrate our proxy aggregator
as you would any normal proxy.

The following is an example of how to integrate the ScrapeOps Proxy Aggregator into your Selenium
scraper using the proxy port integration:

from selenium.webdriver.chrome.service import Service
from seleniumwire import webdriver
from webdriver_manager.chrome import ChromeDriverManager

SCRAPEOPS_API_KEY = 'APIKEY'

## Define ScrapeOps Proxy Port Endpoint
proxy_options = {
    'proxy': {
        'http': f'http://scrapeops:{SCRAPEOPS_API_KEY}@proxy.scrapeops.io:5353',
        'https': f'http://scrapeops:{SCRAPEOPS_API_KEY}@proxy.scrapeops.io:5353',
        'no_proxy': 'localhost,127.0.0.1'  # comma-separated hosts to bypass
    }
}

## Set Up Selenium Chrome driver (Selenium 4: driver path goes through a Service)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()),
                          seleniumwire_options=proxy_options)

## Send Request Using ScrapeOps Proxy
driver.get('http://quotes.toscrape.com/')
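
To keep the API key out of the script itself, the proxy options can be built by a small function that reads the key from the environment. This is a sketch under assumptions: the function name scrapeops_proxy_options and the SCRAPEOPS_API_KEY environment variable are illustrative choices, while the endpoint format is the one shown above.

```python
import os


def scrapeops_proxy_options(api_key: str) -> dict:
    """Build Selenium Wire options for the ScrapeOps proxy port endpoint."""
    if not api_key:
        raise ValueError("a ScrapeOps API key is required")
    endpoint = f"http://scrapeops:{api_key}@proxy.scrapeops.io:5353"
    return {
        "proxy": {
            "http": endpoint,
            "https": endpoint,
            # comma-separated list of hosts that bypass the proxy
            "no_proxy": "localhost,127.0.0.1",
        }
    }


# Assumed environment variable name; falls back to the guide's placeholder
api_key = os.environ.get("SCRAPEOPS_API_KEY", "APIKEY")
proxy_options = scrapeops_proxy_options(api_key)
print(sorted(proxy_options["proxy"]))

# Usage with Selenium Wire (not run here):
#   driver = webdriver.Chrome(seleniumwire_options=proxy_options)
```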

Full integration docs for Python Selenium and the ScrapeOps Proxy Aggregator can be found here.

TIP

To use the ScrapeOps Proxy Aggregator, you first need an API key which you can get by signing up
for a free account here which gives you 1,000 free API credits.


More Web Scraping Tutorials


So that's how you can use both authenticated and unauthenticated proxies with Selenium to scrape
websites without getting blocked.

If you would like to learn more about Web Scraping with Selenium, then be sure to check out The
Selenium Web Scraping Playbook.

Or check out one of our more in-depth guides:

Selenium Undetected Chromedriver Guide: Bypass Anti-Bots With Ease
How to Scrape The Web Without Getting Blocked Guide
The Ethics of Web Scraping

https://scrapeops.io/selenium-web-scraping-playbook/python-selenium-proxy/
