
Student ID: 23kb1a3252

EXPERIMENT - X

Demonstrate Web Scraping Using Python

Description:

Web scraping is the process of extracting data from websites. In Python, web scraping can be performed
using libraries such as BeautifulSoup, requests, and Selenium. Here's a step-by-step guide demonstrating
how to perform web scraping using the BeautifulSoup and requests libraries.

In today’s digital world, data is the key to unlocking valuable insights, and much of this data is available
on the web. But how do you gather large amounts of data from websites efficiently? That’s where Python
web scraping comes in. Web scraping, the process of extracting data from websites, has emerged as a
powerful technique for gathering information from the vast expanse of the internet.

In this tutorial, we’ll explore various Python libraries and modules commonly used for web scraping and
delve into why Python 3 is the preferred choice for this task. Along the way, you will also see how to use
powerful tools like BeautifulSoup, Scrapy, and Selenium to scrape websites.

Essential Packages and Tools for Python Web Scraping


The latest version of Python offers a rich set of tools and libraries specifically designed for web
scraping, making it easier than ever to retrieve data from the web efficiently and effectively.

Table of Contents

• Requests Module
• BeautifulSoup Library
• Lxml

Requests Module
The requests library is used for making HTTP requests to a specific URL and returning the response.
Python requests provides built-in functionality for managing both the request and the response.

pip install requests

Example: Making a Request

The Python requests module has several built-in methods to make HTTP requests to a specified URI using
GET, POST, PUT, PATCH, or HEAD requests. An HTTP request is meant either to retrieve data from a
specified URI or to push data to a server. It works as a request-response protocol between a client and a
server. Here we will be using the GET request. The GET method is used to retrieve information from the
given server using a given URI, and it sends the encoded user information appended to the page request
as a query string.

EX-CODE:

import requests

# Making a GET request
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')

# Check the status code of the response (200 indicates success)
print(r)

# Print the content of the response
print(r.content)
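
As noted above, the GET method sends encoded user information appended to the page request. A minimal
sketch of this, assuming the echo service httpbin.org (an illustrative URL, not part of the original
experiment), shows how requests encodes a params dictionary into the query string:

import requests

# Hypothetical example: httpbin.org simply echoes the request it receives
params = {'q': 'web scraping', 'page': 1}
r = requests.get('https://httpbin.org/get', params=params)

# requests URL-encodes the parameters and appends them to the request URL
print(r.url)          # https://httpbin.org/get?q=web+scraping&page=1
print(r.status_code)  # 200 on success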

BeautifulSoup Library
Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying
a parse tree: a toolkit for dissecting a document and extracting what you need. It doesn’t take much code
to write an application.

Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You
don’t have to think about encodings unless the document doesn’t specify one and Beautiful Soup can’t
detect it; then you just have to specify the original encoding. Beautiful Soup sits on top of popular
Python parsers like lxml and html5lib, allowing you to try different parsing strategies or trade speed for
flexibility.

pip install beautifulsoup4
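
Before the full example, here is a minimal sketch of the parser choice mentioned above. The HTML snippet
is made up for illustration; 'html.parser' ships with Python, while the 'lxml' parser requires
pip install lxml:

from bs4 import BeautifulSoup

# Hypothetical HTML snippet for illustration
html_doc = '<html><body><p>Hello, <b>world</b>!</p></body></html>'

# Built-in parser: no extra dependency
soup_builtin = BeautifulSoup(html_doc, 'html.parser')

# lxml parser: generally faster, needs lxml installed
soup_lxml = BeautifulSoup(html_doc, 'lxml')

# Both expose the same Beautiful Soup API
print(soup_builtin.b.text)  # world
print(soup_lxml.b.text)     # world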

Example
• Importing Libraries: The code imports the requests library for making HTTP requests and the
BeautifulSoup class from the bs4 library for parsing HTML.
• Making a GET Request: It sends a GET request to 'https://www.geeksforgeeks.org/python-programming-language/'
and stores the response in the variable r.
• Checking the Status Code: It prints the status code of the response, typically 200 for success.
• Parsing the HTML: The HTML content of the response is parsed using BeautifulSoup and stored in the
variable soup.
• Printing the Prettified HTML: It prints the prettified version of the parsed HTML content for
readability and analysis.

EX-CODE:
import requests
from bs4 import BeautifulSoup

# Making a GET request
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')

# Check the status code of the response (200 indicates success)
print(r)

# Parsing the HTML
soup = BeautifulSoup(r.content, 'html.parser')
print(soup.prettify())
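
Printing the prettified document is usually just the first step; most scrapers then pull out specific
elements. The following sketch extends the example above using find_all; targeting the anchor tags is an
assumption for illustration, not a verified description of the page:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')
soup = BeautifulSoup(r.content, 'html.parser')

# find_all returns a list of matching Tag objects
for link in soup.find_all('a'):
    href = link.get('href')  # attribute lookup; None if absent
    if href:
        print(link.get_text(strip=True), '->', href)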

Lxml
The lxml module in Python is a powerful library for processing XML and HTML documents. It provides
high-performance XML and HTML parsing capabilities along with a simple and Pythonic API. lxml is
widely used in Python web scraping due to its speed, flexibility, and ease of use.

pip install lxml


Example
Here’s a simple example demonstrating how to use the lxml module for Python web scraping:

• We import the html module from lxml along with the requests module for sending HTTP requests.
• We define the URL of the website we want to scrape.
• We send an HTTP GET request to the website using the requests.get() function and retrieve the HTML
content of the page.
• We parse the HTML content using the html.fromstring() function from lxml, which returns an HTML
element tree.
• We use XPath expressions to extract specific elements from the HTML tree. In this case, we’re extracting
the text content of all the <a> (anchor) elements on the page.
• We iterate over the extracted link titles and print them out.

EX-CODE:

from lxml import html
import requests

# Define the URL of the website to scrape
url = 'https://example.com'

# Send an HTTP request to the website and retrieve the HTML content
response = requests.get(url)

# Parse the HTML content using lxml
tree = html.fromstring(response.content)

# Extract specific elements from the HTML tree using XPath
# For example, let's extract the titles of all the links on the page
link_titles = tree.xpath('//a/text()')

# Print the extracted link titles
for title in link_titles:
    print(title)
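
XPath can select attributes as well as text nodes. As a brief follow-up sketch against the same
placeholder URL (still an assumption, not a real scraping target), '//a/@href' pulls the link targets
instead of the link text:

from lxml import html
import requests

response = requests.get('https://example.com')
tree = html.fromstring(response.content)

# '//a/@href' selects the href attribute of every anchor element
for href in tree.xpath('//a/@href'):
    print(href)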


Result:
Verified Web Scraping using Python.

Signature of Faculty: Grade:
