EXPERIMENT - X
Description:
Web scraping is the process of extracting data from websites. In Python, web scraping can be performed
using libraries such as BeautifulSoup, requests, and Selenium. Here's a step-by-step guide demonstrating
how to perform web scraping using the requests and BeautifulSoup libraries.
In today’s digital world, data is the key to unlocking valuable insights, and much of this data is available
on the web. But how do you gather large amounts of data from websites efficiently? That’s where Python
web scraping comes in. Web scraping has emerged as a powerful technique for gathering information
from the vast expanse of the internet.
In this tutorial, we’ll explore various Python libraries and modules commonly used for web scraping and
look at why Python 3 is the preferred choice for this task. Along the way, you will also see how to use
powerful tools like BeautifulSoup, Scrapy, and Selenium to scrape a website.
Table of Contents
• Requests Module
• BeautifulSoup Library
• Selenium
• Lxml
• Urllib Module
• PyAutoGUI
Requests Module
The requests library is used to make HTTP requests to a specific URL and returns the response.
Python requests provides built-in functionality for managing both the request and the response.
The requests module has several built-in methods for making HTTP requests to a specified URI using
GET, POST, PUT, PATCH, or HEAD requests. An HTTP request is meant either to retrieve data from a
specified URI or to push data to a server. It works as a request-response protocol between a client and a
server. Here we will be using the GET request. The GET method is used to retrieve information from the
given server using a given URI, sending the encoded user information appended to the page request.
EX-CODE:
import requests
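A minimal sketch of such a GET request, using the GeeksforGeeks page targeted later in this experiment (the printed slice of HTML is purely for illustration):

import requests

# Send a GET request to the target page and store the response
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')

# A status code of 200 indicates the request succeeded
print(r.status_code)

# Show the first few hundred characters of the returned HTML
print(r.text[:300])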
BeautifulSoup Library
Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and
modifying a parse tree: a toolkit for dissecting a document and extracting what you need. It doesn’t take
much code to write an application.
Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to
UTF-8. You don’t have to think about encodings unless the document doesn’t specify an encoding and
Beautiful Soup can’t detect one; then you just have to specify the original encoding. Beautiful Soup sits
on top of popular Python parsers like lxml and html5lib, allowing you to try different parsing strategies
or trade speed for flexibility.
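To make those navigation and search methods concrete, here is a small self-contained sketch; the HTML string is made up purely for illustration:

from bs4 import BeautifulSoup

# A tiny, made-up HTML document to demonstrate parse-tree navigation
html_doc = '<html><body><h1>Demo</h1><a href="/one">First</a><a href="/two">Second</a></body></html>'

# Parse with Python's built-in parser; 'lxml' can be swapped in for speed
soup = BeautifulSoup(html_doc, 'html.parser')

print(soup.h1.text)              # navigating: the <h1> element's text
print(soup.find('a')['href'])    # searching: the first <a> tag's href attribute
for link in soup.find_all('a'):  # searching: every <a> tag in the tree
    print(link.text)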
Example
Importing Libraries: The code imports the requests library for making HTTP requests and the
BeautifulSoup class from the bs4 library for parsing HTML.
Making a GET Request: It sends a GET request to ‘https://www.geeksforgeeks.org/python-programming-
language/’ and stores the response in the variable r.
Checking Status Code: It prints the status code of the response, typically 200 for success.
Parsing the HTML: The HTML content of the response is parsed using BeautifulSoup and stored in the
variable soup.
Printing the Prettified HTML: It prints the prettified version of the parsed HTML content for readability
and analysis.
EX-CODE:
import requests
from bs4 import BeautifulSoup
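Putting the steps above together, a minimal sketch of the full example (the output is the prettified page HTML):

import requests
from bs4 import BeautifulSoup

# Make a GET request to the target page and store the response in r
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')

# Check the status code of the response (typically 200 for success)
print(r.status_code)

# Parse the HTML content of the response and store it in soup
soup = BeautifulSoup(r.content, 'html.parser')

# Print the prettified version of the parsed HTML for readability
print(soup.prettify())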
Lxml
The lxml module in Python is a powerful library for processing XML and HTML documents. It provides
high-performance XML and HTML parsing capabilities along with a simple, Pythonic API. lxml is
widely used in Python web scraping due to its speed, flexibility, and ease of use.
• We import the html module from lxml along with the requests module for sending HTTP requests.
• We define the URL of the website we want to scrape.
• We send an HTTP GET request to the website using the requests.get() function and retrieve the HTML
content of the page.
• We parse the HTML content using the html.fromstring() function from lxml, which returns an HTML
element tree.
• We use XPath expressions to extract specific elements from the HTML tree. In this case, we’re
extracting the text content of all the <a> (anchor) elements on the page.
• We iterate over the extracted link titles and print them out.
EX-CODE:
import requests
from lxml import html

# Define the URL of the website we want to scrape (reusing the page from the previous example)
url = 'https://www.geeksforgeeks.org/python-programming-language/'

# Send an HTTP request to the website and retrieve the HTML content
response = requests.get(url)
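Completing the steps described above, a self-contained sketch (the request lines are repeated so it runs on its own; the XPath //a/text() matches the anchor-text extraction described in the step list):

import requests
from lxml import html

# Retrieve the page (same request as above, repeated for self-containment)
url = 'https://www.geeksforgeeks.org/python-programming-language/'
response = requests.get(url)

# Parse the HTML content into an lxml element tree
tree = html.fromstring(response.content)

# XPath expression extracting the text content of all <a> (anchor) elements
link_titles = tree.xpath('//a/text()')

# Iterate over the extracted link titles and print them out
for title in link_titles:
    print(title)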
Result:
Verified Web Scraping using Python.