Web Scraping Using Python - Notes

This document provides an introduction to web scraping using Python. It discusses how web scraping works by sending requests to servers and extracting specific data elements from pages. The steps involved in web scraping include sending HTTP requests, parsing HTML responses, and traversing the parse trees. It also covers installing and importing the BeautifulSoup and Requests libraries for scraping and making requests. As an example, it describes scraping and analyzing COVID-19 case data from the Worldometer website.

Uploaded by

Anand Sharma


Web Scraping using Python

Topics Covered:
● Introduction to Web Scraping
● How Does Web Scraping Work?
● Steps involved in web scraping
● Installing BeautifulSoup
● Installing Requests
● Scraping and Analyzing data from Worldometer website

Introduction to Web Scraping

What is Web Scraping? Why do we use Web Scraping?

Web scraping, also called web harvesting or web data extraction, is an
automated process of collecting large amounts of (mostly unstructured)
data from websites. It is the process of gathering information from the
Internet. Even copying and pasting the lyrics of your favorite song is a
form of web scraping! However, the words “web scraping” usually refer to
a process that involves automation. Some websites don’t like it when
automatic scrapers gather their data, while others don’t mind. The
data collected can be stored in a structured format for further
analysis.

If you’re scraping a page respectfully for educational purposes, then
you’re unlikely to have any problems. Still, it’s a good idea to do some
research on your own and make sure that you’re not violating any
Terms of Service before you start a large-scale project.

How Does Web Scraping Work?

When we scrape the web, we write code that sends a request to the
server that’s hosting the page we specified. The server will return the
source code, mostly HTML, for the page (or pages) we requested.
So far, we’re essentially doing the same thing a web browser does:
sending a server request with a specific URL and asking the server to
return the code for that page.

But unlike a web browser, our web scraping code won’t interpret the
page’s source code and display the page visually. Instead, we’ll write
some custom code that filters through the page’s source code looking
for specific elements we’ve specified, and extracting whatever content
we’ve instructed it to extract.

For example, if we wanted to get all of the data from inside a table that
was displayed on a web page, our code would be written to go
through these steps in sequence:

● Request the content (source code) of a specific URL from the
server
● Download the content that is returned
● Identify the elements of the page that are part of the table we
want
● Extract and (if necessary) reformat those elements into a dataset
we can analyze or use in whatever way we require.
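
The four steps above can be sketched in Python roughly as follows. This is a minimal illustration rather than a definitive implementation: it assumes the requests and BeautifulSoup libraries (both introduced later in these notes), and the function names fetch_html and extract_table_rows, as well as the sample HTML, are hypothetical.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Steps 1 and 2: request the page and return its source code as text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an error for 4xx/5xx responses
    return response.text


def extract_table_rows(html):
    """Steps 3 and 4: find the first table and return its rows as lists of strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Collect header and data cells alike, stripped of surrounding whitespace.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


# Hypothetical sample page, used here instead of a live request.
SAMPLE_HTML = """
<html><body>
  <table>
    <tr><th>Name</th><th>Value</th></tr>
    <tr><td>alpha</td><td>1</td></tr>
    <tr><td>beta</td><td>2</td></tr>
  </table>
</body></html>
"""

rows = extract_table_rows(SAMPLE_HTML)
print(rows)
```

For a live page, the text returned by fetch_html would be passed to extract_table_rows in place of SAMPLE_HTML.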

Steps involved in web scraping

● Send an HTTP request to the URL of the webpage you want to
access. The server responds to the request by returning the
HTML content of the webpage.
● Once we have accessed the HTML content, we are left with the
task of parsing the data. Since most HTML data is nested,
we cannot reliably extract data through simple string processing. We
need a parser that can create a nested/tree structure of the
HTML data.
● Now, all we need to do is navigate and search the parse tree that
we created, i.e. tree traversal. For this task, we will be using
a third-party Python library, Beautiful Soup. It is a Python
library for pulling data out of HTML and XML files.
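
To make the idea of a nested parse tree concrete, here is a small sketch of searching and walking such a tree with Beautiful Soup. The markup and its id values are made up for illustration, and Python's built-in "html.parser" is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical nested markup: an outer list that contains an inner list.
NESTED_HTML = """
<ul id="outer">
  <li>Fruit
    <ul id="inner">
      <li>Apple</li>
      <li>Mango</li>
    </ul>
  </li>
  <li>Vegetables</li>
</ul>
"""

soup = BeautifulSoup(NESTED_HTML, "html.parser")
outer = soup.find("ul", id="outer")

# By default find_all searches the whole subtree, nested items included.
all_items = outer.find_all("li")
print(len(all_items))

# recursive=False looks only at direct children, so the inner items are skipped.
top_level_items = outer.find_all("li", recursive=False)
print(len(top_level_items))

# Searching from the inner list returns only the items nested inside it.
inner = soup.find("ul", id="inner")
inner_items = [li.get_text(strip=True) for li in inner.find_all("li")]
print(inner_items)

# Each node knows its parent, so we can also walk back up the tree.
apple = inner.find("li")
parent_id = apple.parent["id"]                # the enclosing <ul>
grandparent_name = apple.parent.parent.name   # the <li> wrapping the inner list
print(parent_id, grandparent_name)
```

Plain string searching for "<li>" would not distinguish the two levels, which is why a tree-building parser is used.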

Installing BeautifulSoup

BeautifulSoup is one of the most widely used Python libraries for
web scraping. It is a lightweight, easy-to-learn, and highly effective
way to programmatically isolate information on a single webpage at a
time. It's common to use BeautifulSoup in conjunction with the requests
library, where requests will fetch a page, and BeautifulSoup will extract
the resulting data.

● To install BeautifulSoup, type pip install beautifulsoup4 in the
Command Prompt/terminal.

● Or type !pip install beautifulsoup4 or %pip install beautifulsoup4
in a Jupyter notebook cell.

● Then type from bs4 import BeautifulSoup to import BeautifulSoup.
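
Once the package is installed, a quick check such as the following can confirm that the import works. This is only a sketch; the HTML string is a made-up example, and Python's built-in "html.parser" is assumed as the parser.

```python
import bs4
from bs4 import BeautifulSoup

# Print the installed version to confirm the package is available.
print(bs4.__version__)

# Parse a tiny HTML string and read the text of its <p> element.
soup = BeautifulSoup("<p>Hello, <b>soup</b>!</p>", "html.parser")
text = soup.p.get_text()
print(text)
```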

For more information on BeautifulSoup please refer to the official
Beautiful Soup Documentation.

Installing Requests

The first thing we’ll need to do to scrape a web page is to download
the page. We can download pages using the Python requests library.
The requests library will make a GET request to a web server, which
will download the HTML contents of a given web page for us. There are
several different types of requests we can make using requests, of
which GET is just one.

● To install Requests, type pip install requests in the Command
Prompt/terminal.

● Or type !pip install requests or %pip install requests in a Jupyter
notebook cell.

● Then type import requests to import Requests.
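
With the library installed and imported, downloading a page with a GET request might look like the sketch below. The helper name download_page and its timeout default are assumptions made for illustration; requests.get, raise_for_status, and the text attribute are part of the requests library.

```python
import requests


def download_page(url, timeout=10):
    """Send a GET request and return the page's HTML as a string."""
    response = requests.get(url, timeout=timeout)
    # Raise requests.exceptions.HTTPError for 4xx/5xx status codes.
    response.raise_for_status()
    return response.text


# GET is one of several request types; the library also provides
# requests.post, requests.put, requests.delete and requests.head.
```

Calling download_page with the address of a page you are allowed to scrape returns its HTML, which can then be handed to BeautifulSoup for parsing. Network problems and invalid URLs raise subclasses of requests.exceptions.RequestException, so callers can catch that one exception type.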

For more information on Requests please refer to the official Requests
Documentation.

Scraping and Analyzing data from Worldometer website

We have scraped COVID-19 confirmed cases and deaths by country and
continent from the Worldometer website. This is purely for
educational purposes. The website link, the reference notebook, and
the scraped dataset with analysis are given below:

● Worldometer website link
● Jupyter Notebook Download Link
● Scraped COVID-19 Dataset Download Link
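
The reference notebook is not reproduced in these notes, so the following is only a sketch of how scraped rows of this kind could be cleaned and summarized. The column layout (country, continent, confirmed cases, deaths), the helper names, and every value in SAMPLE_ROWS are placeholders invented for illustration; they are not Worldometer figures, and the real page structure may differ.

```python
from collections import defaultdict

# Placeholder rows in the shape a scraper might return (all cells are text).
# These values are invented for illustration only.
SAMPLE_ROWS = [
    ["Country A", "Continent X", "1,200", "30"],
    ["Country B", "Continent X", "800", ""],
    ["Country C", "Continent Y", "2,500", "45"],
]


def to_int(text):
    """Convert scraped text such as '1,200' to an int; empty cells become 0."""
    cleaned = text.replace(",", "").strip()
    return int(cleaned) if cleaned else 0


def totals_by_continent(rows):
    """Sum confirmed cases and deaths for each continent."""
    totals = defaultdict(lambda: {"cases": 0, "deaths": 0})
    for country, continent, cases, deaths in rows:
        totals[continent]["cases"] += to_int(cases)
        totals[continent]["deaths"] += to_int(deaths)
    return dict(totals)


summary = totals_by_continent(SAMPLE_ROWS)
print(summary)
```

Scraped numbers usually arrive as text with thousands separators or empty cells, so a small conversion step like to_int is typically needed before any analysis.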
