Python Web Scraping Introduction
Web scraping is an automated process of extracting information from the web. This chapter will give you an in-depth
idea of web scraping, its comparison with web crawling, and why you should opt for web scraping. You will also
learn about the components and working of a web scraper.
Why do we scrape the web, and how do we get the data? The answer to the first question is ‘data’. Data is
indispensable for any programmer, and the basic requirement of every programming project is a large amount of
useful data.
The answer to the second question is a bit tricky, because there are lots of ways to get data. In general, we may get
data from a database, a data file or other sources. But what if we need a large amount of data that is available
online? One way to get such data is to manually search (clicking away in a web browser) and save
(copy-pasting into a spreadsheet or file) the required data. This method is quite tedious and time consuming.
Web scraping, also called web data mining or web harvesting, is the process of constructing an agent which
can extract, parse, download and organize useful information from the web automatically. In other words, we can
say that instead of manually saving the data from websites, the web scraping software will automatically load and
extract data from multiple websites as per our requirement.
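To make the idea concrete, here is a minimal sketch of the kind of extraction a scraper automates. It uses only Python's standard-library `html.parser`; the HTML snippet and the choice of `<h2>` tags are illustrative assumptions, not from any real site.

```python
from html.parser import HTMLParser

# A minimal sketch: collect the text of every <h2> heading from an HTML page.
class HeadingScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2:
            self.headings.append(data.strip())

# Example HTML (an assumed placeholder for a downloaded page)
html = "<html><body><h2>Price: $10</h2><h2>Price: $12</h2></body></html>"
scraper = HeadingScraper()
scraper.feed(html)
print(scraper.headings)  # → ['Price: $10', 'Price: $12']
```

A real scraper would first download the page over HTTP and would typically use a dedicated library such as BeautifulSoup, but the extraction step is essentially this.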
Web crawling is basically used to index the information on a page using bots, also known as crawlers. It is also
called indexing. On the other hand, web scraping is an automated way of extracting information using bots, also
known as scrapers. It is also called data extraction.
To understand the difference between these two terms, let us look at the comparison given hereunder −
Web Crawling − Refers to downloading and storing the contents of a large number of websites. Mostly done on a
large scale.
Web Scraping − Refers to extracting individual data elements from a website by using a site-specific structure.
Can be implemented at any scale.
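The contrast above can be sketched in code: a crawling-style task discovers links to follow, while a scraping-style task pulls one specific data element using site-specific structure. The page snippet, CSS class name and regular expression below are assumed examples.

```python
import re
from html.parser import HTMLParser

# An assumed example page fragment
page = '<a href="/p/1">Widget</a> <span class="price">$9.99</span>'

# Crawling-style task: discover links that a crawler would enqueue and visit.
class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

collector = LinkCollector()
collector.feed(page)
print(collector.links)  # → ['/p/1']

# Scraping-style task: extract one specific data element via site-specific markup.
price = re.search(r'class="price">([^<]+)<', page).group(1)
print(price)  # → $9.99
```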
E-commerce Websites − Web scrapers can collect data specifically related to the price of a particular
product from various e-commerce websites for comparison.
Content Aggregators − Web scraping is used widely by content aggregators like news aggregators and
job aggregators for providing updated data to their users.
Marketing and Sales Campaigns − Web scrapers can be used to get data like emails, phone numbers
etc. for sales and marketing campaigns.
Search Engine Optimization (SEO) − Web scraping is widely used by SEO tools like SEMrush, Majestic
etc. to tell businesses how they rank for search keywords that matter to them.
Data for Machine Learning Projects − Retrieval of data for machine learning projects often depends upon
web scraping.
Data for Research − Researchers can collect useful data for their research work, saving time through this
automated process.
Web Crawler Module
A very necessary component of a web scraper, the web crawler module, is used to navigate the target website by
making HTTP or HTTPS requests to the URLs. The crawler downloads the unstructured data (HTML contents) and
passes it to the extractor, the next module.
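A sketch of the crawler module's job, using only the standard-library `urllib.request`. The URL and the User-Agent string are placeholders; the actual network call is shown only as a comment so the snippet runs offline.

```python
from urllib.request import Request, urlopen  # urlopen shown below but not called

# Build an HTTP(S) request for a target URL (both values are assumed examples).
url = "https://example.com/page.html"
request = Request(url, headers={"User-Agent": "my-scraper/0.1"})

# In a real crawler you would now fetch the raw, unstructured HTML contents:
#   html = urlopen(request).read().decode("utf-8")
# and pass `html` on to the extractor module.
print(request.full_url)
print(request.get_header("User-agent"))  # urllib stores header names capitalized
```

Setting an identifying User-Agent is common practice; many sites also publish a robots.txt file that a polite crawler should respect.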
Extractor
The extractor processes the fetched HTML content and extracts the data into a semi-structured format. It is also
called a parser module and uses different parsing techniques such as regular expressions, HTML parsing, DOM
parsing or artificial intelligence for its functioning.
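As an illustration of the regular-expression technique mentioned above, the sketch below turns raw HTML into semi-structured records (a list of dictionaries). The HTML layout and field names are assumed examples.

```python
import re

# Assumed example of raw HTML handed over by the crawler module
raw_html = """
<div class="item"><b>Book A</b><i>$15</i></div>
<div class="item"><b>Book B</b><i>$22</i></div>
"""

# Extract (name, price) pairs and shape them into semi-structured records
records = [
    {"name": name, "price": price}
    for name, price in re.findall(r"<b>([^<]+)</b><i>([^<]+)</i>", raw_html)
]
print(records)  # → [{'name': 'Book A', 'price': '$15'}, {'name': 'Book B', 'price': '$22'}]
```

Regular expressions are brittle against markup changes; for real sites an HTML or DOM parser is usually the more robust of the techniques listed.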
Storage Module
After extracting the data, we need to store it as per our requirement. The storage module will output the data in a
standard format, such as JSON or CSV, that can be saved to a file or loaded into a database.
We can understand the working of a web scraper in a few simple steps.
In this step, a web scraper will download the requested contents from multiple web pages.
The data on websites is HTML and mostly unstructured. Hence, in this step, the web scraper will parse and extract
structured data from the downloaded contents.
After all these steps are successfully done, the web scraper will analyze the data thus obtained.
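The steps above can be sketched as a single pipeline. Here `fetch()` is a stub standing in for a real HTTP download so the example runs offline; the URL, markup and field names are assumed.

```python
import json
import re

def fetch(url):
    # Step 1: download the page contents (stubbed with assumed example HTML)
    return "<h1>Widget</h1><span>$9.99</span>"

def extract(html):
    # Step 2: parse the unstructured HTML into structured data
    name = re.search(r"<h1>([^<]+)</h1>", html).group(1)
    price = re.search(r"<span>([^<]+)</span>", html).group(1)
    return {"name": name, "price": price}

def store(record):
    # Step 3: serialize the result for storage or later analysis
    return json.dumps(record)

result = store(extract(fetch("https://example.com")))
print(result)  # → {"name": "Widget", "price": "$9.99"}
```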