2022 Scraping Without Programming Tutorial

The document provides an overview of web scraping techniques that can be used without programming. It explains how to scrape data from websites using Google Sheets formulas like ImportHTML and ImportXML, and describes point-and-click scraping apps like OutWit Hub and ParseHub. The document also discusses HTML elements and XPATHs that can be used to target specific data fields for extraction.

Uploaded by

Faisal Kareem

tinyurl.com/SunneScrapingTutorial

Scraping Without Programming
© Samantha Sunne
What is scraping?

It means to grab data through code, elbow grease, or whatever other method you have on hand.

How do journalists usually get data?

From Humans: Ask Nicely, or FOIA (Playing Hardball)
From Computers: Download, or Scrape (Playing Hardball)

Web scraping

Today we will extract data from a single webpage. This is different from web crawling, document scraping, and other kinds of scraping.

HTML

We're going to scrape with HTML. This is sometimes called source code.

This is how a website looks to a human.

This is how it looks to a computer.

Our goal is to land somewhere in the middle.

Sometimes source code itself is interesting. Jeb Bush's campaign site included a detailed summary of the movie Die Hard.

HTML elements

HTML is broken into elements.

Elements are wrapped in tags that look like this:

<h1>element</h1>

HTML elements

For example, tables:

<table>
Here is my table, between these table tags.
</table>

Or headers:

<h1>
Here is my header.
</h1>

HTML elements

There are a lot of different elements, identified by tags like <h1>, <li> and <a>. If you don't know what a tag means, use an HTML dictionary.

Nested elements

Elements can be inside other elements. That means you can grab an element and all the elements inside it.

Nested elements

1. Table cell: One cell in a table has the tag <td>, which stands for "table data."
2. Table row: A table row has the tag <tr>, and contains table cells inside it.
3. Table: A table element contains both table rows and table cells. It has the tag <table>.

Nested elements

You can grab a cell from a table, a row, or a whole table.
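
For the curious, the same nesting idea carries over if you ever do write a little code. The sketch below is not part of the original tutorial: it uses Python's built-in XML parser on a tiny made-up table (the bank names and variable names are placeholders) to grab the whole table, then a row, then a single cell.

```python
import xml.etree.ElementTree as ET

# A made-up table, small and tidy enough to be parsed as XML.
SNIPPET = """
<table>
  <tr><td>Bank</td><td>City</td></tr>
  <tr><td>First Example Bank</td><td>Springfield</td></tr>
</table>
"""

table = ET.fromstring(SNIPPET.strip())   # the whole <table> element
rows = table.findall("tr")               # every <tr> row inside the table
second_row = rows[1]                     # one row
cells = [td.text for td in second_row.findall("td")]  # the <td> cells in that row
one_cell = second_row.findall("td")[1].text           # a single cell

print(len(rows))   # number of rows in the table
print(cells)       # ['First Example Bank', 'Springfield']
print(one_cell)    # Springfield
```

Real web pages are usually messier than this snippet, so treat it as an illustration of nesting rather than a ready-made scraper.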

Technique 1
ImportHTML
importHTML

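Type this formula in Google Sheets:

=ImportHTML("url", "element")

The url is the link you are scraping.

The element is the kind of HTML element to grab, either "table" or "list". Google's documentation also lists a third argument, index, a number starting at 1 that picks which table or list on the page to return.
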
importHTML

For example:

=ImportHTML("[Link]
solutions/bank-failures/failed-bank-list/",
"table")

This scrapes a table of failed banks from the FDIC.
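
To get a feel for what a formula like this does behind the scenes, here is a rough sketch in Python. It is an assumption about the general idea, not how Google Sheets actually implements ImportHTML. The class name TableGrabber and the sample page are made up for illustration, and the page is an inline string rather than the live FDIC site.

```python
from html.parser import HTMLParser


class TableGrabber(HTMLParser):
    """Collect the cell text of the first <table> on a page, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []                 # finished rows, each a list of cell strings
        self._row = None               # row being built, or None outside a <tr>
        self._cell = None              # text pieces of the cell being built
        self._tables_seen = 0
        self._in_first_table = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._tables_seen += 1
            self._in_first_table = self._tables_seen == 1
        elif self._in_first_table and tag == "tr":
            self._row = []
        elif self._row is not None and tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self._in_first_table = False


# Made-up sample page standing in for a real one.
PAGE = """
<html><body>
<h1>Failed banks (made-up sample)</h1>
<table>
  <tr><th>Bank</th><th>City</th></tr>
  <tr><td>First Example Bank</td><td>Springfield</td></tr>
  <tr><td>Second Example Bank</td><td>Shelbyville</td></tr>
</table>
</body></html>
"""

grabber = TableGrabber()
grabber.feed(PAGE)
for row in grabber.rows:
    print(row)
```

Each printed row is a list of cells, which is roughly the grid a spreadsheet would fill in. The sketch ignores nested tables and other edge cases.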

Hooray!
We scraped a live webpage.
But the ImportHTML formula is pretty limited.
Let's try something more advanced.

Technique 2
ImportXML
Nested elements

Not all data is in a convenient table. Instead, you can use an XPATH.

What is an XPATH?

An XPATH is like an address to a very specific bit of data.

XPATH Examples

All bold text                                //b
All headers (large text)                     //h1
All headers containing the word "country"    //h1[contains(.,'country')]
All headers with the class "country-name"    //h1[@class='country-name']

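
If you want to try these addresses outside a spreadsheet, here is a minimal sketch in Python, not from the original tutorial. It assumes a small made-up page and uses the built-in ElementTree module, which understands only a subset of XPATH: paths start with a dot, and there is no contains(), so that one is filtered in plain Python instead.

```python
import xml.etree.ElementTree as ET

# Made-up page, written tidily enough to be parsed as XML.
PAGE = """
<html><body>
  <h1 class="country-name">Canada</h1>
  <p>Some <b>bold</b> text and <b>more bold</b> text.</p>
  <h1>Every country in the list</h1>
  <h1>Notes</h1>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

# //b : all bold text
bold = [b.text for b in root.findall(".//b")]

# //h1 : all headers
headers = ["".join(h.itertext()) for h in root.findall(".//h1")]

# //h1[@class='country-name'] : headers with the class "country-name"
named = [h.text for h in root.findall(".//h1[@class='country-name']")]

# //h1[contains(.,'country')] : ElementTree has no contains(), so filter by hand
containing = [text for text in headers if "country" in text]

print(bold)        # ['bold', 'more bold']
print(headers)     # ['Canada', 'Every country in the list', 'Notes']
print(named)       # ['Canada']
print(containing)  # ['Every country in the list']
```

Note that the "Canada" header is not in the last list: contains(.,'country') looks at the header's text, not at its class name.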
Nested elements

You can also use nested elements in an XPATH, just like we saw with HTML.

XPATH Examples

//table[@id='vaccines']/tr[56]/td[3]

//                       on this page
table[@id='vaccines']    in the table called 'vaccines'
tr[56]                   in row 56
td[3]                    in cell 3

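
The same kind of nested address can be tried in Python. This is a sketch under assumptions: the page and its numbers are made up, the table has only three rows (so it asks for row 3 instead of row 56), and it uses the built-in ElementTree module, where paths start with a dot.

```python
import xml.etree.ElementTree as ET

# Made-up page with two tables; only one has the id 'vaccines'.
PAGE = """
<html><body>
  <table id="other"><tr><td>ignore me</td></tr></table>
  <table id="vaccines">
    <tr><td>Country</td><td>Doses</td><td>Percent</td></tr>
    <tr><td>Examplia</td><td>1000</td><td>42</td></tr>
    <tr><td>Samplestan</td><td>2000</td><td>57</td></tr>
  </table>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

# On this page, in the table called 'vaccines', in row 3, in cell 3.
# XPATH counts from 1, so tr[3] is the third row, not the fourth.
cell = root.find(".//table[@id='vaccines']/tr[3]/td[3]")

print(cell.text)  # 57
```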
Having trouble finding the XPATH? You can also use the Web Inspector.

XPATH

Now that we know what XPATH is (more or less), let's use it to scrape something a lot more specific than tables.

importXML

Type this formula in Google Sheets:

=ImportXML("url", "XPATH")

The url is the link you are scraping.

The XPATH is the address of the data.

importXML

For example:

=ImportXML("[Link]", "//h3")

This scrapes all the headers (that is, job posts) from the OpenNews job board.
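
For comparison, here is a rough Python sketch of the same idea as //h3: collect the text of every h3 header on a page. It is an illustration only, not how ImportXML works internally. The class name TagText and the job titles are made up, and the page is an inline string rather than the live job board. It uses Python's built-in HTML parser, which copes with messy pages that are not valid XML.

```python
from html.parser import HTMLParser


class TagText(HTMLParser):
    """Collect the text inside every occurrence of one tag, like //h3."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.results = []
        self._parts = None   # text pieces of the element being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self._parts = []

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._parts is not None:
            self.results.append("".join(self._parts).strip())
            self._parts = None


# Made-up sample page, deliberately a little messy (unclosed tags).
PAGE = """
<html><body>
<h1>Jobs (made-up sample)</h1>
<h3>Data Reporter</h3><br>
<h3><a href="/jobs/2">News Apps Developer</a></h3>
<p>Unclosed paragraph
</body></html>
"""

finder = TagText("h3")
finder.feed(PAGE)
print(finder.results)  # ['Data Reporter', 'News Apps Developer']
```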

ImportHTML and ImportXML

That's just the basics. You can find plenty of in-depth tutorials on ImportHTML, ImportXML, and other formulas like ImportFEED.

Technique 3
Point-and-Click Apps
OutWit Hub

OutWit Hub is a desktop app that can identify each HTML element on a webpage and scrape it. The free version lets you download 100 rows at a time.

ParseHub

ParseHub is a desktop app that can identify and scrape elements and sub-elements. The free version lets you scrape 200 pages at a time.

WebScraper

WebScraper is a browser extension that helps you scrape stuff through the Web Inspector. It only sometimes works.

Disclaimer:

Free apps come and go. They may not be up to date when you're reading this. But that's why we learned the code instead.

And that's it!

Find me with questions. I also recommend my newsletter Tools for Reporters for cool stuff like this. Good luck!
