Web Scraping Using Python - Notes
Topics Covered:
● Introduction to Web Scraping
● How Does Web Scraping Work?
● Steps involved in web scraping
● Installing BeautifulSoup
● Installing Requests
● Scraping and Analyzing data from Worldometer website
How Does Web Scraping Work?
When we scrape the web, we write code that sends a request to the
server hosting the page we specified. The server returns the source
code (mostly HTML) for the page or pages we requested.
So far, this is essentially what a web browser does: it sends a
request to a server for a specific URL and asks the server to return
the code for that page.
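The request/response exchange described above can be sketched with Python's standard library alone, so it runs before anything is installed. This is only an illustration under stated assumptions: the tiny local server, the DemoHandler class, the PAGE content and the serve_and_fetch helper are hypothetical stand-ins for a real website, and urllib is used here in place of the Requests library introduced later in these notes.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

# Hypothetical page that our throwaway local server will return.
PAGE = b"<html><body><h1>Hello, scraper</h1></body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    """Answers every GET request with the same small HTML page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the console output quiet


def serve_and_fetch():
    """Start the local server, request its page, return (status, html)."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        # An empty ProxyHandler makes the request go straight to localhost.
        opener = build_opener(ProxyHandler({}))
        with opener.open(url, timeout=5) as response:
            return response.status, response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


status, html = serve_and_fetch()
print(status)  # HTTP status code sent by the server
print(html)    # raw HTML source, exactly what a browser would receive
```

What comes back is plain text: the status code and the page's HTML source. Nothing is rendered; deciding what to do with that text is up to our code.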
But unlike a web browser, our web scraping code won’t interpret the
page’s source code and display the page visually. Instead, we’ll write
some custom code that filters through the page’s source code looking
for specific elements we’ve specified, and extracting whatever content
we’ve instructed it to extract.
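To make "filtering through the page's source code" concrete, here is a minimal sketch that picks out only the link targets from a small piece of HTML. It uses the standard library's html.parser module as a stand-in, not Beautiful Soup; the LinkCollector class and the SAMPLE_HTML string are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


# Hypothetical page source; a real scraper would get this from a server.
SAMPLE_HTML = """
<html><body>
  <h1>Reports</h1>
  <p>See the <a href="/daily">daily report</a> and the
     <a href="/weekly">weekly report</a>.</p>
</body></html>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
print(collector.links)  # ['/daily', '/weekly']
```

Everything except the elements we asked for (here, the links) is simply ignored.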
For example, if we wanted to get all of the data from inside a table that
was displayed on a web page, our code would be written to go
through these steps in sequence:
● Now all that remains is to navigate and search the parse tree that
we created, i.e. tree traversal. For this task, we will use another
third-party Python library, Beautiful Soup, which is designed for
pulling data out of HTML and XML files.
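A minimal sketch of navigating a parse tree with Beautiful Soup, assuming the beautifulsoup4 package is already installed. The SAMPLE_HTML table, its id and its values are invented for illustration and are not taken from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment containing one small table.
SAMPLE_HTML = """
<table id="stats">
  <tr><th>Country</th><th>Cases</th></tr>
  <tr><td>Alpha</td><td>120</td></tr>
  <tr><td>Beta</td><td>45</td></tr>
</table>
"""

# Build the parse tree with Python's built-in HTML parser.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Search the tree for the table, then walk down to its rows and cells.
table = soup.find("table", id="stats")
headers = [th.get_text(strip=True) for th in table.find_all("th")]

rows = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # the header row has no <td> cells, so it is skipped
        rows.append(cells)

print(headers)  # ['Country', 'Cases']
print(rows)     # [['Alpha', '120'], ['Beta', '45']]
```

find returns the first matching element and find_all returns every match, so searching from the table element keeps the results limited to that table.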
Installing BeautifulSoup
● Alternatively, run !pip install beautifulsoup4 or
%pip install beautifulsoup4 in a Jupyter notebook cell.
Installing Requests
● To install Requests, type pip install requests in the Command
Prompt/terminal.
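Once Requests is installed, fetching a page's source might look like the sketch below. The fetch_html helper and its default timeout are assumptions made for illustration, and the URL in the usage comment is only a placeholder.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its HTML source as text."""
    response = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of returning an error page.
    response.raise_for_status()
    return response.text


# Example usage (needs network access; the URL is a placeholder):
# html = fetch_html("https://example.com/")
# print(html[:200])
```

Setting a timeout keeps the script from hanging indefinitely if the server does not respond.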
Scraping and Analyzing data from Worldometer website