Web-Scraping-With-Python

Web scraping is the automated process of extracting information from the internet using scripts or programs. It has various applications, including product comparison, review analysis, and data tracking. The document outlines the general process of web scraping using Python, including tools like BeautifulSoup, and emphasizes the importance of legality and permissions when scraping data.

Web Scraping with Python
By Zachary King
What is Web Scraping?
Web Scraping is the process of using a script or computer
program to retrieve information from the Internet.

The process is usually automatic but can involve manual
input if desired.
Purpose of Web Scraping
➢ Web scraping makes it easy to retrieve exactly what you
need from a webpage.
➢ No tedious manual searching of long (or even short) pages.
➢ Supplies data to statistical programs for research, testing,
tracking, etc.
➢ Automates common visits to the web.
Applications
➢ Scrape product pages from retailer or manufacturer websites to
show on your own website or to provide a specs/price comparison
➢ Scrape product reviews from retailers to detect fraudulent
reviews
➢ Scrape news websites for analysis, often for providing better
targeted news to their audience
➢ Scrape sports pages for stat tracking on individual teams or
players
➢ Scrape your Facebook news feed for your own Facebook
application! (or other social media)
General Process
1. Fetch a web page
2. Download web page content (optional)
3. Parse data (HTML)
4. Apply parsed data (your usage)
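The four steps above can be sketched with only the Python 3 standard library. This is a minimal illustration, not the presentation's own code: the names fetch, save, LinkCollector, and extract_links are hypothetical, and a fixed HTML string stands in for a downloaded page so the example runs without network access.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch(url):
    """Step 1: fetch a web page and return its HTML as text (assumes UTF-8)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def save(html, path):
    """Step 2 (optional): keep a local copy of the page content."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)


class LinkCollector(HTMLParser):
    """Step 3: parse the HTML, collecting the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Step 4: apply the parsed data. The sample string below is an assumption
# standing in for a real page; with network access it could be replaced by
# fetch("https://example.com").
sample_html = '<p><a href="/a">A</a> and <a href="/b">B</a></p>'
links = extract_links(sample_html)
print(links)
```

Keeping each step in its own function makes it easy to swap the parser later, for example for BeautifulSoup.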
Using Python
Some packages:
-bs4 (BeautifulSoup4)**
-urllib2 (Python 2 only)
-urllib (Python 3)**
-requests (third-party; install with pip)
-urllib.request (Python 3)**
Go Fetch!
To simply get the HTML content of a web
page and output it:
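One way this might look, as a minimal sketch assuming Python 3 and the standard-library urllib.request module. The function name get_html and the URL in the usage comment are illustrative placeholders, not taken from the presentation.

```python
from urllib.request import urlopen


def get_html(url):
    """Return the content at `url` as a string."""
    with urlopen(url, timeout=10) as response:
        raw = response.read()
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


# Usage (requires network access; the URL is a placeholder):
#     print(get_html("https://example.com"))
```

Only scrape pages you have permission to fetch.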
Specific Searches
With BeautifulSoup, create a “soup” object that allows for easy searching within
the contents of the web page.
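A minimal sketch of a "soup" object in use, assuming the bs4 package is installed and using Python's built-in "html.parser" backend. The HTML string, its ids, and its class names are made up for illustration; in practice the string would come from a fetched page.

```python
from bs4 import BeautifulSoup

# Hypothetical page content standing in for a downloaded page.
html = """
<html>
  <head><title>Example Store</title></head>
  <body>
    <h1 id="main-title">Featured Products</h1>
    <a class="product" href="/widgets">Widgets</a>
    <a class="product" href="/gadgets">Gadgets</a>
    <a href="/about">About</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# The <title> tag is reachable as an attribute of the soup.
title = soup.title.string

# find() returns the first match; keyword arguments filter on attributes.
heading = soup.find("h1", id="main-title").get_text()

# find_all() returns every match; class_ avoids the Python keyword `class`.
product_links = [a["href"] for a in soup.find_all("a", class_="product")]

print(title)
print(heading)
print(product_links)
```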
*More Specific Searches
Use multiple “soups” to search specific parts of the web page.
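One reading of "multiple soups" is to find a container element first and then search only inside it, since a tag returned by find() supports the same search methods as the full soup. The sketch below assumes bs4 is installed; the HTML, ids, and class names are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page with a navigation block and a scores block.
html = """
<div id="nav"><a href="/home">Home</a><a href="/contact">Contact</a></div>
<div id="scores">
  <ul>
    <li class="team"><span class="name">Lions</span> <span class="pts">21</span></li>
    <li class="team"><span class="name">Bears</span> <span class="pts">17</span></li>
  </ul>
</div>
"""

page_soup = BeautifulSoup(html, "html.parser")

# A second, narrower "soup": only the scores section of the page.
scores_soup = page_soup.find("div", id="scores")

scores = {}
for item in scores_soup.find_all("li", class_="team"):
    name = item.find("span", class_="name").get_text()
    points = int(item.find("span", class_="pts").get_text())
    scores[name] = points

# Searches on the narrower soup ignore everything outside it.
links_in_page = len(page_soup.find_all("a"))
links_in_scores = len(scores_soup.find_all("a"))

print(scores)
print(links_in_page, links_in_scores)
```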
Child Elements
One way to retrieve all the child elements of a given tag is to use the
.children attribute of BeautifulSoup objects.
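A short sketch of .children, assuming bs4 is installed; the list markup is made up for illustration. Note that .children yields direct children only, and that text between tags (including whitespace) counts as a child, so the example filters for Tag objects.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup; note the space between the first two <li> tags.
html = "<ul id='menu'><li>Home</li> <li>News</li><li>Sports</li></ul>"
soup = BeautifulSoup(html, "html.parser")
menu = soup.find("ul", id="menu")

# Keep only element children, skipping bare text nodes such as whitespace.
child_tags = [child for child in menu.children if isinstance(child, Tag)]
names = [child.get_text() for child in child_tags]

print(names)
```

To walk every nested element rather than only the direct children, bs4 also provides the .descendants attribute.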
Extending your Scraper
I have my scraped data, now what?
➢ Graphs/charts for visual representation
➢ Output to a file
➢ Store in an organized manner (data structures)
➢ Reformat into a new web page
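As one example of the options above, scraped data held in a list of dictionaries can be written out as CSV with the standard library. This is a sketch under assumptions: the rows, the field names, and the helper write_csv are invented for illustration, and an in-memory buffer is used so the example runs without touching the file system.

```python
import csv
import io

# Hypothetical scraped records, stored in an organized structure.
rows = [
    {"team": "Lions", "points": 21},
    {"team": "Bears", "points": 17},
]


def write_csv(records, fileobj):
    """Write a list of dicts to an open text file object as CSV."""
    writer = csv.DictWriter(fileobj, fieldnames=["team", "points"])
    writer.writeheader()
    writer.writerows(records)


buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())

# To write a real file instead (the filename is a placeholder):
#     with open("scores.csv", "w", newline="", encoding="utf-8") as f:
#         write_csv(rows, f)
```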
What Now?
➢ Bear in mind the legality of web scraping (it’s a blurry line).
➢ Always get the green light from the owner of the site (preferably
recorded/signed), before scraping their data.
➢ Check out the docs for BeautifulSoup at
http://www.crummy.com/software/BeautifulSoup/bs4/doc/
➢ Take a refresher with the bs4 beginner article at
http://www.pythonforbeginners.com/python-on-the-web/beautifulsoup-4-python/
Questions?
You can download all of my example files from this presentation,
as well as my more complete Python web scraping files from my
GitHub at https://github.com/zach-king/Python-Web-Scraping
