WEB SCRAPING
• Web scraping: Shows how to write programs that can
automatically download web pages and parse them for
information.
• Web scraping is the term for using a program to download and
process content from the web.
Modules that make it easy to scrape web pages in Python are:
• webbrowser Comes with Python and opens a browser to a
specific page.
• requests Downloads files and web pages from the internet.
• bs4 Parses HTML, the format that web pages are written in.
• selenium Launches and controls a web browser.
• The selenium module is able to fill in forms and simulate mouse
clicks in this browser.
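As a quick illustration of the bs4 module listed above, the sketch below parses a small HTML string with Beautiful Soup. The HTML string and the "intro" id are made-up examples for demonstration, not from a real page.

```python
# A minimal sketch of parsing HTML with bs4 (Beautiful Soup).
import bs4

html = '<html><body><p id="intro">Hello, <a href="https://example.com">link</a></p></body></html>'
soup = bs4.BeautifulSoup(html, 'html.parser')

elem = soup.select('#intro')[0]   # CSS selector for the element with id="intro"
print(elem.getText())             # text content of the <p> element → Hello, link
print(soup.a.get('href'))         # href attribute of the first <a> → https://example.com
```

The select() method takes a CSS selector and returns a list of matching elements, which is the usual way to pull specific pieces of data out of a downloaded page.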
• The arguments given after the program's name in the operating
system's command-line shell are known as command-line
arguments.
• Python provides various ways of dealing with these types
of arguments. The most common is:
Using sys.argv
import sys
# Total arguments.
n = len(sys.argv)
print("Total arguments passed:", n)
# Arguments passed.
print("\nName of Python script:", sys.argv[0])
print("\nArguments passed:", end=" ")
for i in range(1, n):
    print(sys.argv[i], end=" ")
Program to Get the Address from the Command Line
import webbrowser, sys
if len(sys.argv) > 1:
    # Get address from command line.
    address = ' '.join(sys.argv[1:])
    print(address)
Handle the Clipboard Content and Launch the Browser
import webbrowser, sys, pyperclip
if len(sys.argv) > 1:
    # Get address from command line.
    address = ' '.join(sys.argv[1:])
else:
    # Get address from clipboard.
    address = pyperclip.paste()
webbrowser.open('https://www.google.com/maps/place/' + address)
Downloading a Web Page with the
requests.get() Function
import requests
res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
print(type(res))
print(res.status_code == requests.codes.ok)
print(len(res.text))
print(res.text[:250])
Saving Downloaded Files to the Hard Drive
import requests
res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
res.raise_for_status()
playFile = open('RomeoAndJuliet.txt', 'wb')
for chunk in res.iter_content(100000):
    playFile.write(chunk)
playFile.close()
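The chunked-write pattern above can also be sketched without a live download. The version below uses an in-memory bytes object in place of the downloaded response and a with statement, which closes the file automatically; the 'sample.txt' filename and the data contents are made-up for illustration.

```python
# Sketch of the same 100,000-byte chunked-write pattern, using in-memory
# bytes instead of a network download so it runs offline.
data = b'x' * 250000  # stands in for the downloaded content

with open('sample.txt', 'wb') as f:          # 'with' closes the file automatically
    for i in range(0, len(data), 100000):    # 100,000-byte chunks, like iter_content
        f.write(data[i:i + 100000])

print(len(open('sample.txt', 'rb').read()))  # → 250000
```

Writing in chunks rather than all at once keeps memory use bounded, which is why iter_content() is used for large downloads.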