
Module 5-Web Scraping

Web scraping involves using programs to automatically download and parse web pages for information. Key Python modules that facilitate web scraping are requests for downloading files/pages, bs4 for parsing HTML, and selenium for controlling web browsers. Command line arguments passed to a Python script can be accessed via sys.argv to get inputs from the command line.

Uploaded by

Edwin Lobo

WEB SCRAPING

Web scraping
• Web scraping is the term for using a program to download and
process content from the web.
• This module shows how to write programs that automatically
download web pages and parse them for information.
Modules that make it easy to scrape web pages in Python are:
• webbrowser: Comes with Python and opens a browser to a
specific page.
• requests: Downloads files and web pages from the internet.
• bs4: Parses HTML, the format that web pages are written in.
• selenium: Launches and controls a web browser. The selenium
module can fill in forms and simulate mouse clicks in this browser.
• The arguments given after the name of the program in the
command-line shell of the operating system are known as
command line arguments.
• Python provides various ways of dealing with these
arguments. The most common is sys.argv.
Using sys.argv
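import sys

# Total number of arguments, including the script name
n = len(sys.argv)
print("Total arguments passed:", n)

# sys.argv[0] is the name of the script itself
print("\nName of Python script:", sys.argv[0])

# The remaining entries are the arguments passed by the user
print("\nArguments passed:", end=" ")
for i in range(1, n):
    print(sys.argv[i], end=" ")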
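The same idea can be wrapped in a small function that takes the argument list as a parameter, which makes it easy to try without a shell. This is a minimal sketch; the name describe_args and the sample script name demo.py are hypothetical and not part of the slides.

```python
import sys


def describe_args(argv):
    """Split an argv-style list into (script_name, arguments)."""
    # The first entry is the script name; everything after it is user input.
    script_name = argv[0] if argv else ""
    arguments = argv[1:]
    return script_name, arguments


if __name__ == "__main__":
    name, args = describe_args(sys.argv)
    print("Name of Python script:", name)
    print("Arguments passed:", " ".join(args))
```

Calling describe_args(["demo.py", "870", "Valencia"]) returns the script name "demo.py" and the list ["870", "Valencia"].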
Program to get the address from the command line

import webbrowser, sys

if len(sys.argv) > 1:
    # Get address from command line.
    address = ' '.join(sys.argv[1:])
    print(address)
Handle the Clipboard Content and Launch the Browser

import webbrowser, sys, pyperclip

if len(sys.argv) > 1:
    # Get address from command line.
    address = ' '.join(sys.argv[1:])
else:
    # Get address from clipboard.
    address = pyperclip.paste()

# Open the browser at the map page for this address.
webbrowser.open('[Link]maps/place/' + address)
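The command-line-or-clipboard choice can also be written as two small functions, which keeps the decision logic separate from the browser call. This is a sketch under stated assumptions: get_address and build_url are hypothetical names, the base URL is a placeholder, and the clipboard text is passed in as a plain string (in the script above it would come from pyperclip.paste()). Percent-encoding the address with urllib.parse.quote is an addition not shown in the slides; it keeps spaces and punctuation from breaking the URL.

```python
from urllib.parse import quote


def get_address(argv, clipboard_text=""):
    """Return the address from argv if one was given, else the clipboard text."""
    if len(argv) > 1:
        # Join all arguments after the script name into one string.
        return " ".join(argv[1:])
    return clipboard_text.strip()


def build_url(base_url, address):
    """Append the percent-encoded address to base_url."""
    return base_url + quote(address)


# Placeholder base URL, used for illustration only.
url = build_url("https://maps.example.com/place/",
                get_address(["demo.py", "870", "Valencia", "St"]))
print(url)
```

The resulting string could then be handed to webbrowser.open().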
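Downloading a Web Page with the requests.get() Function

import requests

# requests.get() returns a Response object.
res = requests.get('[Link]com/files/[Link]')
print(type(res))

# True if the download succeeded.
print(res.status_code == requests.codes.ok)

# The page content is stored as a string in res.text.
print(len(res.text))
print(res.text[:250])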
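Instead of comparing the status code by hand, the check can be delegated to the response's raise_for_status() method, which raises an exception when the download failed. Below is a minimal sketch, assuming the download function is passed in as a parameter (for example requests.get) so the logic can be tried without a network connection; fetch_preview is a hypothetical name.

```python
def fetch_preview(url, get, length=250):
    """Download url with get (for example requests.get) and return the first characters."""
    res = get(url)
    # Raises an exception for an unsuccessful response (4xx or 5xx status).
    res.raise_for_status()
    return res.text[:length]
```

With requests installed, a call might look like fetch_preview(some_url, requests.get), wrapped in try/except if the program should continue after a failed download.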
Saving Downloaded Files to the Hard Drive

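import requests

res = requests.get('[Link]/files/[Link]')
# Stop with an exception if the download failed.
res.raise_for_status()

# Open the file in write-binary mode and save the content chunk by chunk.
playFile = open('[Link]', 'wb')
for chunk in res.iter_content(100000):
    playFile.write(chunk)
playFile.close()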
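The write loop can also use a with statement, which closes the file automatically even if an error occurs partway through. This is a sketch, not the slides' own code: save_chunks is a hypothetical helper that accepts any iterable of bytes objects, such as the value returned by res.iter_content(100000).

```python
def save_chunks(chunks, path):
    """Write an iterable of bytes chunks to path and return the total bytes written."""
    total = 0
    # The with block closes the file when the loop finishes or raises.
    with open(path, "wb") as out_file:
        for chunk in chunks:
            total += out_file.write(chunk)
    return total
```

Passing the chunks in as a parameter means the helper can be tried with a plain list of bytes before it is connected to a real download.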
