web scraping using python
● Scrapy
○ Python framework to extract data from webpages
● Beautiful Soup
○ Python library to parse HTML/XML documents
● Alternatives
○ Selenium
○ Requests
○ Octoparse
Getting started!
How do we do it?
Web Scraping in Python
● Download the webpage with urllib.request (urllib2 in Python 2) or requests
import requests
data = requests.get('http://google.com/')
html = data.content  # raw bytes of the response body
Use BeautifulSoup for parsing
Philosophy:
“You didn't write that awful page. You're just trying to get
some data out of it. Beautiful Soup is here to help.”
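As a rough illustration of the parsing step, the sketch below runs BeautifulSoup over a small hard-coded HTML string instead of a live download. The HTML content and the variable names (`title`, `links`) are assumptions made for the example, not part of the original slides.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page (not from the slides).
html = """
<html><body>
  <h1>Example page</h1>
  <a href="/first">First link</a>
  <a href="/second">Second link</a>
</body></html>
"""

# "html.parser" is the parser bundled with Python, so no extra install is needed.
soup = BeautifulSoup(html, "html.parser")

# Text of the first <h1> element, with surrounding whitespace stripped.
title = soup.h1.get_text(strip=True)

# (link text, href) pairs for every <a> element, in document order.
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a")]

print(title)
print(links)
```

In practice the `html` string would be the content returned by the download step.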
Export the data
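One common way to export scraped data is CSV. The sketch below is only one possible approach, using Python's standard csv module; the sample `rows`, the column names, and the helper `rows_to_csv` are hypothetical and chosen for illustration.

```python
import csv
import io

# Hypothetical scraped records: (link text, href) pairs.
rows = [("First link", "/first"), ("Second link", "/second")]

def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text", "href"])  # assumed column names
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = rows_to_csv(rows)
print(csv_text)
```

To write to disk instead, the same writer can be pointed at a file opened with `open("links.csv", "w", newline="")`, where the filename is again just a placeholder.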