Python Web Scraping
Web scraping
Web scraping is the use of a program or algorithm to extract and process large amounts of data
from the web. Whether you are a data scientist, an engineer, or anyone who analyzes large
datasets, the ability to scrape data from the web is a useful skill to have.
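To make "extract and process" concrete, here is a minimal sketch that pulls every link out of a small HTML string using only the Python standard library. The sample markup and the names LinkCollector, extract_links, and SAMPLE_HTML are made up for illustration; a real page would be downloaded first.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the href values of all links in an HTML string, in order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content, used only to demonstrate the parser.
SAMPLE_HTML = """
<html><body>
  <h1>Reports</h1>
  <a href="/reports/2023.csv">2023</a>
  <a href="/reports/2024.csv">2024</a>
</body></html>
"""

if __name__ == "__main__":
    print(extract_links(SAMPLE_HTML))
```

Third-party parsers are often more convenient for real pages; the standard library is used here only to keep the sketch self-contained.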
Why we scrape
Web pages contain a wealth of information (in text form), designed mostly for human
consumption. Scraping lets you interface with a third party that offers no API access. Websites
are often more important than APIs, and they allow anonymous access.
You should check a website’s Terms and Conditions before you scrape it. Be careful to read the
statements about legal use of data. Usually, the data you scrape should not be used for commercial
purposes.
How it works
Do not request data from the website too aggressively with your program (also known as
spamming), as this may overload the website. Make sure your program behaves in a reasonable
manner (i.e., acts like a human). One request for one webpage per second is good practice.
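One way to keep to roughly one request per second is to pause between downloads. The sketch below assumes the standard library's urllib is enough for the pages in question. The function names fetch and fetch_all, the User-Agent string, and the min_interval parameter are illustrative choices, not part of any particular library.

```python
import time
import urllib.request


def fetch(url, timeout=10):
    """Download one page and return its raw bytes."""
    # The User-Agent value is a placeholder; identify your own program here.
    request = urllib.request.Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, min_interval=1.0, fetcher=fetch,
              sleep=time.sleep, clock=time.monotonic):
    """Fetch each URL in turn, starting requests at least min_interval seconds apart."""
    pages = {}
    last_request = None
    for url in urls:
        if last_request is not None:
            # Wait out whatever remains of the interval since the last request.
            wait = min_interval - (clock() - last_request)
            if wait > 0:
                sleep(wait)
        last_request = clock()
        pages[url] = fetcher(url)
    return pages
```

The fetcher, sleep, and clock parameters exist only so the pacing logic can be exercised without touching the network. In normal use you would call fetch_all(list_of_urls) and leave the defaults alone.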
The layout of a website may change from time to time, so make sure to revisit the site and rewrite
your code as needed.
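Because layouts change, it helps if the scraper fails loudly when an expected element is missing instead of silently returning nothing. The sketch below looks up an element by its id attribute and raises an error when it is absent. The names LayoutChangedError, TextById, and extract_text, and the id "price" in the usage note, are hypothetical.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LayoutChangedError(Exception):
    """Raised when the page no longer contains the element we expect."""


class TextById(HTMLParser):
    """Collects the text inside the first element with the given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # greater than 0 while inside the target element
        self.found = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_text(html, element_id):
    """Return the text of the element with this id, or raise if it is gone."""
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    if not parser.found:
        raise LayoutChangedError(
            f"no element with id={element_id!r}; has the page layout changed?"
        )
    return "".join(parser.chunks).strip()
```

For example, extract_text('<span id="price">19.99</span>', "price") returns the text of that element, while the same call on a page whose markup no longer has that id raises LayoutChangedError, which is the signal to revisit the site and update the code. The sketch assumes reasonably well-formed HTML with matching closing tags.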