Web Scrape
The document discusses how to inspect a website in order to locate and extract desired data through web scraping. It advises first reading the website's terms and conditions to understand how the data may legally be used. When scraping, it is important not to download data too rapidly, since doing so can break the site or get you blocked. The document also recommends learning basic HTML tags to understand how to find relevant data within the multiple levels of tags on a webpage.
Important notes about web scraping:
1. Read through the website’s Terms and Conditions to understand how you can legally use the data. Most sites prohibit you from using the data for commercial purposes.
2. Make sure you are not downloading data at too rapid a rate because this may break the website. You may potentially be blocked from the site as well.
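One way to keep the download rate low is to pause between consecutive requests. The following is a minimal sketch rather than a definitive implementation: the helper name `fetch_politely`, its parameters, and the one-second default delay are all assumptions made for illustration, and the appropriate delay depends on the site you are scraping.

```python
import time
from typing import Callable, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bytes]:
    """Fetch each URL in turn, pausing between consecutive requests.

    `fetch` is any function that takes a URL and returns its content.
    `delay_seconds` is an assumed default, not a value from any site's policy.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        results.append(fetch(url))
    return results
```

For real downloads, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read()` from the standard library. Passing the fetch and sleep functions in as arguments keeps the helper easy to try out without touching the network.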
Inspecting the Website
The first thing we need to do is figure out where the links to the files we want to download are located within the multiple levels of HTML tags. Simply put, there is a lot of code on a web page, and we want to find the relevant pieces of code that contain our data. If you are not familiar with HTML tags, refer to the W3Schools tutorials. It is important to understand the basics of HTML in order to web scrape successfully.
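To illustrate what locating links inside nested tags can look like in practice, here is a small sketch that uses only Python's standard library. The class name `LinkCollector` and the sample HTML are made up for this example and are not taken from any real site; a real page will be larger and structured differently.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag, however deeply it is nested."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Made-up page fragment, used only for illustration.
SAMPLE_HTML = """
<html><body>
  <div class="container">
    <p>Some text with <a href="about.html">a link</a>.</p>
    <div><span><a href="files/report.txt">Report</a></span></div>
  </div>
</body></html>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
print(collector.links)  # ['about.html', 'files/report.txt']
```

The parser visits every opening tag regardless of depth, so the second link is found even though it sits inside a `div` and a `span`.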
New York MTA Data
We will be downloading turnstile data from this site:
Turnstile data is compiled every week from May 2010 to present, so hundreds of .txt files exist on the site. Below is a snippet of what some of the data looks like. Each date is a link to the .txt file that you can download.
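Since each date links to a .txt file, one plausible step is to keep only the links that end in `.txt` and turn them into full URLs. The sketch below assumes the links have already been collected from the page. The function name `txt_file_urls`, the base URL `https://example.com/turnstile/`, and the file names are placeholders, not the real site's address or file layout.

```python
from urllib.parse import urljoin


def txt_file_urls(base_url, hrefs):
    """Return absolute URLs for the hrefs that point at .txt files."""
    return [
        urljoin(base_url, href)
        for href in hrefs
        if href.lower().endswith(".txt")
    ]


# Placeholder values for illustration only.
BASE_URL = "https://example.com/turnstile/"
hrefs = ["index.html", "data/file_a.txt", "data/file_b.txt", "style.css"]

urls = txt_file_urls(BASE_URL, hrefs)
for url in urls:
    print(url)
```

`urljoin` resolves relative links against the page address, so the same function works whether the page uses relative or absolute links.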