Web Scrape

The document discusses how to inspect a website in order to locate and extract desired data through web scraping. It advises first reading the website's terms and conditions to understand how the data may legally be used. When scraping, it is important not to download data too rapidly, so as not to break the site or get blocked. The document also recommends learning basic HTML tags in order to find the relevant data within the multiple levels of tags on a webpage.

Uploaded by aileen
Copyright © All Rights Reserved

Important notes about web scraping:

1. Read through the website's Terms and Conditions to understand how you can legally use the data. Most sites prohibit you from using the data for commercial purposes.
2. Make sure you are not downloading data at too rapid a rate, because this may overload the website. You may also be blocked from the site.
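Note 2 can be made concrete with a fixed pause between requests. The sketch below uses only Python's standard library. It fetches URLs one at a time and sleeps between them. The function names `fetch_url` and `polite_download` are hypothetical. The one-second default delay is also an assumption, not a limit published by any particular site.

```python
import time
import urllib.request
from typing import Callable, Iterable


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download the raw bytes at `url` (hypothetical helper, stdlib only)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def polite_download(
    urls: Iterable[str],
    delay_seconds: float = 1.0,  # assumed pause; adjust to the site's rules
    fetch: Callable[[str], bytes] = fetch_url,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, bytes]:
    """Fetch each URL in turn, pausing `delay_seconds` between requests.

    `fetch` and `sleep` are injectable so the pacing can be tested offline.
    """
    results: dict[str, bytes] = {}
    for i, url in enumerate(urls):
        if i > 0:
            # Wait before every request except the first, so the server
            # never receives back-to-back requests from this script.
            sleep(delay_seconds)
        results[url] = fetch(url)
    return results
```

Fetching sequentially with a sleep is the simplest way to throttle. A longer delay is gentler on the server but makes a large download take longer.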

Inspecting the Website


The first thing we need to do is figure out where, inside the multiple levels of HTML tags, the links to the files we want to download are located. Simply put, a web page contains a lot of code, and we want to find the relevant pieces of code that contain our data. If you are not familiar with HTML tags, refer to the W3Schools tutorials. Understanding the basics of HTML is important for successful web scraping.
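The idea of links buried in "multiple levels of HTML tags" can be shown with a small parser. It walks the page and records every `<a>` tag's `href`, no matter how deeply that tag is nested. This is a minimal sketch built on Python's standard-library `html.parser`. The names `LinkCollector` and `extract_links` are hypothetical and not part of any library.

```python
from html.parser import HTMLParser
from typing import Optional


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a> tag, however deeply nested."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: Optional[str] = None  # href of the <a> we are inside, if any
        self._text: list[str] = []        # text fragments seen inside that <a>

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            # attrs is a list of (name, value) pairs; anchors without href are skipped.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data: str) -> None:
        # Text inside nested tags such as <a><b>...</b></a> is still captured.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag: str) -> None:
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return every (href, text) pair found in `html`, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `extract_links('<ul><li><a href="a.txt">A</a></li></ul>')` returns `[('a.txt', 'A')]`. In practice, you would first use your browser's Inspect tool to see which tags hold the links you want, and then filter the collected pairs accordingly.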
New York MTA Data
We will be downloading turnstile data from this site:

http://web.mta.info/developers/turnstile.html

Turnstile data has been compiled every week from May 2010 to the present, so hundreds of .txt files exist on the site. Below is a snippet of what some of the data looks like. Each date is a link to the .txt file that you can download.
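The page has one link per date, and each link points to a .txt file. Once the `href` values have been collected, the useful ones can be picked out and turned into full download URLs. This is a minimal sketch that assumes the date links are ordinary relative or absolute hrefs ending in `.txt`. The real link format on the MTA page is not shown in this document, and `turnstile_file_urls` is a hypothetical name.

```python
from typing import Iterable
from urllib.parse import urljoin

# Page the links were scraped from (the URL given above).
PAGE_URL = "http://web.mta.info/developers/turnstile.html"


def turnstile_file_urls(hrefs: Iterable[str], base_url: str = PAGE_URL) -> list[str]:
    """Keep only hrefs that point at .txt files and make them absolute.

    Assumption: data-file links end in ".txt" (case-insensitive), ignoring
    any query string or fragment.
    """
    urls: list[str] = []
    for href in hrefs:
        # Strip "#fragment" and "?query" before checking the extension.
        path = href.split("#", 1)[0].split("?", 1)[0]
        if path.lower().endswith(".txt"):
            # urljoin resolves relative links against the page's own URL.
            absolute = urljoin(base_url, href)
            if absolute not in urls:  # drop duplicates, keep page order
                urls.append(absolute)
    return urls
```

For example, `turnstile_file_urls(["data/a.txt", "about.html"])` returns `['http://web.mta.info/developers/data/a.txt']`. The hrefs in this example are illustrative, not names taken from the real page. The resulting list is what you would then download, pausing between requests.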
