Where To Find Data
DATA PORTALS

A lot of government data is available online.

You can use census data, employment data, the General Social Survey, and tons of local government data such as New York City’s 911 calls or traffic counts.

Sometimes you can download this data directly as a CSV file; at other times, you need to use an API.

You can even submit Freedom of Information Act requests to government agencies to get data that isn’t publicly listed.

Government information is great because it’s often detailed and deals with unusual subjects, such as data on the registered pet names of every animal in Elk Grove, California.

The downside of government information (or potential upside, since it poses a great challenge) is that it often isn’t well formatted; tables stored within PDF files are a common example.

"YOU CAN HAVE DATA WITHOUT INFORMATION, BUT YOU CANNOT HAVE INFORMATION WITHOUT DATA"
-DANIEL KEYS MORAN

APIS

APIs (application programming interfaces) are developer tools that allow you to access data directly from companies.

You know how you can type in a URL and get to a website? APIs are like URLs, but instead of a website, you get data.

Some examples of companies with helpful APIs are The New York Times, Yelp, Spotify, Netflix, and The Weather Channel.

Some APIs even have R or Python packages that specifically make it easier to work with them. rtweet for R, for example, lets you pull Twitter data quickly so that you can find tweets with a specific hashtag, what the trending topics in Sacramento are, or what tweets Naval Ravikant is favoriting.

Keep in mind that there are limitations and terms of service governing how you can use these APIs.

APIs are great for providing extremely robust, organized data from many sources.
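The idea that an API is like a URL that returns data instead of a website can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the endpoint `api.example.com`, its `city` and `units` parameters, and the sample JSON reply are all made up, and a real API documents its own URLs, keys, and limits.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, used only for illustration; a real API documents its own.
BASE_URL = "https://api.example.com/v1/forecast"


def build_request_url(base_url, params):
    """Attach query parameters to a base URL, as you would type them after '?'."""
    return f"{base_url}?{urlencode(params)}"


def fetch_json(url, timeout=10):
    """Request the URL and decode the JSON body into Python objects."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_request_url(BASE_URL, {"city": "Sacramento", "units": "metric"})
print(url)

# What a JSON reply might look like (made-up sample, not real API output).
sample_body = '{"city": "Sacramento", "days": [{"date": "2024-01-01", "high_c": 14}]}'
data = json.loads(sample_body)
print(data["days"][0]["high_c"])
```

With a real service you would pass the built URL to `fetch_json` and get back the same kind of dictionaries and lists, ready to load into a spreadsheet or data frame.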
YOUR OWN DATA

There are many places where you can download data about yourself; social media websites and email services are two big ones.

But if you use apps to keep track of your physical activity, reading list, budget, sleep, or anything else, you can usually download that data as well.

Maybe you could build a chatbot based on your emails with your colleagues or friends. Or you could look at the most common words you use in your tweets and how those words have changed over time.

WEB SCRAPING

Web scraping is a way to extract data from websites that don’t have an API, essentially by automating visiting web pages and copying the data.

You could create a program to search a movie website for a list of 100 actors, load their actor profiles, copy the lists of movies they’re in, and put that data in a spreadsheet. You do have to be careful, though: scraping a website can be against the website’s terms of use, and you could be banned. You can check the robots.txt file of a website to find out what is allowed.

You also want to be nice to websites: if you hit a site too many times, you can bring it down.
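The robots.txt check and the advice to be nice to websites can be sketched with Python’s standard library. This is a minimal sketch, not a full scraper: the robots.txt content, the `example.com` URLs, and the `my-data-project` user agent are made up for illustration, and the helper names `load_rules` and `polite_urls` are hypothetical. Passing robots.txt does not replace reading a site’s terms of use.

```python
import time
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for a hypothetical site. A real file lives at
# the site's /robots.txt path and can be loaded with set_url() and read().
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "my-data-project"  # hypothetical name for your scraper


def load_rules(robots_text):
    """Parse robots.txt text into a rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def polite_urls(parser, urls, user_agent=USER_AGENT, default_delay=1.0):
    """Yield only the URLs robots.txt allows, pausing between them."""
    delay = parser.crawl_delay(user_agent) or default_delay
    first = True
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            continue  # skip pages the site asks scrapers to avoid
        if not first:
            time.sleep(delay)  # wait so the site is not hit too often
        first = False
        yield url


rules = load_rules(SAMPLE_ROBOTS_TXT)
print(rules.can_fetch(USER_AGENT, "https://example.com/actors/1"))
print(rules.can_fetch(USER_AGENT, "https://example.com/private/notes"))
```

In a scraper for the actor example, you would loop over `polite_urls(rules, profile_urls)` and download each page inside the loop, so disallowed pages are never requested and requests are spaced out.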
Visit it here: https://archive.ics.uci.edu/ml/datasets.php
STACK EXCHANGE
Visit opendata.stackexchange.com to check it out.
OPEN DATA
Amazon makes large data sets available on its Amazon Web Services platform: aws.amazon.com/opendata
You can download the data and work with it on your own computer, or analyze the data in the cloud using EC2 and Hadoop via EMR.
ACADEMIC TORRENTS