
Introduction To Data Science: Bill Howe, PHD

This document provides an introduction to data science. It defines data science as an emerging field that involves collecting, preparing, analyzing, visualizing, managing, and preserving large datasets. The document discusses how data scientists obtain data, clean it, explore it, model it, and interpret it, blending skills in hacking, statistics, and machine learning. It also summarizes that data scientists create "data products", such as applications, interactive visualizations, and online databases, that allow others to utilize and analyze data.

Introduction to Data Science

Bill Howe, PhD


Director of Research,
Scalable Data Analytics
University of Washington
eScience Institute
What is Data Science?
Fortune
Hot New Gig in Tech
Hal Varian, Google's Chief Economist, NYT, 2009:
The next sexy job
The ability to take data, to be able to understand it, to
process it, to extract value from it, to visualize it, to
communicate it, that's going to be a hugely important skill.
Mike Driscoll, CEO of Metamarkets:
Data science, as it's practiced, is a blend of Red-Bull-fueled
hacking and espresso-inspired statistics.
Data science is the civil engineering of data. Its acolytes
possess a practical knowledge of tools & materials, coupled
with a theoretical understanding of what's possible.
4/28/13 Bill Howe, UW 2
Drew Conway's Data Science Venn Diagram



What do data scientists do?
They need to find nuggets of truth in data and then explain it to the
business leaders
-- Richard Snee, EMC

Data scientists tend to be hard scientists, particularly physicists, rather
than computer science majors. Physicists have a strong mathematical
background, computing skills, and come from a discipline in which survival
depends on getting the most from the data. They have to think about the
big picture, the big problem.

-- DJ Patil, Chief Scientist at LinkedIn



Mike Driscoll's three sexy skills of data geeks

Statistics
traditional analysis
Data Munging
parsing, scraping, and formatting data
Visualization
graphs, tools, etc.



Data Science refers to an emerging area of work
concerned with the collection, preparation, analysis,
visualization, management and preservation of large
collections of information.

An Introduction to Data Science

Jeffrey Stanton
Syracuse University School of Information Studies



A data scientist is someone who can obtain, scrub, explore, model
and interpret data, blending hacking, statistics and machine
learning. Data scientists not only are adept at working with data, but
appreciate data itself as a first-class product.
-- Hilary Mason, chief scientist at bit.ly

data wrangling
data jujitsu
data munging



Three types of tasks:

1) Preparing to run a model


Gathering, cleaning, integrating, restructuring,
transforming, loading, filtering, deleting, combining,
merging, verifying, extracting, shaping, massaging

2) Running the model

3) Communicating the results
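
As a rough illustration of the three task types, here is a minimal Python sketch using only the standard library. The data, the column names (`city`, `temp_c`), and the function names `prepare`, `run_model`, and `communicate` are all hypothetical and chosen for this example; the "model" is just a per-city mean, standing in for whatever analysis would really be run.

```python
import csv
import io
import statistics

# Hypothetical raw input: made-up readings with stray whitespace,
# a missing value, and a non-numeric entry.
RAW = """city,temp_c
Seattle, 11.5
Seattle,12.1
Spokane,
Spokane,9.0
Spokane,n/a
Seattle,13.0
"""


def prepare(raw_text):
    """Task 1: parse, clean, filter, and restructure the raw records."""
    by_city = {}
    for row in csv.DictReader(io.StringIO(raw_text)):
        value = (row["temp_c"] or "").strip()
        try:
            temp = float(value)
        except ValueError:
            continue  # drop rows whose reading cannot be parsed
        by_city.setdefault(row["city"].strip(), []).append(temp)
    return by_city


def run_model(by_city):
    """Task 2: a deliberately trivial 'model', the mean reading per city."""
    return {city: statistics.mean(temps) for city, temps in by_city.items()}


def communicate(means):
    """Task 3: turn the result into lines a reader can use."""
    return [f"{city}: mean {mean:.1f} C" for city, mean in sorted(means.items())]


report = communicate(run_model(prepare(RAW)))
print("\n".join(report))
```

Even in this toy case, most of the code sits in the preparation step, which matches the long list of verbs under task 1.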



Data Science is about Data Products
Data-driven apps (Mike Loukides)
  Spellchecker
  Machine Translator
Interactive visualizations
  Google flu application
  Global Burden of Disease
Online Databases
  Enterprise data warehouse
  Sloan Digital Sky Survey

Data science is about building data products, not just answering questions.
Data products empower others to use the data.
  May help communicate your results (e.g., Nate Silver's maps)
  May empower others to do their own analysis (e.g., Global Burden of Disease)
