Data Science Presentation
WHAT IS DATA SCIENCE?
• “Data science, also known as data-driven science, is
an interdisciplinary field of scientific methods,
processes, algorithms and systems to extract
knowledge or insights from data in various forms,
either structured or unstructured, similar to
data mining.”
• “Data science intends to analyze and understand
actual phenomena with ‘data’. In other words, the
aim of data science is to reveal the features or the
hidden structure of complicated natural, human,
and social phenomena with data from a different
point of view from the established or traditional theory
and method.”
WHAT IS DATA SCIENCE?
• Fourth paradigm
• “… change of all sciences moving from
observational, to theoretical, to computational and
now to the 4th Paradigm – Data-Intensive Scientific
Discovery”
WHAT IS
IMPORTANT?
DATA SCIENCE AS A UNIFIER
[Diagram linking Data Science with:]
• Humanities
• Data Management
• Machine/Statistical Learning
• Law
• Application Domain Expertise
• Social Science
• Visualization
• Mathematical Optimization
DATA SCIENCE AND BIG DATA
• They are not the “same thing”
• Big data = crude oil
• Big data is about extracting “crude oil”, transporting it in
“mega tankers”, siphoning it through “pipelines”, and storing
it in “massive silos”
• Data science is about refining the “crude oil”
DATA SCIENCE AND ARTIFICIAL
INTELLIGENCE
[Diagram relating Data Science, ML/DM/Analytics, and Artificial Intelligence]
DATA SCIENCE APPLICATION
EXAMPLES
• Recommender systems
• The ability to offer unique personalized service
• Increase sales, click-through rates, conversions, …
• Netflix recommender system valued at $1B per year
• Amazon recommender system drives a 20-35% lift in sales annually
• Collaborative filtering at scale
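To make "collaborative filtering" concrete, here is a minimal sketch of one common variant, user-based filtering with cosine similarity. The ratings, user names, and item labels are hypothetical and invented for illustration; this is not how Netflix or Amazon implement their systems, and production systems at scale rely on far more elaborate methods.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: rating}); invented for illustration only.
RATINGS = {
    "ana":   {"A": 5, "B": 4, "C": 1},
    "ben":   {"A": 4, "B": 5, "D": 2},
    "chloe": {"A": 1, "C": 5, "D": 4},
    "dev":   {"A": 5, "B": 5},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, ratings, k=2):
    """Rank items the user has not rated by similarity-weighted average rating."""
    target = ratings[user]
    totals, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(target, other_ratings)
        if sim <= 0:
            continue  # ignore users with no overlap in rated items
        for item, rating in other_ratings.items():
            if item in target:
                continue  # only score items the target user has not rated
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    scores = {item: totals[item] / weights[item] for item in totals}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    print(recommend("dev", RATINGS))
```

The idea is that users whose past ratings resemble yours are treated as better predictors of what you will like, so their ratings get more weight in the predicted score.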
DATA SCIENCE APPLICATION
EXAMPLES
• Predicting why patients are being readmitted
• Reduce costs
• Improve population health
• Find the “why” behind specific populations being readmitted
• Data lakes of multiple data sources
• Investigate ties between readmission and socioeconomic data points, patient history, …
DATA SCIENCE APPLICATION
EXAMPLES
• “Smart cities”
• Not well-defined
• Generally refers to using data and ICT to:
• Better plan communities
• Better manage assets
• Reduce costs
• Deploy open data to better engage with community
DATA SCIENCE APPLICATION
EXAMPLES
• Moneyball
• How to build a baseball team on a very low budget by relying on data
• Sabermetrics: the statistical analysis of baseball data to objectively evaluate performance
• The Oakland Athletics' 2002 record of 103-59 was joint best in MLB
• Team salary budget: $40 million
• Other team with the joint-best record: Yankees
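As a small illustration of what a sabermetric statistic looks like, the sketch below computes on-base percentage (OBP), a measure often associated with the Moneyball approach, and compares it with batting average. The slide does not name a specific statistic, so choosing OBP here is an assumption, and the two player lines are hypothetical numbers invented for the example.

```python
def batting_average(hits, at_bats):
    """Traditional statistic: hits per at-bat."""
    return hits / at_bats if at_bats else 0.0


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denominator = at_bats + walks + hit_by_pitch + sacrifice_flies
    if denominator == 0:
        return 0.0
    return (hits + walks + hit_by_pitch) / denominator


# Hypothetical season lines, invented for illustration (not real player data).
PLAYERS = {
    "Player X": dict(hits=150, walks=90, hit_by_pitch=5, at_bats=520, sacrifice_flies=5),
    "Player Y": dict(hits=170, walks=30, hit_by_pitch=2, at_bats=580, sacrifice_flies=4),
}

# Rank the players by each statistic, best first.
by_obp = sorted(PLAYERS, key=lambda n: on_base_percentage(**PLAYERS[n]), reverse=True)
by_avg = sorted(
    PLAYERS,
    key=lambda n: batting_average(PLAYERS[n]["hits"], PLAYERS[n]["at_bats"]),
    reverse=True,
)

if __name__ == "__main__":
    print("Ranked by OBP:", by_obp)
    print("Ranked by batting average:", by_avg)
```

In this made-up data the two rankings disagree: Player Y has the higher batting average, but Player X reaches base more often because of walks. Gaps like this between traditional and data-driven measures are what a low-budget team can try to exploit.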
HOLISTIC APPROACH TO DATA SCIENCE
[Diagram: Core; Data Visualization & Dissemination]
CORE RESEARCH ISSUES & INTERACTIONS
[Diagram components]
Making Data Trustable & Usable:
• Data cleaning
• Sampling
• Data provenance
(Group label not shown in this view):
• Data lakes
• Batch & online access
• Platforms
Data Visualization & Dissemination