Data Science Presentation

Data science is an interdisciplinary field focused on extracting knowledge from structured and unstructured data through scientific methods and algorithms. It aims to analyze real-world phenomena and is distinct from big data, which serves as the raw material for data science. Applications of data science include fraud detection, recommender systems, and improving healthcare outcomes.

Uploaded by

anonymous775575
Copyright
© © All Rights Reserved

WHAT IS DATA SCIENCE?
• “Data science, also known as data-driven science, is an interdisciplinary field of scientific methods, processes, algorithms and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining.”
• “Data science intends to analyze and understand actual phenomena with ‘data’. In other words, the aim of data science is to reveal the features or the hidden structure of complicated natural, human, and social phenomena with data from a different point of view from the established or traditional theory and method.”

WHAT IS DATA SCIENCE?
• Fourth paradigm
  • “… change of all sciences moving from observational, to theoretical, to computational and now to the 4th Paradigm – Data-Intensive Scientific Discovery”

WHAT IS IMPORTANT?
Need to solve a real problem using data… No applications, no data science.

DATA SCIENCE AS A UNIFIER
[Figure: Data Science at the center, drawing together Data Management, Machine/Statistical Learning, Visualization, Mathematical Optimization, Application Domain Expertise, Humanities, Law, and Social Science]

DATA SCIENCE AND BIG DATA
• They are not the “same thing”
• Big data = crude oil
• Big data is about extracting “crude oil”, transporting it in “mega tankers”, siphoning it through “pipelines”, and storing it in “massive silos”
• Data science is about refining the “crude oil”

DATA SCIENCE AND ARTIFICIAL INTELLIGENCE
[Figure: diagram relating Data Science, ML/DM/Analytics, and Artificial Intelligence]
“Data science produces insights. Machine learning produces …

DATA SCIENCE APPLICATION EXAMPLES
• Fraud detection
  • Investigate fraud patterns in past data
  • Early detection is important
    • Before damage propagates
    • Harder than late detection
  • Precision is important
    • False positive and false negative are both bad
  • Real-time analytics
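
The point that false positives and false negatives are both costly can be made concrete with two standard metrics: precision (how many flagged transactions were really fraud) and recall (how many real frauds were flagged). The sketch below is only illustrative. The class name `ConfusionCounts` and all of the counts are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Outcome counts for a fraud classifier (all values hypothetical)."""
    true_positive: int   # fraud, flagged as fraud
    false_positive: int  # legitimate, flagged as fraud
    false_negative: int  # fraud, missed by the detector
    true_negative: int   # legitimate, not flagged


def precision(c: ConfusionCounts) -> float:
    """Share of flagged transactions that were actually fraud."""
    flagged = c.true_positive + c.false_positive
    return c.true_positive / flagged if flagged else 0.0


def recall(c: ConfusionCounts) -> float:
    """Share of actual fraud cases that the detector flagged."""
    actual_fraud = c.true_positive + c.false_negative
    return c.true_positive / actual_fraud if actual_fraud else 0.0


# Made-up numbers for illustration only.
counts = ConfusionCounts(true_positive=80, false_positive=20,
                         false_negative=40, true_negative=9860)

print(f"precision = {precision(counts):.2f}")
print(f"recall    = {recall(counts):.2f}")
```

More false positives pull precision down, and more false negatives pull recall down, so the two numbers are usually tracked together.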

DATA SCIENCE APPLICATION EXAMPLES
• Recommender systems
  • The ability to offer unique personalized service
  • Increase sales, click-through rates, conversions, …
  • Netflix recommender system valued at $1B per year
  • Amazon recommender system drives a 20-35% lift in sales annually
  • Collaborative filtering at scale
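
As a rough illustration of collaborative filtering, here is a minimal user-based sketch: a missing rating is predicted as a similarity-weighted average of other users' ratings. This is a toy in-memory version, not how Netflix or Amazon implement their systems, and it does not address scale. The users, items, ratings, and function names are all hypothetical.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 3, "C": 5, "D": 5},
    "carol": {"A": 1, "B": 5, "D": 2},
}


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict_rating(user: str, item: str, data: dict):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in data.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(data[user], other_ratings)
        weighted_sum += sim * other_ratings[item]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else None


predicted = predict_rating("alice", "D", ratings)
print(f"predicted rating of item D for alice: {predicted:.2f}")
```

In this toy data alice's ratings resemble bob's more than carol's, so the prediction for item D lands closer to bob's rating than to carol's.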
DATA SCIENCE APPLICATION EXAMPLES
• Predicting why patients are being readmitted
  • Reduce costs
  • Improve population health
  • Find the “why” behind specific populations being readmitted
  • Data lakes of multiple data sources
  • Investigate ties between readmission and socioeconomic data points, patient history, …

DATA SCIENCE APPLICATION EXAMPLES
• “Smart cities”
  • Not well-defined
  • Generally refers to using data and ICT to
    • Better plan communities
    • Better manage assets
    • Reduce costs
    • Deploy open data to better engage with community

DATA SCIENCE APPLICATION EXAMPLES
• Moneyball
  • How to build a baseball team on a very low budget by relying on data
  • Sabermetrics: the statistical analysis of baseball data to objectively evaluate performance
  • 2002 record of 103-59 was joint best in MLB
  • Team salary budget: $40 million
  • Other team: Yankees
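
The two figures on this slide can be combined into simple efficiency numbers. The sketch below uses only the 103-59 record and the $40 million budget quoted above. The variable names are illustrative, and payroll per win is just one possible way to express value for money, not a metric the slides define.

```python
# Figures taken from the slide.
wins = 103
losses = 59
salary_budget_usd = 40_000_000

games_played = wins + losses

# Winning percentage: share of games won.
winning_percentage = wins / games_played

# Payroll spent per win: a rough value-for-money measure (an assumption,
# chosen here for illustration).
payroll_per_win_usd = salary_budget_usd / wins

print(f"games played:       {games_played}")
print(f"winning percentage: {winning_percentage:.3f}")
print(f"payroll per win:    ${payroll_per_win_usd:,.0f}")
```

The same calculation for another team needs that team's record and payroll, which this slide does not give.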
HOLISTIC APPROACH TO DATA SCIENCE
[Figure: layered framework. Top band: Data Security & Privacy. Core pipeline: Data Acquisition, Making Data Trustable & Usable, Management of Big Data, Modeling & Analysis, Dissemination & Visualization, Data Preservation. Bottom band: Ethics, Policy & Social Impact. Base: four Application boxes]

CORE RESEARCH ISSUES & INTERACTIONS
• Making Data Trustable & Usable
  • Data cleaning
  • Sampling
  • Data provenance
• Big Data Management
  • Data lakes
  • Batch & online access
  • Platforms
• Modelling & Analysis
  • Models & methods for data lakes
  • Unsupervised classification & AI
• Data Visualization & Dissemination
  • Visualization for wider audience
  • Visualization for data exploration
  • Open data technologies
• Interactions
  • DM support for provenance
  • Data preparation for big data analysis
  • Cleaning for data management
  • DM for ML
  • ML for DM
  • Visual analytics…
