DATA01 - Data Science Primer

Uploaded by

Robert Nelson
Copyright
© All Rights Reserved

DATA100

Principles of Data Science


Data Science Primer
Reminders

Assignment #1 is due today.


What is data science?
Data Science

“Multi-disciplinary field which aims to derive insights and knowledge from both structured and unstructured data.”
Vasant Dhar, 2013
What is data?
According to Merriam-Webster:

Data | Definition of Data


Why the sudden interest?
Data

Big Data
Because of the internet and widespread connectivity, we now generate far more data than before.
V’s of Big Data
Volume: refers to the huge amount of data that is generated daily
Variety: refers to the diversity of data types and data sources
Velocity: refers to the speed at which data is generated, processed and analyzed

https://blog.unbelievable-machine.com/en/what-is-big-data-definition-five-vs
V’s of Big Data
Veracity: refers to the authenticity and credibility of the data (and in turn, its quality)
Value: refers to the added value for various stakeholders (company, government, everyone!)

https://blog.unbelievable-machine.com/en/what-is-big-data-definition-five-vs
Accelerating Factors of DS

Data storage is cheaper
Accelerating Factors of DS

Applications and websites started collecting more data that is:
Expressive
Granular

Estimated by Statista
Accelerating Factors of DS

Computing power became easier to access


Setting up a supercomputer that is “pay-per-use”
The Data Science Venn Diagram (Conway, 2013)
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Leek, J. and Peng, R. (2015, March 20). What is the Question? Science Magazine, 347(6228), pp. 1314-1315.
Types of Analysis
Descriptive: Aggregating totals. What is happening?
Exploratory: Spread and central tendency. What is the average? Are there outliers? Is there a trend?
Inferential: Difference between two samples (Male vs Female, NCR vs Region 4). Correlation is part of inference.
Causal: Establish the relationship between two variables (what makes our variables tick).
Predictive: Looking forward to see what will happen in the future.
Mechanistic: Effect estimation and simulation of variables (policy making).

Related grouping also shown: Descriptive, Diagnostic, Predictive, Prescriptive.
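As a minimal sketch of the descriptive and exploratory types, the snippet below computes a total and a few summaries (mean, median, spread, and a simple outlier check) using only Python's standard library. The `daily_cases` values are made up for illustration, and the 2-standard-deviation outlier rule is an assumed rule of thumb, not something the slides prescribe.

```python
import statistics

# Hypothetical daily case counts (illustrative values only).
daily_cases = [120, 135, 128, 140, 131, 450, 138, 142]

# Descriptive: aggregate totals ("What is happening?").
total = sum(daily_cases)

# Exploratory: central tendency and spread.
mean = statistics.mean(daily_cases)
median = statistics.median(daily_cases)
spread = statistics.stdev(daily_cases)  # sample standard deviation

# Exploratory: flag values more than 2 standard deviations from the mean
# (an assumed rule of thumb for spotting outliers).
outliers = [x for x in daily_cases if abs(x - mean) > 2 * spread]

print("total:", total)
print("mean:", mean, "median:", median, "stdev:", round(spread, 1))
print("possible outliers:", outliers)
```

Note how the single large value pulls the mean well above the median, which is one reason exploratory work usually looks at both.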
Types of Analysis
Type: Question
Descriptive: How many COVID cases are there in the Philippines?
Exploratory: Is the trend of COVID cases going upwards or downwards?
Inferential: Is the daily change of cases in NCR the same as the change of cases in Region 4?
Predictive: How many COVID cases will happen in the next month?
Causal: If mask wearing had not been implemented in the Philippines, how would that have affected the number of cases in the country?
Mechanistic: How does mask wearing reduce the transmission of COVID?
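To make the inferential and predictive questions more concrete, here is a hedged sketch using only the standard library. The two regional series are invented numbers, `welch_t` and `fit_line` are hypothetical helper names, and a straight-line trend is an assumed (and very simple) forecasting model. A real analysis would also convert the t statistic into a p-value before drawing conclusions.

```python
import math
import statistics

# Hypothetical daily changes in case counts for two regions (made-up values).
ncr = [12, 15, 9, 14, 11, 13]
region4 = [7, 9, 6, 10, 8, 8]


def welch_t(a, b):
    """Welch's t statistic for the difference between two sample means."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of a straight line."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Inferential: is the average daily change the same in both regions?
t_stat = welch_t(ncr, region4)

# Predictive: fit a linear trend to one series and extrapolate one day ahead.
days = list(range(len(ncr)))
slope, intercept = fit_line(days, ncr)
next_day_estimate = slope * len(ncr) + intercept

print("t statistic:", round(t_stat, 2))
print("next-day estimate:", round(next_day_estimate, 1))
```

A large absolute t statistic suggests the two regions differ on average; the causal and mechanistic questions need more than this, such as a study design or a model of how transmission works.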


Types of Analysis
Example (Mechanistic): Simulating the effects of a lockdown on the number of patients that will get infected.
Example (Exploratory): The “trend” of commute time is going up.
Example (Inferential): What is the correlation of height and weight? People who are taller are generally observed to be heavier on average.
Example (Predictive): What is the GDP next year?
Example (Inferential): Are males paid more on average than females? Is the gender gap real? What is the difference between the two groups?
Example (Descriptive): What is the total number of COVID patients?
Pipeline and Roles
who does what?
Analytics

Data: Creation of data from operations and/or transactions
Information: Consolidation of relevant data to answer a question or a problem
Insight: Finding patterns to answer why it happened and what could potentially happen next
Imperative: Development of various solutions to suggest what should be done next
Job Roles

Data: Data Steward
Information: Data Engineer
Insight: Data Scientist
Imperative: Functional Analyst

Analytics Manager
Data Steward: data governance, data privacy, data security, data quality
Data Engineer: data infrastructure, data ETL (extract, transform, load), data management, data warehousing
Data Scientist: data analytics, data mining, machine learning, statistical modeling
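The ETL (extract, transform, load) step listed under the Data Engineer can be sketched in a few lines. This is only an illustration under stated assumptions: the CSV text, the `cases` table, and its column names are invented, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Extract: read raw rows from a CSV source (an in-memory string here).
raw_csv = "region,cases\nNCR, 120\nRegion 4,95\nNCR,135\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: strip stray whitespace and convert counts to integers.
cleaned = [(row["region"].strip(), int(row["cases"])) for row in rows]

# Load: write the cleaned rows into a database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (region TEXT, cases INTEGER)")
conn.executemany("INSERT INTO cases VALUES (?, ?)", cleaned)
conn.commit()

# Once loaded, the data can be queried to answer a question.
totals = dict(
    conn.execute("SELECT region, SUM(cases) FROM cases GROUP BY region")
)
print(totals)
```

In practice the extract step would read from operational systems and the load step would target a shared warehouse, but the three stages keep the same shape.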
Functional Analyst: domain expert, process/policy expert, data visualization engineer
Analytics Manager: project manager
Data Science Roles Scatter Plot (Vicki Boykis, April 2018)
https://twitter.com/vboykis/status/983391619062919170
Skills and Competencies
what do you need to be good at data science?
The APEC Analytics Competencies

Domain Knowledge & Application
Data Engineering Principles
Data Management & Governance
Statistical Techniques
Data Analytics Methods & Algorithms
Operational Analytics
Data Visualization & Presentation
Computing
Research Methods
21st Century Skills


How do you get started with a data science project?
Data Science Project Cycle

Problem: Start with a question that you want to answer or a problem you want to solve
Data: Maximize the potential of available data from various sources or collect your own
Explore: Understand the information embedded within the data
Learn: Use statistical models, machine learning and other data mining techniques to find patterns faster
Present: Effectively visualize and communicate the insights and knowledge derived from the data
Application in the Real World
Data science can be applied to any field of study!
What to watch?
Anomaly Detection
Forecasting
Computer Vision + IoTs + Robotics
Questions?
