
Team1 - Data Science Methodology

The document outlines the data science methodology, emphasizing the structured approach to problem-solving and data-driven decision-making. It details stages such as business understanding, data collection, preparation, modeling, evaluation, and feedback, highlighting the importance of each phase in achieving effective outcomes. The methodology aims to avoid common pitfalls by ensuring thorough analysis before jumping to solutions.

Uploaded by

ngadtqqe180219
DATA SCIENCE METHODOLOGY
WEEK 1
DATA SCIENCE METHODOLOGY OVERVIEW
WHAT IS A METHODOLOGY?

A methodology is:
· A system of methods
· A guideline for decision-making during the scientific process
APPLYING DATA SCIENCE METHODOLOGY

A structured approach for solving problems and making data-driven decisions. It includes:
· Performing data collection
· Creating measurement strategies
· Comparing data analysis methods
ADDRESSING DATA SCIENCE CHALLENGES

· Apply practical guidance
· Avoid the mistakes that can happen by jumping to solutions before the analysis
DATA METHODOLOGY STAGES
BUSINESS UNDERSTANDING

Understanding the objective is very important in choosing the data science methodology.

Once the goal is clarified, the next piece of the puzzle is to figure out the objectives that are in support of the goal.

Depending on the problem, different stakeholders will need to be engaged in the discussion to help determine requirements and clarify questions.
ANALYTIC APPROACH

Selecting the right analytic approach depends on the question being asked.

The approach involves seeking clarification from the person who is asking the question, so as to be able to pick the most appropriate path or approach.

This means identifying what type of patterns will be needed to address the question most effectively.
DATA REQUIREMENTS

Want to prepare a dish?

Step 1: Know the recipe and its ingredients.

Step 2: Collect the ingredients and learn how to work with them.

When cooking with data, the data scientist needs to identify which ingredients are required, how to source or collect them, how to understand or work with them, and how to prepare the data to meet the desired outcome.
Prior to undertaking the data collection and data preparation stages of the methodology, it's vital to define the data requirements for decision-tree classification. This includes identifying the necessary data content, formats and sources for initial data collection.
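One way to make data requirements concrete is to write them down as a small specification before any collection starts. The following is a minimal sketch only: the `DataRequirement` class, its field names, and the example entries are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRequirement:
    """One required piece of data: its content, its format, and its source."""
    content: str  # what the data describes
    fmt: str      # expected format of the data
    source: str   # where the data will be collected from


# Hypothetical requirements for a classification problem (illustrative values).
requirements = [
    DataRequirement("customer demographics", "csv", "crm system"),
    DataRequirement("purchase history", "csv", "sales database"),
    DataRequirement("churn outcome", "csv", "crm system"),
]


def sources_needed(reqs):
    """Return the distinct sources to contact, in first-seen order."""
    seen = []
    for r in reqs:
        if r.source not in seen:
            seen.append(r.source)
    return seen


print(sources_needed(requirements))
```

Listing content, format, and source side by side makes it easier to spot a missing ingredient before collection begins.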
DATA COLLECTION

Data collection is the process of gathering information from many different sources and storing it in an established system.

The purpose of collecting data is to support analysis, research, management, and decision-making in fields such as science, society, and business.
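As a rough illustration of combining several sources into one established system, the sketch below merges a CSV export and a list of records into an in-memory SQLite table. The source contents, the `collect` helper, and the table layout are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Two hypothetical sources: a CSV export and records returned by some service.
csv_source = io.StringIO("customer_id,region\n1,north\n2,south\n")
api_source = [{"customer_id": 3, "region": "east"}]


def collect(csv_file, api_records):
    """Combine records from both sources into one list of (id, region) rows."""
    rows = [(int(r["customer_id"]), r["region"]) for r in csv.DictReader(csv_file)]
    rows += [(int(r["customer_id"]), r["region"]) for r in api_records]
    return rows


# Store everything in a single system (here, an in-memory SQLite database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", collect(csv_source, api_source))
conn.commit()

stored = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(stored)
```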
WEEK 2
DATA UNDERSTANDING

Data understanding encompasses all activities related to constructing the data set.

The data understanding section of the data science methodology answers the question: Is the data that you collected representative of the problem to be solved?
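A common first step toward answering that question is to compute descriptive statistics on the collected data. The sketch below is one possible way to do it with the Python standard library; the `ages` and `outcomes` values are made up for illustration.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical collected records; None marks a missing value.
ages = [34, 51, None, 47, 29, 62, None, 45]
outcomes = ["yes", "no", "no", "no", "yes", "no", "no", "no"]


def summarize(values):
    """Basic descriptive statistics for one numeric column."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": mean(present),
        "median": median(present),
        "min": min(present),
        "max": max(present),
    }


age_summary = summarize(ages)
# Share of each outcome, to check whether one class dominates the data.
outcome_share = {k: n / len(outcomes) for k, n in Counter(outcomes).items()}

print(age_summary)
print(outcome_share)
```

A large share of missing values or a heavily skewed outcome distribution can signal that the data does not yet represent the problem well and that more collection is needed.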
DATA PREPARATION

In a sense, data preparation is similar to washing freshly picked vegetables, insofar as unwanted elements, such as dirt or imperfections, are removed.

Together with data collection and data understanding, data preparation is the most time-consuming phase of a data science project, typically taking 70 percent and sometimes up to 90 percent of the overall project time.

Similarly, transforming data in the data preparation phase is the process of getting the data into a state where it is easier to work with.

To work effectively with the data, it must be prepared in a way that addresses missing or invalid values and removes duplicates, so that everything is properly formatted.
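The preparation steps named above (handling missing or invalid values, removing duplicates, fixing formatting) can be sketched in plain Python. The records, the valid age range of 0 to 120, and the choice to fill gaps with the median are assumptions made for this example, not rules from the slides.

```python
from statistics import median

# Hypothetical raw records with a duplicate, a missing value, an invalid value,
# and inconsistent formatting.
raw = [
    {"id": "1", "region": " North ", "age": "34"},
    {"id": "2", "region": "south", "age": ""},
    {"id": "2", "region": "south", "age": ""},
    {"id": "3", "region": "EAST", "age": "-5"},
    {"id": "4", "region": "north", "age": "41"},
    {"id": "5", "region": "West", "age": "52"},
]


def parse_age(text):
    """Return the age as an int, or None if it is missing or invalid."""
    try:
        age = int(text)
    except ValueError:
        return None
    return age if 0 <= age <= 120 else None


def prepare(records):
    """Remove duplicates, normalize formatting, and fill missing ages."""
    seen, cleaned = set(), []
    for r in records:
        if r["id"] in seen:  # drop repeated ids, keeping the first occurrence
            continue
        seen.add(r["id"])
        cleaned.append({
            "id": int(r["id"]),
            "region": r["region"].strip().lower(),
            "age": parse_age(r["age"]),
        })
    # Imputation choice (an assumption): fill gaps with the median known age.
    fill = median(c["age"] for c in cleaned if c["age"] is not None)
    for c in cleaned:
        if c["age"] is None:
            c["age"] = fill
    return cleaned


prepared = prepare(raw)
print(prepared)
```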
DATA PREPARATION - CASE STUDY

Data Collection
Data Cleaning
Data Transformation
Data Integration
Data Reduction
Data Validation
FROM MODELING TO EVALUATION
MODELING

What is the purpose of data modeling?
· Developing models that are either descriptive or predictive.

What are some characteristics of this process?
· Based on statistical or machine learning approaches.
· Uses a training set with known outcomes to calibrate the model.
· Involves experimenting with different algorithms and variables.
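To illustrate calibrating a model on a training set with known outcomes and trying more than one algorithm, the sketch below fits two very simple candidates: a majority-class baseline and a decision stump (a one-level decision tree). The training data, the single feature, and both helper functions are hypothetical; the stump only considers rules of the form "predict 1 when x <= t".

```python
# Hypothetical training set: (feature value, known outcome) pairs.
train = [(1, 1), (2, 1), (3, 1), (4, 0), (6, 0), (7, 0), (8, 0), (5, 1)]


def fit_majority(data):
    """Baseline: always predict the most common outcome in the training set."""
    ones = sum(y for _, y in data)
    label = 1 if ones * 2 >= len(data) else 0
    return lambda x: label


def fit_stump(data):
    """Decision stump: predict 1 when x <= t, with t minimizing training errors."""
    best_t, best_err = None, len(data) + 1
    for t in sorted({x for x, _ in data}):
        err = sum((1 if x <= t else 0) != y for x, y in data)
        if err < best_err:
            best_t, best_err = t, err
    return lambda x: 1 if x <= best_t else 0


def accuracy(model, data):
    """Fraction of examples where the model's prediction matches the outcome."""
    return sum(model(x) == y for x, y in data) / len(data)


# Experiment with different algorithms on the same training set.
candidates = {"majority": fit_majority(train), "stump": fit_stump(train)}
train_scores = {name: accuracy(m, train) for name, m in candidates.items()}
print(train_scores)
```

Scores measured on the training set only show how well each candidate was calibrated; they do not replace evaluation on separate data.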
EVALUATION

Model evaluation goes hand in hand with model building; as such, the modeling and evaluation stages are done iteratively.

Model evaluation is performed during model development and before the model is deployed.
Model evaluation can have two phases. The first is the diagnostic measures phase, which is used to ensure the model is working as intended.

If the model is a predictive model, a decision tree can be used to evaluate whether the answer the model outputs is aligned with the initial design. It can also show where there are areas that require adjustments.

If the model is a descriptive model, one in which relationships are being assessed, then a testing set with known outcomes can be applied, and the model can be refined as needed.
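Applying a testing set with known outcomes usually means comparing the model's predictions against those outcomes. The sketch below computes a confusion matrix and a few common diagnostic measures; the `actual` and `predicted` lists are invented for the example, and the choice of accuracy, precision, and recall is an assumption rather than something the slides prescribe.

```python
# Hypothetical known outcomes from a testing set and the model's predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]


def confusion(actual, predicted):
    """Count true/false positives and negatives for binary labels."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(a == 1 and p == 1 for a, p in pairs),
        "tn": sum(a == 0 and p == 0 for a, p in pairs),
        "fp": sum(a == 0 and p == 1 for a, p in pairs),
        "fn": sum(a == 1 and p == 0 for a, p in pairs),
    }


def diagnostics(c):
    """Accuracy, precision, and recall from a confusion matrix."""
    total = c["tp"] + c["tn"] + c["fp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        "precision": c["tp"] / (c["tp"] + c["fp"]),
        "recall": c["tp"] / (c["tp"] + c["fn"]),
    }


matrix = confusion(actual, predicted)
measures = diagnostics(matrix)
print(matrix)
print(measures)
```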
The second phase of evaluation that may be used is statistical significance testing. This type of evaluation can be applied to the model to ensure that the data is being properly handled and interpreted within the model. This is designed to avoid unnecessary second guessing when the answer is revealed.
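The slides do not name a specific test. As one illustration, an exact one-sided binomial test can check whether a classifier's number of correct predictions is better than what guessing would produce. The counts used below (16 correct out of 20), the 0.5 chance level, and the 0.05 cutoff are all assumptions.

```python
from math import comb


def binomial_p_value(correct, n, chance=0.5):
    """One-sided exact p-value: probability of at least `correct`
    successes in n independent trials when each succeeds with `chance`."""
    return sum(
        comb(n, k) * chance**k * (1 - chance) ** (n - k)
        for k in range(correct, n + 1)
    )


# Hypothetical result: 16 correct predictions out of 20 test cases.
p_value = binomial_p_value(correct=16, n=20)
significant = p_value < 0.05  # conventional cutoff, an assumption

print(round(p_value, 4), significant)
```

A small p-value suggests the observed accuracy is unlikely to be the result of chance alone; it does not by itself show that the model is useful for the business problem.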
WEEK 3
UNDERSTANDING DEPLOYMENT

Case Study - Understand the results

Case Study - Gathering application requirements

Assimilate knowledge for business
· Practical understanding of the meaning of model results
· Implications of model results for designing intervention actions
FEEDBACK
DEVELOPMENT PHASE

Data Preparation
Model Building
Model Evaluation
TRANSITION TO FEEDBACK PHASE

Deployment
Monitoring
Feedback Collection
FEEDBACK PHASE

Error Analysis
Model Refinement
Retraining
Test
Assessing model performance
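The feedback steps listed above can be pictured as a loop: assess the deployed model on new cases whose outcomes are now known, then decide whether refinement and retraining are needed. The sketch below is a simplified illustration; the batches, the `feedback_step` helper, and the 0.8 accuracy threshold are hypothetical.

```python
def assess(predictions, outcomes):
    """Share of predictions that matched the outcomes observed after deployment."""
    return sum(p == o for p, o in zip(predictions, outcomes)) / len(outcomes)


def feedback_step(predictions, outcomes, threshold=0.8):
    """Assess performance and choose the next action (threshold is an assumption)."""
    acc = assess(predictions, outcomes)
    action = "keep" if acc >= threshold else "refine and retrain"
    return acc, action


# Hypothetical feedback batches: (model predictions, outcomes observed later).
batches = [
    ([1, 0, 1, 1, 0], [1, 0, 1, 1, 0]),
    ([1, 1, 1, 0, 0], [1, 0, 0, 0, 1]),
]

results = [feedback_step(preds, outs) for preds, outs in batches]
print(results)
```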
THANK YOU
CASE STUDY - FEEDBACK

Assessing model performance
Refinement
Redeployment
