Team1 - Data Science Methodology
DATA SCIENCE METHODOLOGY
WEEK 1
METHODOLOGY OVERVIEW
WHAT IS A METHODOLOGY?
A methodology is:
· A system of methods
· A guideline for decision-making during the scientific process
APPLYING DATA SCIENCE METHODOLOGY
Data understanding encompasses all activities related to constructing the data set.
The data understanding section of the data science methodology answers the question: Is the data that you collected representative of the problem to be solved?
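One way to start answering that question is to compute descriptive statistics for each column and look at how many values are missing. The sketch below is only an illustration: the `summarize` helper and the `ages` column are hypothetical and not part of the course material.

```python
import statistics

def summarize(values):
    """Return basic descriptive statistics for a numeric column.

    Missing values are represented as None and are counted separately.
    """
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),
        "min": min(present),
        "max": max(present),
    }

# Hypothetical column with two missing entries.
ages = [34, 45, None, 29, 61, 45, None, 38]
summary = summarize(ages)
print(summary)
```

A large share of missing values, or a range that does not match the population the problem is about, suggests going back to data collection.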
Data Preparation
In a sense, data preparation is similar to washing freshly picked vegetables, insofar as unwanted elements, such as dirt or imperfections, are removed.
Together with data collection and data understanding, data preparation is the most time-consuming phase of a data science project, typically taking seventy percent, and sometimes up to ninety percent, of the overall project time.
Similarly, transforming data in the data preparation phase is the process of getting the data into a state where it is easier to work with.
Data Collection
Data Cleaning
Data Transformation
Data Integration
Data Reduction
Data Validation
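Three of the steps listed above (cleaning, transformation, and validation) can be sketched in a few lines. This is a minimal illustration under assumptions: the record layout, the `income` field, and the helper names `clean`, `min_max_scale`, and `validate` are hypothetical, and min-max scaling is just one possible transformation.

```python
def clean(records):
    """Drop records with missing fields and exact duplicates, keeping order."""
    seen = set()
    cleaned = []
    for rec in records:
        if any(value is None for value in rec.values()):
            continue  # cleaning: skip incomplete records
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue  # cleaning: skip exact duplicates
        seen.add(key)
        cleaned.append(rec)
    return cleaned

def min_max_scale(records, field):
    """Transformation: rescale one numeric field to the range [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = hi - lo
    return [
        {**r, field: (r[field] - lo) / span if span else 0.0}
        for r in records
    ]

def validate(records, field):
    """Validation: check that every scaled value lies in [0, 1]."""
    return all(0.0 <= r[field] <= 1.0 for r in records)

# Hypothetical raw data with one missing value and one duplicate.
raw = [
    {"id": 1, "income": 40000},
    {"id": 2, "income": None},
    {"id": 3, "income": 90000},
    {"id": 3, "income": 90000},
    {"id": 4, "income": 65000},
]
prepared = min_max_scale(clean(raw), "income")
print(prepared, validate(prepared, "income"))
```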
FROM MODELING TO EVALUATION
MODELING
Model evaluation is performed during model development and before the model is deployed.
Model evaluation can have two main phases. The first is the diagnostic measures phase, which is used to ensure the model is working as intended.
If the model is a predictive model, a decision tree can be used to evaluate whether the answer the model outputs is aligned with the initial design. It can also be used to see where adjustments are required.
If the model is a descriptive model, one in which relationships are being assessed, then a testing set with known outcomes can be applied, and the model can be refined as needed.
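Applying a testing set with known outcomes usually comes down to comparing the model's outputs with those outcomes. The sketch below assumes a binary outcome and uses hypothetical `actual` and `predicted` lists. Accuracy, precision, and recall are common diagnostic measures, though the slides do not name specific ones.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary outcome."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

# Hypothetical known outcomes and model outputs for a testing set.
actual =    [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, tn, fn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(accuracy, precision, recall)
```

Which of the four counts is largest indicates where the model needs adjustment, for example many false negatives point to missed positive cases.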
The second phase of evaluation that may be used is statistical significance testing.
This type of evaluation can be applied to the model to ensure that the data is being properly handled and interpreted within the model. This is designed to avoid unnecessary second-guessing when the answer is revealed.
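The slides do not say which significance test is meant. As one possible illustration, the sketch below uses an exact one-sided binomial test to ask whether a model's number of correct predictions could plausibly come from a baseline success rate. The function name `binomial_p_value`, the 16-out-of-20 figures, and the 0.5 baseline are assumptions chosen for the example.

```python
from math import comb

def binomial_p_value(successes, trials, baseline=0.5):
    """One-sided exact p-value: P(X >= successes), X ~ Binomial(trials, baseline)."""
    return sum(
        comb(trials, k) * baseline**k * (1 - baseline) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical result: 16 correct predictions out of 20 test cases.
p = binomial_p_value(successes=16, trials=20)
print(f"p-value: {p:.4f}")
```

A small p-value (conventionally below 0.05) suggests the result is unlikely to be due to chance alone under the assumed baseline.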
WEEK 3
UNDERSTANDING DEPLOYMENT
Data Preparation
Model Building
Model Evaluation
TRANSITION TO FEEDBACK PHASE
Deployment
Monitoring
Feedback Collection
FEEDBACK PHASE
Error Analysis
Model Refinement
Retraining
Test
Assessing model performance
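The link between assessing model performance and retraining can be sketched as a simple rule: retrain when recent accuracy falls below the accuracy measured at evaluation time by more than a tolerance. The helper `needs_retraining`, the tolerance value, and the sample outcomes are hypothetical and not taken from the case study.

```python
def needs_retraining(recent_outcomes, baseline_accuracy, tolerance=0.05):
    """Decide whether a deployed model should be retrained.

    recent_outcomes: list of booleans, True when a prediction was correct.
    baseline_accuracy: accuracy measured during model evaluation.
    tolerance: allowed drop before retraining is triggered (an assumption).
    """
    if not recent_outcomes:
        return False  # no feedback collected yet
    recent_accuracy = sum(recent_outcomes) / len(recent_outcomes)
    return recent_accuracy < baseline_accuracy - tolerance

# Hypothetical feedback: 7 of the last 10 predictions were correct.
feedback = [True] * 7 + [False] * 3
print(needs_retraining(feedback, baseline_accuracy=0.85))
```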
THANK YOU
CASE STUDY-FEEDBACK