
Data Science Methodology

Data science methodology is a structured approach that combines statistical analysis, technological expertise, and domain knowledge to solve complex problems and make data-driven decisions. It consists of 10 stages: business understanding, analytic approach, data requirements, data collection, data understanding, data preparation, modeling, evaluation, deployment, and feedback, and is guided by key questions at each stage. The methodology emphasizes the importance of asking the right questions to effectively utilize data and avoid jumping directly to solutions.

Uploaded by vinaysikarwar199

Data Science methodology:

Welcome to data science methodology
overview. After watching this video, you'll be able to describe the term methodology, relate
methodology to data science and to John Rollins's contributions to data science methodology,
identify the 10 stages of the standard data science methodology, and categorize the questions for
those 10 stages.

Data science is an influential domain that combines statistical analysis, technological
expertise, and domain knowledge to extract valuable insights from extensive data sets. However,
despite the recent increase in computing power and easier access to data, we often don't
understand the questions being asked or know how to apply the data correctly to address the
problem at hand. Using a methodology helps resolve those issues.

What is a methodology? A methodology is a system of methods used in a particular area of study.
It is a guideline for the decisions researchers must make during the scientific process. In the
context of data science, data science methodology is a structured approach that guides data
scientists in solving complex problems and making data-driven decisions. Data science methodology
also includes data collection
forms, measurement strategies, and comparisons of data analysis methods relative to different
research goals and situations. Using a methodology provides the practical guidance needed to
conduct scientific research efficiently.

There's often a temptation to bypass methodology and jump directly to solutions. However, jumping
to solutions undermines our best intentions for solving problems.

Next, let's explore methodology as it relates to data science. The data science methodology
discussed in this course is a methodology outlined by John Rollins, a seasoned IBM Senior Data
Scientist. This course is built on his professional experience and insights into the importance of
following a methodology for successful data science outcomes.

As a general methodology, data science methodology consists of the following 10 stages: business
understanding, analytic approach, data requirements, data collection, data understanding, data
preparation, modeling, evaluation, deployment, and feedback.

Asking questions is the cornerstone of success in data
science. Questions drive every stage of data science methodology. Data science methodology aims
to answer the following 10 basic questions, which align with the 10 stages.

These first two questions help you define the issue and determine what approach to use:
1. What is the problem that you're trying to solve?
2. How can you use data to answer the question?

You'll use the next four questions to help you get organized around the data:
3. What data do you need to answer the question?
4. Where is the data coming from, and how will you receive it?
5. Does the data you collect represent the problem to be solved?
6. What additional work is required to manipulate and work with the data?

Then you'll use these final four questions to validate your approach and final design for
ongoing analysis:
7. When you apply data visualizations, do you see answers that address the business problem?
8. Does the data model answer the initial business question, or must you adjust the data?
9. Can you put the model into practice?
10. Can you get constructive feedback from the data and the stakeholder to answer the business question?

In this
video, you learned that data science methodology guides data scientists in solving complex
problems with data. A methodology also includes data collection forms, measurement strategies,
and comparisons of data analysis methods relative to different research goals and situations. As a
general methodology, data science methodology consists of the following 10 stages: business
understanding, analytic approach, data requirements, data collection, data understanding, data
preparation, modeling, evaluation, deployment, and feedback. The 10 questions align with
defining the business issue, determining an approach, organizing your data, and validating your
approach for the final data design.
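
The pairing of stages and questions described in the video can be written down as a small lookup table. The following is a minimal sketch and not part of the course material: the names `STAGES`, `GROUPS`, and `questions_for_group` are hypothetical, and pairing each question with the stage in the same position is an assumption based on the order in which the video lists them.

```python
# Illustrative sketch only: all names are hypothetical, and pairing each
# question with the stage in the same position is an assumption.
STAGES = [
    ("Business understanding", "What is the problem that you're trying to solve?"),
    ("Analytic approach", "How can you use data to answer the question?"),
    ("Data requirements", "What data do you need to answer the question?"),
    ("Data collection", "Where is the data coming from, and how will you receive it?"),
    ("Data understanding", "Does the data you collect represent the problem to be solved?"),
    ("Data preparation", "What additional work is required to manipulate and work with the data?"),
    ("Modeling", "When you apply data visualizations, do you see answers that address the business problem?"),
    ("Evaluation", "Does the data model answer the initial business question, or must you adjust the data?"),
    ("Deployment", "Can you put the model into practice?"),
    ("Feedback", "Can you get constructive feedback from the data and the stakeholder to answer the business question?"),
]

# The video groups the questions as the first two, the next four, and the final four.
GROUPS = {
    "define the issue and approach": range(0, 2),
    "get organized around the data": range(2, 6),
    "validate the approach and design": range(6, 10),
}


def questions_for_group(group):
    """Return (number, stage, question) tuples for one group of questions."""
    return [(i + 1, *STAGES[i]) for i in GROUPS[group]]


# Print the full checklist, grouped the way the video groups it.
for group in GROUPS:
    print(group.upper())
    for number, stage, question in questions_for_group(group):
        print(f"  {number:>2}. {stage}: {question}")
```

Keeping the stage order in a single list makes it easy to walk a project through the questions in sequence, which is the habit the methodology is meant to encourage.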

Welcome! This alphabetized glossary contains many of the terms you'll find within this
lesson. These terms are important for you to recognize when working in the industry,
when participating in user groups, and when participating in other certificate programs.

Term: Definition
Analytic Approach: The process of selecting the appropriate method or path to address a specific data science question or problem.
Analytics: The systematic analysis of data using statistical, mathematical, and computational techniques to uncover insights, patterns, and trends.
Business Understanding: The initial phase of data science methodology, which involves seeking clarification and understanding the goals, objectives, and requirements of a given task or problem.
Clustering Association: An approach used to learn about human behavior and identify patterns and associations in data.
Cohort: A group of individuals who share a common characteristic or experience and are studied or analyzed as a unit.
Cohort study: An observational study where a group of individuals with a specific characteristic or exposure is followed over time to determine the incidence of outcomes or the relationship between exposures and outcomes.
Congestive Heart Failure (CHF): A chronic condition in which the heart cannot pump enough blood to meet the body's needs, resulting in fluid buildup and symptoms such as shortness of breath and fatigue.
CRISP-DM: Cross-Industry Standard Process for Data Mining, a widely used methodology for data mining and analytics projects encompassing six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
Data analysis: The process of inspecting, cleaning, transforming, and modeling data to discover useful information, draw conclusions, and support decision-making.
Data cleansing: The process of identifying and correcting or removing errors, inconsistencies, or inaccuracies in a dataset to improve its quality and reliability.
Data science: An interdisciplinary field that combines scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.
Data science methodology: A structured approach to solving business problems using data analysis and data-driven insights.
Data scientist: A professional using scientific methods, algorithms, and tools to analyze data, extract insights, and develop models or solutions to complex business problems.
Data scientists: Professionals with data science and analytics expertise who apply their skills to solve business problems.
Data-Driven Insights: Insights derived from analyzing and interpreting data to inform decision-making.
Decision tree: A supervised machine learning algorithm that uses a tree-like structure of decisions and their possible consequences to make predictions or classify instances.
Decision Tree Classification Model: A model that uses a tree-like structure to classify data based on conditions and thresholds and provides predicted outcomes and associated probabilities.
Decision Tree Classifier: A classification model that uses a decision tree to determine outcomes based on specific conditions and thresholds.
Decision-Tree Model: A model used to review scenarios and identify relationships in data, such as the reasons for patient readmissions.
Descriptive approach: An approach used to show relationships and identify clusters of similar activities based on events and preferences.
Descriptive modeling: A modeling technique that focuses on describing and summarizing data, often through statistical analysis and visualization, without making predictions or inferences.
Domain knowledge: Expertise and understanding of a specific subject area or field, including its concepts, principles, and relevant data.
Goals and objectives: The sought-after outcomes and specific objectives that support the overall goal of the task or problem.
Iteration: A single cycle or repetition of a process that often involves refining or modifying a solution based on feedback or new information.
Iterative process: A process that involves repeating a series of steps or actions to refine and improve a solution or analysis. Each iteration builds upon the previous one.
Leaf: The final nodes of a decision tree where data is categorized into specific outcomes.
Machine Learning: A field of study that enables computers to learn from data without being explicitly programmed, identifying hidden relationships and trends.
Mean: The average value of a set of numbers, calculated by summing all the values and dividing by the total number of values.
Median: The middle value in a set of numbers arranged in ascending or descending order; it divides the data into two equal halves.
Model (Conceptual model): A simplified representation or abstraction of a real-world system or phenomenon used to understand, analyze, or predict its behavior.
Model building: The process of developing predictive models to gain insights and make informed decisions based on data analysis.
Pairwise comparison (correlation): A statistical technique that measures the strength and direction of the linear relationship between two variables by calculating a correlation coefficient.
Pattern: A recurring or noticeable arrangement or sequence in data that can provide insights or be used for prediction or classification.
Predictive model: A model used to determine probabilities of an action or outcome based on historical data.
Predictors: Variables or features in a model that are used to predict or explain the outcome variable or target variable.
Prioritization: The process of organizing objectives and tasks based on their importance and impact on the overall goal.
Problem solving: The process of addressing challenges and finding solutions to achieve desired outcomes.
Stakeholders: Individuals or groups with a vested interest in the data science model's outcome and its practical application, such as solution owners, marketing, application developers, and IT administration.
Standard deviation: A measure of the dispersion or variability of a set of values from their mean; it provides information about the spread or distribution of the data.
Statistical analysis: An approach applied to problems that require counts, such as yes/no answers or classification tasks.
Statistics: The collection, analysis, interpretation, presentation, and organization of data to understand patterns, relationships, and variability in the data.
Structured data (data model): Data that is organized and formatted according to a predefined schema or model and is typically stored in databases or spreadsheets.
Text analysis data mining: The process of extracting useful information or knowledge from unstructured textual data through techniques such as natural language processing, text mining, and sentiment analysis.
Threshold value: The specific value used to split data into groups or categories in a decision tree.
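
The glossary entries for mean, median, standard deviation, and pairwise comparison (correlation) can be tried out on a few numbers. The following is a minimal sketch using Python's standard `statistics` module; the `ages` and `visits` values are made up for illustration, and `pearson_correlation` is a hypothetical helper written out by hand so that each step of the definition is visible.

```python
import math
import statistics


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence has no variation.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only.
ages = [30, 40, 50, 60, 70]
visits = [1, 2, 2, 4, 6]

print(statistics.mean(ages))    # sum of the values / number of values -> 50
print(statistics.median(ages))  # middle value of the sorted data -> 50
print(statistics.pstdev(ages))  # population standard deviation: spread around the mean
print(pearson_correlation(ages, visits))  # near +1: strong positive linear relationship
```

A coefficient close to +1 or -1 indicates a strong linear relationship between the two variables, while a value near 0 indicates little or no linear relationship.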

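The glossary terms decision tree, predictors, threshold value, and leaf fit together as shown in the sketch below. This is a hand-built toy tree, not a trained model and not taken from the course: the class names `Leaf` and `Split`, the predictors `prior_admissions` and `age`, and every threshold and probability are hypothetical values chosen only to illustrate the structure.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Final node: holds the predicted outcome and an associated probability."""
    outcome: str
    probability: float


@dataclass
class Split:
    """Internal node: compares one predictor against a threshold value."""
    predictor: str
    threshold: float
    below: "Node"        # branch taken when the value is < threshold
    at_or_above: "Node"  # branch taken when the value is >= threshold


Node = Union[Leaf, Split]


def classify(node, record):
    """Follow splits from the root until a leaf is reached, then return it."""
    while isinstance(node, Split):
        if record[node.predictor] < node.threshold:
            node = node.below
        else:
            node = node.at_or_above
    return node


# Hypothetical tree: all predictors, thresholds, and probabilities are made up.
tree = Split(
    predictor="prior_admissions",
    threshold=2,
    below=Leaf("no readmission", 0.85),
    at_or_above=Split(
        predictor="age",
        threshold=65,
        below=Leaf("no readmission", 0.60),
        at_or_above=Leaf("readmission", 0.70),
    ),
)

result = classify(tree, {"age": 72, "prior_admissions": 3})
print(result.outcome, result.probability)
```

In practice the splits and thresholds would be learned from historical data by a machine learning library rather than written by hand; the point here is only how a record travels from the root through threshold comparisons to a leaf.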