Module 5 - Data Science Methodologies
MODULE 1: MULTIMEDIA
Data science
Learning Competencies
G11: 41
MODULE 5:
Data science is an enormous field, and it is not only about developing machine
learning models or predicting outcomes for the various scenarios that can arise
when dealing with data. A data scientist wears different hats and might be responsible for
one or more of the following: business understanding, data understanding, data
preparation, modeling, evaluation, and deployment.
These tasks are linked to one another, and each supports the other stages of the data
science methodology. A data science methodology describes the routine for finding a solution
to a specific problem. It is a cyclic process, open to critical review at every stage, that
guides business analysts and data scientists to act accordingly.
1. Business understanding
For example, if a business owner asks, “How can we lower the cost of an activity?”, we need
to understand whether the goal is to improve the efficiency of the activity or to increase
the profitability of the company. Once the goal is clear, the next piece of the puzzle
is to determine the objectives that support it. Breaking down the objectives can lead to structured
discussions that set priorities and help to organize and plan how to deal with the problem.
Depending on the problem, different stakeholders should participate in the discussion to
identify the requirements and clarify the problem.
2. Analytic approach
3. Data requirements
4. Data collection
5. Data understanding
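Descriptive statistics give a first look at the content and quality of a data set. The sketch below is only an illustration: the sample values and the function name summarize are hypothetical and are not part of this module.

```python
import statistics

# Hypothetical sample: monthly cost records, with None marking a missing value.
monthly_costs = [120.0, 135.5, None, 128.0, 990.0, 131.5]


def summarize(values):
    """Return basic descriptive statistics plus a simple quality check."""
    # Keep only the values that are actually present.
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "minimum": min(present),
        "maximum": max(present),
    }


summary = summarize(monthly_costs)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up sample, the mean is much larger than the median, which suggests that one unusually large value should be checked before modeling.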
6. Data preparation
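Data preparation typically covers cleansing data, combining data from multiple sources, and transforming data into more useful variables. The following is a minimal sketch of those three activities; the records, the field names, and the 200.0 threshold are assumptions made up for illustration.

```python
# Hypothetical records from two sources, keyed by customer id.
sales = [
    {"id": 1, "amount": "250.0"},
    {"id": 2, "amount": ""},  # missing amount
    {"id": 3, "amount": "100.0"},
]
regions = {1: "North", 3: "South"}


def prepare(sales_rows, region_by_id):
    """Build a modeling-ready list of rows from the two sources."""
    prepared = []
    for row in sales_rows:
        # Cleansing: skip rows whose amount is missing.
        if not row["amount"]:
            continue
        amount = float(row["amount"])
        prepared.append({
            "id": row["id"],
            # Combining: attach the region from the second source.
            "region": region_by_id.get(row["id"], "Unknown"),
            "amount": amount,
            # Transforming: derive a more useful yes/no variable.
            "is_large_order": amount >= 200.0,
        })
    return prepared


dataset = prepare(sales, regions)
for row in dataset:
    print(row)
```

Dropping incomplete rows is only one possible cleansing choice; depending on the problem, missing values might instead be filled in or flagged.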
7. Model Training
8. Model Evaluation
9. Deployment
10. Feedback
I. True/False:
Direction: Read the statements carefully. Write True if the statement is correct; otherwise,
write False.
1. Data requirements involves identifying the content, formats, and data sources needed for
the initial data collection.
2. Descriptive statistics and visualization techniques can help a data scientist understand the
content of the data, assess its quality, and obtain initial information about the data.
3. Once a business problem has been clearly identified, the Data Scientist can define the data
collection.
4. Statistical analysis refers to problems that require counts.
5. Descriptive statistics and visualization techniques can help a data scientist understand the
content of the data, assess its quality, and obtain initial information about the data.
9. At this stage, the data requirements are reviewed and a decision is made as to whether more
or less data is required for the collection.
A. Data collection
B. Data preparation
C. Model evaluation
D. Analytic approach
10. This stage covers all the activities used to create the data set for the modeling phase,
including cleansing data, combining data from multiple sources, and transforming data into
more useful variables.
A. Data preparation
B. Data collection
C. Model evaluation
D. Data requirements