
MODULE 6: Data Science Methodology (Steps)

Learning Competencies
6.1. Apply the methodology to different types of data science problems.
6.2. Compare the stages of the data science methodology, from understanding to preparation, modeling to evaluation, and deployment to feedback.
6.3. Explain what happens when a model is deployed and why model feedback is important.


The fundamentals of the data science methodology were presented in the previous lesson. Note that the stages of this methodology are iterative. The steps include forming a concrete business or research problem, collecting and analyzing data, building a model, and understanding the feedback after model deployment.

To deepen your grasp of the whole process, in this lesson you will learn how to think like a data scientist, including taking the steps involved in tackling a data science problem and applying them to interesting real-world examples.

Toward Data Science Methodology


Despite the increased computing power and access to data in recent decades, the ability to use data in decision-making is too often lost or not fully exploited. There is frequently no solid understanding of the questions being asked or of how to apply the data correctly to the problem at hand. That is why a methodology comes into the picture for framing any problem.

The definition of the word methodology, as given by the Business Dictionary, is a system of broad principles or rules from which specific methods or procedures may be derived to interpret or solve a problem. It is necessary to keep this in mind, since the temptation to circumvent the methodology and jump directly to solutions is often great.

Data Science Methodology


Data Science Methodology Outline


The Data Science Methodology aims to answer the following 10 questions in this prescribed
sequence:

From problem to approach:


1. What is the problem that you are trying to solve?
2. How can you use the data to answer the question?

From requirements to collection:


3. What data do you need to answer the question?
4. Where is the data coming from (identify all sources) and how will
you get it?
From understanding to preparation:

5. Is the data that you collected representative of the problem to be solved?
6. What additional work is required to manipulate and work with the data?

From modeling to evaluation:

7. In what way can the data be visualized to get to the answer that is required?
8. Does the model used really answer the initial question, or does it need to be adjusted?
From deployment to feedback:
9. Can you put the model into practice?
10. Can you get constructive feedback into answering the question?

Data Science Methodology in Practice

To better understand the application of each step of the methodology and the questions each entails, let us walk through each step using a case study.

1. Business Understanding

Applying the concepts
Problem: How can a limited health budget best be divided and put to optimal use to provide quality care?
As public funds for readmissions declined, the insurance company ran the risk of having to make up the difference in costs, which could lead to higher costs for its clients.


The first thing to do is define the problem being faced. In this case, the insurance company's rates are under review.

Objectives
Knowing that higher insurance rates would not be popular, the insurance company contacted the local health authorities and hired a data science expert to learn how data science could be applied to the question. Before data collection could start, the objectives had to be defined. After spending time setting goals, the team prioritized "patient readmission" as an effective area for review.
Examining Hospital Readmissions
Taking the objectives into account, it was found that approximately 30% of those who completed rehabilitation treatment would be readmitted to a rehabilitation center within one year, and that 50% would be readmitted within five years. After reviewing some records, it was found that patients with heart failure were high on the list of readmissions.
It was also found that a decision tree model could be applied to investigate this scenario and determine the reason for this phenomenon. To gain the business insight that would assist the analysis team in formulating and executing their first project, the data scientists proposed and organized an on-site workshop.
The involvement of the key business sponsor throughout the project was essential: the sponsor set the overall direction, remained engaged, advised the team, and secured the necessary support when needed. Finally, four business requirements were identified for the model to be built: predict readmission outcomes for patients with heart failure, predict the risk of readmission, understand the combination of events that led to the predicted outcome, and apply a process that is easy to understand for assessing new patients regarding their readmission risk.

2. Analytic Approach
In a case where a question about human behavior is asked, an appropriate response would be to use clustering approaches. Let us now examine how the analytic approach applies to the case study. Here, a decision tree classification model was used to identify the combination of conditions leading to each patient's outcome.


Decision Tree Classification Selected
In this approach, examining the variables in each of the nodes along each path to a leaf yields a corresponding threshold. This means that the decision tree classifier returns both the predicted outcome and the probability of that outcome, based on the proportion of the dominant outcome, yes or no, in each leaf group. From this information, analysts can derive the risk of readmission, that is, the probability of a "yes", for each patient.
If the dominant outcome is yes, the risk is simply the proportion of yes patients in the leaf. Otherwise, the risk is 1 minus the proportion of yes patients in the leaf. A decision tree classification model is also easy for non-data scientists to understand and apply when assessing the readmission risk of new patients.
Doctors can easily identify the conditions under which a patient is considered to be at risk, and multiple models can be built and applied at different points during hospitalization.
This provides a moving picture of the patient's risk and how it evolves with the various treatments applied. For these reasons, the decision tree classification approach was chosen to build the congestive heart failure readmission model.
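
To make this concrete, here is a minimal sketch of a decision tree classifier whose leaf proportions serve as readmission risk, using scikit-learn. The tiny cohort and the feature columns (age, num_prior_admissions) are invented for illustration and are not part of the case study.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Tiny invented cohort: one row per patient, "readmitted" is the yes/no target (1/0).
patients = pd.DataFrame({
    "age":                  [65, 72, 58, 80, 47, 69],
    "num_prior_admissions": [2, 0, 1, 3, 0, 2],
    "readmitted":           [1, 0, 0, 1, 0, 1],
})
X = patients[["age", "num_prior_admissions"]]
y = patients["readmitted"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# For a decision tree, predict_proba returns the proportion of "yes" patients in the
# leaf a patient falls into, which is exactly the readmission risk described above.
risk_of_readmission = tree.predict_proba(X)[:, 1]
print(risk_of_readmission)
```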

3. Data Requirements

Selecting the cohort
In the case study, the first task was to define the data required for the selected decision tree classification approach. This involved selecting a suitable cohort of patients from the members of the health insurance company.
In order to compile complete medical records, three criteria were identified for inclusion in the cohort.
• First, a patient had to be admitted within the provider's service area, to ensure access to the required information.
• Second, the focus was on patients with a primary diagnosis of heart failure during one year.
• Third, a patient must have had a continuous enrollment record of at least six months prior to the initial heart failure admission, so that a complete medical history was available.
Patients with congestive heart failure who had also been diagnosed with certain other serious conditions were excluded from the cohort, as those conditions would cause above-average readmission rates and could therefore skew the results.


Defining the Data

• Next, the content, format, and representations of the data needed for the decision tree classification were defined.
• This modeling technique requires one record per patient, with columns representing the variables of the model. To model readmission outcomes, data covering all aspects of the patient's medical history must be available.
• This content includes authorizations; primary, secondary, and tertiary diagnoses; procedures; prescriptions; and other services provided during hospitalizations or patient/doctor visits.
In this way, a given patient can have thousands of records representing all of their attributes. To arrive at a one-record-per-patient format, the data analysis specialists aggregated the transactional records from the patient records and created a set of new variables to represent that information. This was a task for the data preparation phase, so it is important to anticipate the subsequent phases.

4. Data Collection

Gathering the available data
In our case study, this information may include demographic, clinical, and patient care information, provider information, claims records, as well as pharmaceutical and other information related to all heart failure diagnoses.

• This case study also required specific drug information, but that data source was not yet integrated with the rest of the data sources.
• This brings us to an important point: it is acceptable to postpone decisions about unavailable data and to try to acquire it later.
• For example, this can happen even after obtaining intermediate results from the predictive modeling. If those results indicate that the drug information may be important for a good model, then the time would be spent trying to obtain it.
However, it turned out that a reasonably good model could be built without this drug information.


Merging the data

• Database administrators and programmers often work together to extract data from the various sources and then merge it.
• In this way, redundant data can be removed and the data made available for the next stage of the methodology, namely data understanding.
• At this stage, the data scientists and analysis team members can discuss ways to better manage the data, for example by automating certain database processes to facilitate data collection.
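
As an illustration of this merging step, below is a minimal pandas sketch, assuming two hypothetical extracts (demographics and claims) that share a patient_id key; the real sources, keys, and columns would differ.

```python
import pandas as pd

# Invented extracts standing in for the separate source systems.
demographics = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "age":        [65, 72, 58],
    "gender":     ["F", "M", "F"],
})
claims = pd.DataFrame({
    "patient_id": [1, 1, 2, 3, 3],
    "diagnosis":  ["CHF", "CHF", "CHF", "CHF", "diabetes"],
})

# Combine the sources on the shared patient identifier and drop exact duplicates,
# so redundant records are removed before the data understanding stage.
merged = claims.merge(demographics, on="patient_id", how="left").drop_duplicates()
print(merged)
```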

5. Data Understanding

Understanding the data
Let us apply the data understanding stage of the methodology to the case study. To understand the data relating to heart failure admissions, descriptive statistics had to be run on the columns that would become variables in the model.
• First, these statistics included univariate statistics for each variable, such as the mean, median, minimum, maximum, and standard deviation.
• Second, pairwise correlations were used to see how closely related certain variables were, and which, if any, were so highly correlated that they were essentially redundant, making only one of them relevant for modeling.
• Third, histograms of the variables were examined to understand their distributions. Histograms are a good way to understand how the values of a variable are distributed and what kind of data preparation may be needed to make the variable more useful in a model. For example, if a categorical variable contains too many distinct values to be meaningful in a model, the histogram can help decide how to consolidate those values.
• Univariate statistics and histograms are also used to assess data quality.
On the basis of these statistics, some values may be re-coded or dropped if necessary, for example if a particular variable has many missing values.


• The question then arises as to whether "missing" means something. Sometimes a missing value means "no" or "0" (zero), and sometimes it simply means "we do not know".
• A variable may also contain invalid or misleading values. For example, a numeric variable called "age" containing values from 0 to 100 plus 999, where "triple-9" actually means "missing", will be treated as a valid value unless it is corrected.
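
A minimal sketch of these data understanding checks in pandas follows: univariate statistics, pairwise correlations, a histogram, and treatment of a sentinel value such as 999 as missing. The small table and column names are invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented patient-level data; 999 in "age" is a sentinel meaning "missing".
patients = pd.DataFrame({
    "age":        [65, 72, 999, 80, 47, 69],
    "num_visits": [4, 1, 2, 6, 0, 3],
})

# Univariate statistics: mean, std, min, max, and quartiles (the 50% row is the median).
print(patients.describe())

# Pairwise correlations, to spot variables so highly correlated they are redundant.
print(patients.corr())

# Histogram of one variable to inspect its distribution (requires matplotlib).
patients["age"].hist(bins=10)

# Treat the sentinel 999 as missing rather than as a valid age.
patients["age"] = patients["age"].replace(999, np.nan)
print(patients["age"].isna().sum(), "missing age values")
```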
Looking at the data quality

• Initially, a heart failure admission was defined on the basis of a primary diagnosis of heart failure. However, the data understanding work revealed that, in light of clinical experience, this initial definition did not capture all of the expected heart failure admissions.
• This meant looping back to the data collection stage, adding secondary and tertiary diagnoses, and building a more comprehensive definition of a heart failure admission.
• This is just one example of the iterative processes in the methodology. The more you work with the problem and the data, the more you learn, and the more the model can be adjusted, which ultimately leads to a better resolution of the problem.

6. Data Preparation

Defining congestive heart failure
An important first step in the data preparation stage was to actually define congestive heart failure. This sounded easy at first, but defining it precisely was not straightforward.
• First, the set of diagnosis-related group codes needed to be identified, as congestive heart failure implies certain kinds of fluid buildup.
• We also need to consider that congestive heart failure is only one type of heart failure. Clinical guidance was needed to get the right codes for congestive heart failure.
• The next step involved defining the readmission criteria for the same condition. The timing of events needed to be evaluated in order to determine whether a particular congestive heart failure admission was an initial event, which is called an index admission, or a congestive heart failure-related readmission.

Defining readmission
Based on clinical expertise, a window of 30 days following discharge from the initial admission was set as the period in which a later admission counts as a readmission relevant for congestive heart failure patients.
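
As an illustration of how such a 30-day window and index admissions might be flagged, here is a minimal pandas sketch, assuming a hypothetical admissions table with one row per hospital stay; the real claims data would be considerably more complex.

```python
import pandas as pd

# Invented admissions table: one row per CHF hospital stay.
adm = pd.DataFrame({
    "patient_id":     [1, 1, 2, 2],
    "admit_date":     pd.to_datetime(["2020-01-05", "2020-02-05", "2020-03-01", "2020-06-15"]),
    "discharge_date": pd.to_datetime(["2020-01-12", "2020-02-12", "2020-03-10", "2020-06-20"]),
}).sort_values(["patient_id", "admit_date"])

# Days between each admission and the discharge of the same patient's previous stay.
prev_discharge = adm.groupby("patient_id")["discharge_date"].shift(1)
gap_days = (adm["admit_date"] - prev_discharge).dt.days

# The first stay per patient (no previous discharge) is the index admission; a later
# stay that starts within 30 days of the prior discharge counts as a readmission.
adm["is_index_admission"] = prev_discharge.isna()
adm["is_30day_readmission"] = gap_days.le(30)
print(adm)
```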
Aggregating records
• Next, the records that were in transactional format were aggregated, meaning that the data included multiple records for each patient. Transactional records included professional, provider, and facility claims submitted for physician, laboratory, hospital, and clinical services. Also included were records describing all the diagnoses, procedures, prescriptions, and other information about in-patients and out-patients. A given patient could easily have hundreds or even thousands of these records, depending on their clinical history.
• Then, all of this history was aggregated to the patient level, yielding a single record for each patient, as required for the decision tree classification method that would be used for modeling.
As part of the aggregation process, many new columns were created representing the information in the transactions, for example, the frequency of, and most recent, visits to doctors, clinics, and hospitals, along with diagnoses, procedures, and prescriptions. Co-morbidities with congestive heart failure were also considered, such as diabetes, hypertension, and many other diseases and chronic conditions that could impact the risk of readmission for congestive heart failure.
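
The following minimal pandas sketch illustrates this kind of transactional-to-patient-level aggregation. The claims table and derived columns are invented; the actual variables in the case study were far more numerous.

```python
import pandas as pd

# Invented transactional claims: multiple records per patient.
claims = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "visit_date": pd.to_datetime(["2020-01-05", "2020-02-10", "2020-03-15",
                                  "2020-01-20", "2020-04-02"]),
    "diagnosis":  ["CHF", "diabetes", "CHF", "CHF", "hypertension"],
})

# Collapse to one row per patient, deriving new columns from the transactions.
per_patient = claims.groupby("patient_id").agg(
    num_visits=("visit_date", "count"),                             # frequency of visits
    most_recent_visit=("visit_date", "max"),                        # most recent visit
    has_diabetes=("diagnosis", lambda d: (d == "diabetes").any()),  # co-morbidity flag
).reset_index()
print(per_patient)
```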
More or less data needed?
During the discussions around data preparation, a literature review on congestive heart failure was also undertaken to see whether any important data elements had been overlooked, such as co-morbidities that had not yet been accounted for. The literature review involved looping back to the data collection stage to add a few more indicators for conditions and procedures.


Completing the data set
Aggregating the transactional data at the patient level meant merging it with the other patient data, including demographic information such as age, gender, type of insurance, and so forth. The result was one table containing a single record per patient, with many columns representing the attributes of the patient and his or her clinical history. These columns would be used as variables in the predictive modeling.
A list of variables was ultimately selected for building the model. The dependent variable, or target, was congestive heart failure readmission within 30 days following discharge from a hospitalization for congestive heart failure, with an outcome of either yes or no. The data preparation stage resulted in a cohort of 2,343 patients meeting all of the criteria for this case study. The cohort was then split into training and testing sets for building and validating the model, respectively.
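
A minimal sketch of such a training/testing split using scikit-learn follows. The invented cohort, the target column name, and the 70/30 split ratio are assumptions, since the case study does not state the exact proportions used.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Invented prepared cohort: one record per patient, "readmitted_30d" is the target.
rng = np.random.default_rng(0)
cohort = pd.DataFrame({
    "age":            rng.integers(40, 90, size=100),
    "num_visits":     rng.integers(0, 10, size=100),
    "readmitted_30d": rng.integers(0, 2, size=100),
})
X = cohort.drop(columns=["readmitted_30d"])
y = cohort["readmitted_30d"]

# Hold out 30% of patients for validating a model built on the other 70%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)
print(len(X_train), "training patients,", len(X_test), "testing patients")
```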

7. Data Modeling
In this step, many aspects of model building are addressed, one of which is tuning parameters to improve the model.
Analyzing the 1st model

• With a set of prepared training data, the first decision tree classification model for congestive heart failure readmission could be built. We are looking for patients at high risk of readmission, so the outcome of interest is a congestive heart failure readmission equal to "yes". In this first model, the overall accuracy in classifying the yes and no outcomes was 85%. This sounds good, but only 45% of the actual readmissions (the "yes" outcomes) were classified correctly, meaning that the model was not very accurate for the outcome that matters.


• The question is: how can the accuracy of the model in predicting the outcome of interest be improved? For decision tree classification, the best parameter to adjust is the relative cost of misclassified yes and no outcomes.
• Think of it this way: when a true non-readmission is misclassified and action is taken to reduce that patient's risk, the cost of the error is the wasted intervention.
• A statistician calls this a Type I error, or a false positive. But when a true readmission is misclassified and no action is taken to reduce the risk, the cost of that error is the readmission and all its associated costs, as well as the trauma to the patient.
• That is a Type II error, or a false negative. We can see that the costs of the two different types of misclassification error can be very different. For this reason, it is reasonable to adjust the relative weights of misclassifying the yes and no outcomes.
• The default is 1 to 1, but the decision tree algorithm allows a higher cost to be set for misclassified yes outcomes, as sketched below.
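
The sketch below shows one way to express such a relative misclassification cost in scikit-learn, which exposes the idea through the class_weight parameter rather than an explicit cost matrix; the tiny data set and the 9:1 ratio (tried for the second model below) are used purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Tiny invented data: two features per patient, 1 = readmitted, 0 = not readmitted.
X = [[65, 2], [70, 0], [58, 1], [80, 3], [45, 0], [72, 4]]
y = [1, 0, 0, 1, 0, 1]

# Default behaviour treats both error types equally (a 1:1 cost).
default_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Penalise a missed readmission ("yes", class 1) nine times as heavily as a false
# alarm, mirroring the 9:1 relative cost discussed in the text.
weighted_tree = DecisionTreeClassifier(
    class_weight={0: 1, 1: 9}, random_state=0
).fit(X, y)
```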
Analyzing the 2nd model

• For the second model, the relative cost was set at 9:1. This ratio is very high, but it gives more insight into the behavior of the model. This time the model correctly classified 97% of the yes outcomes, but at the expense of a very low accuracy on the no outcomes, with an overall accuracy of only 49%. Obviously, this is not a good model either.
• The problem with this result is the large number of false positives, which would suggest unnecessary and costly interventions for patients who would never have been readmitted.
• Therefore, the data scientist must try again to find a better balance between the yes and no accuracies.
Analyzing the 3rd model


• For the third model, the relative cost was set to a more reasonable 4:1 ratio. This time, 68% accuracy was obtained on the yes outcomes, which statisticians call sensitivity, and 85% accuracy on the no outcomes, called specificity, with an overall accuracy of 81%.
• This is the best balance that can be obtained with a rather small training set by adjusting the relative cost of misclassified yes and no outcomes. A lot more work would still go into the modeling, of course, including iterating back to the data preparation stage to redefine some of the other variables so as to better represent the underlying information and thereby improve the model.
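
As a small illustration of where such figures come from, the sketch below computes sensitivity, specificity, and overall accuracy from a confusion matrix using scikit-learn; the labels shown are invented.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual readmissions (1 = yes)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)                   # share of actual "yes" outcomes caught
specificity = tn / (tn + fp)                   # share of actual "no" outcomes caught
accuracy = (tp + tn) / (tp + tn + fp + fn)     # overall accuracy
print(sensitivity, specificity, accuracy)
```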
8. Model Evaluation
In this part of the case study, the model evaluation component of the data science methodology will be applied.
Determining the optimal model

• One way to find the optimal model is to use a diagnostic measure based on tuning one of the model-building parameters. Specifically, examine how the relative cost of misclassifying yes and no outcomes can be adjusted. Four models were built with four different relative misclassification costs.
• Increasing this model-building parameter increases the true positive rate, or sensitivity, of the yes predictions, at the expense of lower accuracy on the no predictions, that is, an increasing false positive rate.
• The question is: which model is best, given this parameter setting? For budgetary reasons, the risk-reduction intervention could not be applied to most or all heart failure patients, many of whom would not have been readmitted anyway.
• On the other hand, the intervention would not be as effective as it should be at improving patient care if too few high-risk heart failure patients were identified by the model.
• So how do we determine which model was optimal? The optimal model is the one that provides the maximum separation between the ROC curve and the baseline.
• We can see that model 3, with a relative misclassification cost of 4 to 1, is the best of the four models. Incidentally, ROC stands for receiver operating characteristic curve, which was first developed during World War II to detect enemy aircraft on radar.


• Since then, it has also been used in many other fields. Today, it is commonly used in machine learning and data mining. The ROC curve is a useful diagnostic tool for determining the optimal classification model.
• The curve quantifies how well a binary classification model separates the yes and no outcomes as a discrimination criterion is varied.
• In this case, the criterion is the relative misclassification cost. By plotting the true positive rate against the false positive rate for different values of the relative misclassification cost, the ROC curve made it easy to select the optimal model.
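
Below is a minimal sketch of building an ROC curve with scikit-learn. One simplification to note: the case study varies the relative misclassification cost across models, whereas this sketch shows the more common form, varying the classification threshold over one model's predicted risks; the labels and scores are invented.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

# Invented test labels and predicted "yes" probabilities (e.g. predict_proba output).
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
risk_scores = [0.9, 0.2, 0.4, 0.8, 0.3, 0.6, 0.7, 0.1]

fpr, tpr, _ = roc_curve(y_test, risk_scores)      # false and true positive rates
plt.plot(fpr, tpr, label=f"model (AUC = {auc(fpr, tpr):.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="baseline")   # no-skill diagonal
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```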
9. Deployment
Once the model has been evaluated and the data scientist is confident that it will work, it is deployed and put to the ultimate test.
Understanding the results
• In preparation for delivering the solution, the next step was to assemble the knowledge of the stakeholder group responsible for designing and managing the intervention program to reduce readmission risk.
• The business team translated the model results so that clinical staff could understand how to identify high-risk patients and design appropriate interventions.
• The objective was to reduce the likelihood that these patients would be readmitted within 30 days of discharge. During the operational requirements phase, the intervention program director and her team required an application that would provide automated, near real-time assessment of heart failure readmission risk.
Gathering application and additional requirements
• The application also needed to be easy for clinical staff to use, preferably through a browser- and tablet-based interface that any staff member could carry with them. Patient data would be generated throughout the hospital stay; it would be prepared automatically in the format required by the model, and each patient would be scored shortly before discharge.


• Doctors would then have the most up-to-date risk assessment for each patient, to help them choose which patients to treat after discharge. As part of delivering the solution, the intervention team would develop and deliver training for the clinical staff.
• In addition, processes for tracking and monitoring patients receiving the intervention would need to be developed in collaboration with IT developers and database administrators, so that the results could feed into the feedback stage and the model could be refined over time.
• One example of a deployed solution is a map, "Hospitalization risk for Juvenile Diabetes Patients", implemented through a Cognos application (IBM Cognos Business Intelligence is a web-based integrated business intelligence suite by IBM). That example focused on the risk of hospitalization for patients with juvenile diabetes. Similar to the congestive heart failure case, it used decision tree classification to create a risk model that formed the basis of the application.
• The map provides an overview of hospitalization risk nationwide, with interactive drill-down into the risk for different patient conditions and other characteristics. A companion view, the risk summary report by decision tree model node, provides an interactive summary of the risk for the patient population in a given node of the model, so that doctors can understand the combination of conditions that characterizes that subset of patients.
Individual patient risk report
• This report provides a detailed summary for a single patient, including details of the patient's history and predicted risk, giving the doctor a concise overview.
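
The following is a minimal sketch of how a deployed application might score a single patient shortly before discharge. The function name, record format, feature names, and training data are hypothetical and are not part of the described Cognos solution.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def score_patient(model, patient_record: dict) -> float:
    """Return the predicted 30-day readmission risk for one patient record."""
    features = pd.DataFrame([patient_record])     # the one-row format the model expects
    return float(model.predict_proba(features)[0, 1])

# Invented training data and patient, purely to make the sketch runnable.
train = pd.DataFrame({
    "age":                  [65, 72, 58, 80],
    "num_prior_admissions": [2, 0, 1, 3],
    "readmitted":           [1, 0, 0, 1],
})
model = DecisionTreeClassifier(random_state=0).fit(
    train[["age", "num_prior_admissions"]], train["readmitted"]
)
print(score_patient(model, {"age": 67, "num_prior_admissions": 2}))
```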


10. Feedback
Once the model has been evaluated and the data scientist trusts that it will work, it is deployed and undergoes the ultimate test: actual, real-time use in the field.
Assessing model performance
• The feedback phase plan included the following steps. First, the review process would be defined and put in place, with overall responsibility for measuring the results of the heart failure risk model on the patient population. Clinical management would have overall responsibility for the review process.
• Second, heart failure patients receiving the intervention would be tracked and their readmission outcomes recorded.
• Third, the intervention would be measured to determine how effective it was in reducing the number of readmissions.
• For ethical reasons, heart failure patients would not be split into control and treatment groups. Instead, readmission rates would be compared before and after implementation of the model to measure its impact (a sketch of such a comparison appears at the end of this section).
• After deployment and feedback, the impact of the intervention program on readmission rates would be reviewed after the first year of implementation.
• Then, the model would be refined based on all of the data compiled after the model went into use and the knowledge gained during these steps. Other planned improvements included incorporating information on participation in the intervention program and possibly refining the model with detailed pharmaceutical data.
• Collection of drug data had initially been deferred because that data was not available at the time. After feedback and practical experience with the model, it could then be decided whether adding this data would be worth the investment of time and money. The possibility of further adjustments during the feedback phase also had to be considered.


• In addition, the intervention actions and processes would be reviewed and probably refined based on the experience and knowledge gained during the initial deployment and feedback.
• Finally, the refined model and intervention would be redeployed, and the feedback process would continue throughout the life of the intervention program.
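
To illustrate the before-and-after comparison described in the feedback plan, here is a minimal sketch using a two-proportion z-test from statsmodels. The counts are invented, and the choice of test is an assumption, since the case study does not specify how the comparison would be made statistically.

```python
from statsmodels.stats.proportion import proportions_ztest

readmitted = [150, 110]     # hypothetical readmissions before and after deployment
discharged = [500, 500]     # hypothetical heart failure discharges in each period

# Test whether the drop in readmission rate is larger than chance alone would explain.
stat, p_value = proportions_ztest(count=readmitted, nobs=discharged)
rate_before = readmitted[0] / discharged[0]
rate_after = readmitted[1] / discharged[1]
print(f"before: {rate_before:.1%}, after: {rate_after:.1%}, p = {p_value:.3f}")
```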

Multiple choice. Analyze the questions carefully. Choose the letter of the correct answer.
1. Select the correct statement that describes data science methodology.
A. Data science methodology is not an iterative process – one does not go back and forth
between methodological steps.
B. Data science methodology is a specific strategy that guides processes and activities
relating to data science only for text analytics.
C. Data science methodology depends on a specific set of technologies or tools.
D. Data science methodology provides data scientists with a framework for how to proceed
to obtain answers.
2. What do data scientists usually use for exploratory analysis of data and to get acquainted with it?
A. They use support vector machines and neural networks as feature extraction techniques.
B. They begin with regression, classification, or clustering.
C. They use descriptive statistics and data visualization techniques.
D. They use deep learning.
3. Why should data scientists maintain continuous communication with business sponsors
throughout a project?
A. So that business sponsors can provide domain expertise.
B. So that business sponsors can ensure the work remains on track to generate the intended
solution
C. So that business sponsors can review intermediate findings.
D. All of the above.
4. For predictive models, a test set, which is similar to – but independent of – the training set, is used to determine how well the model predicts outcomes. This is an example of what step in the methodology?
A. Deployment
B. Data preparation
C. Model evaluation
D. Analytic approach
5. Data understanding involves all of the following EXCEPT for?
A. Discovering initial insights about the data
B. Visualizing the data
C. Assessing data quality
D. Gathering and analyzing feedback for assessment of the model’s performance


6. The following are all examples of rapidly evolving technologies that affect data science
methodology EXCEPT for?
A. Data Sampling
B. Text Analysis
C. Platform Growth
D. In-database analytics
7. Data scientists may use either a “top-down” or a “bottom-up” approach to data science.
These two approaches refer to:
A. Top-down approach – the data, when sorted, is modeled for the “top” of the data towards
the “bottom”. Bottom-up approach – the data is modeled from the “bottom” of the data
to the “top”.
B. Top-down approach – models are fit before the data is explored. Bottom-up approach –
the data is explored, and then a model is fit.
C. Top-down approach – first defining, a business problem then analyzing the data to find a
solution. Bottom-up approach – starting with the data, and then coming up with a business
problem based on the data.
D. Top-down approach – using massively parallel, warehouses with huge data volumes as
data source. Bottom-up approach – using a sample of small data before using large data.
8. A car company asked a data scientist to determine what type of customers are more likely
to purchase their vehicles. However, the data comes from several sources and is in a
relatively “raw format”. What kind of processing can the data scientist perform on the data
to prepare it for modeling?
A. Feature Engineering
B. Transforming the data into more useful variables
C. Addressing missing/invalid values
D. All of the above
9. A data scientist, John, was asked to help reduce readmission rates at a local hospital. After
some time, John provided a model that predicted which patients were more likely to be
readmitted to the hospital and declared that his work was done. Which of the following best
describes the scenario?
A. John only provided one model as a solution and he should have provided multiple models.
B. The scenario is already optimal.
C. Even though John only submitted one solution, it might be a good one. However, John needed feedback on his model from the hospital to confirm that his model was able to address the problem appropriately and sufficiently.
D. John still needs to collect more data.
10. Data scientists may frequently return to a previous stage to make adjustments, as they learn
more about the data and the modeling.
A. True
B. False


A case study helps students learn by immersing them in a real-world business scenario where they can act as problem-solvers and decision-makers. The case presents facts about a particular organization. Analysis is done by focusing on the most important facts and using that information to determine the opportunities and problems facing the organization. Then, alternative courses of action to deal with the problems are identified.

To be more familiar with the task, you may visit this link:
● Laudon, Kenneth C. (2021, November). Essentials of Management Information Systems, Sixth Edition. https://tinyurl.com/63t6sr49

In your Virtual Expo Entry No. 5, you will look for research, examples, and case studies to which you can apply the data science methodology. Dissect your chosen case study and identify which specific parts relate to the different steps of the methodology.
The content and presentation of your analysis on your group's website will be graded as your Performance Task. You may convey it through creative graphics and illustrations. Navigation will also be graded.
Refer to this rubric in grading your output:
VIRTUAL EXPO RUBRIC

Content
Exemplary (15): The content is rich, concise, and straightforward. The content is relevant to the discussed topics and thoroughly answers the questions.
Proficient (12): Content is complete and includes relevant detail.
Partially Proficient (9): There is adequate detail. Some extraneous information and minor gaps are included.
Incomplete (5): There is insufficient detail, or detail is irrelevant and extraneous.

Creativity/Visual
Exemplary (15): The expo is visually effective. The graphics/images/photographs used relate seamlessly to the content.
Proficient (12): The expo is visually sensible. The graphics/images/photographs included are appropriate.
Partially Proficient (9): The main theme is still discernible, but the graphics/images/photographs included are used randomly.
Incomplete (5): Lacks visual clarity. The graphics/images/photographs are distracting from the content of the expo.

Navigation
Exemplary (10): The document is fully hyperlinked. The index is well organized and easy to navigate.
Proficient (8): Hyperlinks are organized into logical groups. Not all possible features have been employed.
Partially Proficient (5): Hyperlinks are good but lack organization.
Incomplete (2): There are few links. Some links are "broken".


IBM Data Science Methodology. https://www.coursera.org/learn/data-science-methodology

Gajare, Shreyal (2019). Data Science Methodology and Approach. https://www.geeksforgeeks.org/data-science-methodology-and-approach/

Logallo, Nunzio (2019). Data Science Methodology 101: How can a Data Scientist organize his work? https://towardsdatascience.com/data-science-methodology-101-ce9f0d660336

Patel, Ashish (2019). Data Science Methodology — How to design your data science project. https://medium.com/ml-research-lab/data-science-methodology-101-2fa9b7cf2ffe

Module Author/Curator : Ms. Myra Irene J. Catilo


Template & Layout Designer : Mrs. Jenny P. Macalalad

Multiple Choice

1. D
2. C
3. D
4. C
5. D
6. A
7. C
8. D
9. C
10. A
