Notes On Data Science Methodologies

The document discusses data science methodologies and their goal of answering questions through a multi-step process including defining the problem, collecting and preparing relevant data, developing and testing models, and deploying solutions. It outlines the typical phases and tasks in the CRISP-DM process for data mining projects, which involves business understanding, data understanding, data preparation, modeling, evaluation, and deployment in an iterative cycle.

Uploaded by

Reymon Dela Cruz
Copyright
© All Rights Reserved

Notes on data science methodologies

Data science methodologies seek to answer the following questions:

From problem to approach

1. What is the problem you are trying to solve?
2. How can you use data to answer your question?

Working with data

1. What data do you need to answer the question?
2. Where is the data coming from and how will you get it?
3. Is the data you collected representative of the problem you are trying to solve?
4. What additional work is required to manipulate and work with the data?

Deriving answers

1. In what way can data be visualized in order to get to the answer that is required?
2. Does the model really answer the initial question or does it need to be adjusted?
3. Can you put the model into practice?
4. Can you get constructive feedback into answering the questions?

1. Business Understanding. This stage is the most important because this is where
the intention of the project is outlined. Foundational Methodology and CRISP-DM
are aligned here. It requires communication and clarity. The difficulty here is
that stakeholders have different objectives, biases, and modalities of relating
information. They don’t all see the same things or in the same manner. Without a
clear, concise, and complete view of the project goals, resources will be
needlessly expended.

2. Data Understanding. Data understanding relies on business understanding. Data
is collected at this stage of the process. The understanding of what the business
wants and needs will determine what data is collected, from what sources, and by
what methods. CRISP-DM combines the stages of Data Requirements, Data
Collection, and Data Understanding from the Foundational Methodology outline.

3. Data Preparation. Once the data has been collected, it must be transformed into
a useable subset unless it is determined that more data is needed. Once a dataset
is chosen, it must then be checked for questionable, missing, or ambiguous cases.
Data Preparation is common to CRISP-DM and Foundational Methodology.

4. Modeling. Once prepared for use, the data must be expressed through appropriate
models that give meaningful insights and, ideally, new knowledge. This is the
purpose of data mining: to create knowledge, that is, information that has meaning
and utility. The use of models reveals patterns and structures within the data that
provide insight into the features of interest. Models are selected on a portion of
the data, and adjustments are made if necessary. Model selection is both an art
and a science. Both Foundational Methodology and CRISP-DM are required for the
subsequent stage.

5. Evaluation. The selected model must be tested. This is usually done by setting
aside a pre-selected test set on which to run the trained model. This allows you
to see the effectiveness of the model on a set it sees as new. The results are
used to determine the efficacy of the model and foreshadow its role in the next
and final stage.

6. Deployment. In the deployment step, the model is used on new data outside of
the scope of the dataset and by new stakeholders. The new interactions at this
phase might reveal new variables and new needs for the dataset and model. These
new challenges could initiate revision of the business needs and actions, of the
model and data, or of both.
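
Stages 3 to 5 above (Data Preparation, Modeling, Evaluation) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a method prescribed by CRISP-DM or the Foundational Methodology: the toy records, the function names (prepare, split, fit_threshold, evaluate), the 25% test fraction, and the one-variable threshold "model" are all hypothetical and chosen for brevity.

```python
import random

# Hypothetical toy records: (hours_studied, passed). None marks a missing value.
raw_records = [
    (1.0, 0), (2.0, 0), (2.5, 0), (3.0, 0), (None, 1),
    (5.0, 1), (6.0, 1), (7.5, 1), (8.0, 1), (-4.0, 1),
    (1.5, 0), (6.5, 1), (3.5, 0), (7.0, 1),
]

def prepare(records):
    """Data Preparation: drop missing or questionable cases."""
    clean = []
    for hours, passed in records:
        # Assumed rules: hours must be present and non-negative, label must be 0 or 1.
        if hours is None or hours < 0 or passed not in (0, 1):
            continue
        clean.append((hours, passed))
    return clean

def split(records, test_fraction=0.25, seed=0):
    """Set aside a test set before any modeling happens."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(train):
    """Modeling: pick the cut-off on hours that best separates the training data."""
    candidates = sorted({hours for hours, _ in train})

    def accuracy_at(threshold):
        return sum((h >= threshold) == bool(p) for h, p in train) / len(train)

    return max(candidates, key=accuracy_at)

def evaluate(threshold, test):
    """Evaluation: accuracy on records the model has not seen."""
    return sum((h >= threshold) == bool(p) for h, p in test) / len(test)

clean = prepare(raw_records)
train, test = split(clean)
threshold = fit_threshold(train)
test_accuracy = evaluate(threshold, test)

print(f"kept {len(clean)} of {len(raw_records)} records")
print(f"threshold={threshold}, test accuracy={test_accuracy:.2f}")
```

The test set is split off before the threshold is chosen, so the reported accuracy reflects records the model sees as new. A poor score at this point would send the project back to an earlier stage rather than forward to deployment.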

CRISP-DM is a highly flexible and cyclical model. Flexibility and communication are
required at each step to keep the project on track. At any of the six stages, it may be
necessary to revisit an earlier stage and make changes. The key point is that the process
is cyclical: even after deployment, you return to business understanding to discuss the
viability of the solution. The journey continues.

CRISP-DM Help Overview


CRISP-DM, which stands for Cross-Industry Standard Process for Data Mining, is an industry-
proven way to guide your data mining efforts.

• As a methodology, it includes descriptions of the typical phases of a project, the tasks
  involved with each phase, and an explanation of the relationships between these tasks.
• As a process model, CRISP-DM provides an overview of the data mining life cycle.

Figure 1. The data mining life cycle


The life cycle model consists of six phases with arrows indicating the most important and
frequent dependencies between phases. The sequence of the phases is not strict. In fact, most
projects move back and forth between phases as necessary.

The CRISP-DM model is flexible and can be customized easily. For example, if your
organization aims to detect money laundering, it is likely that you will sift through large amounts
of data without a specific modeling goal. Instead of modeling, your work will focus on data
exploration and visualization to uncover suspicious patterns in financial data. CRISP-DM allows
you to create a data mining model that fits your particular needs.

In such a situation, the modeling, evaluation, and deployment phases might be less relevant than
the data understanding and preparation phases. However, it is still important to consider some of
the questions raised during these later phases for long-term planning and future data mining
goals.
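
As a rough illustration of the exploration-focused work described above, the sketch below summarizes transaction amounts per account and flags values that stand far above the account's median. Everything in it is an assumption made for the example: the account names, the amounts, the function names, and the factor of 10 are hypothetical, and this is not an actual money-laundering detection rule.

```python
import statistics

# Hypothetical transaction amounts per account (illustrative data only).
transactions = {
    "acct_a": [120.0, 95.5, 130.0, 110.0, 9800.0, 105.0],
    "acct_b": [40.0, 42.5, 39.0, 41.0, 43.0],
}

def summarize(amounts):
    """Basic descriptive statistics used to get a first look at the data."""
    return {
        "count": len(amounts),
        "median": statistics.median(amounts),
        "max": max(amounts),
    }

def flag_unusual(amounts, factor=10.0):
    """Return amounts more than `factor` times the account's median.

    The factor is an arbitrary assumption; in practice it would be tuned
    with domain experts during data understanding.
    """
    median = statistics.median(amounts)
    return [amount for amount in amounts if amount > factor * median]

for account, amounts in transactions.items():
    print(account, summarize(amounts), "unusual:", flag_unusual(amounts))
```

No model is trained here. The output is a shortlist for a human analyst to inspect, which matches the emphasis on data understanding and preparation over modeling in this kind of project.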
