

Best Methodologies/Frameworks for Data Science

Gustavo Mendizabal

Universidad Católica del Norte

Ingenieria Civil Industrial

Author's Note

This is deliberately written in English.



Best Methodologies/Frameworks for Data Science

While looking for websites with information about the best methodologies for Data Science, I came across one called Data Science Process Alliance, “A Community of Data & AI Practitioners”, which lists several methodologies, such as Waterfall, KDD, SEMMA, CRISP-DM, TDSP, and DOMINO, along with descriptions presented in such a visual way that they are easy to understand. After thoroughly reading each one, I chose the two that I consider the best and would use myself. Keep in mind that each of them has its own strengths and challenges, as well as the kinds of projects it is best suited for.

CRISP-DM

Cross Industry Standard Process for Data Mining (CRISP-DM) is a methodology created in 1996 by five companies: Integral Solutions Ltd, Teradata, Daimler AG, NCR Corporation, and OHRA.

It has six phases, normally followed in order, each of which answers a different question.

Business Understanding

What does the business need? This phase focuses on understanding the project objectives and requirements from a business perspective, and then turning this knowledge into a data mining problem.

Data Understanding

What data do we have and/or need? Is it clean? This phase focuses on understanding the data and becoming familiar with it, so that initial hypotheses can be formed from it.
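As an illustration of this phase, the following Python sketch takes a first look at a small dataset by counting missing values per column and computing basic statistics for the numeric columns. The records and column names (`age`, `income`, `region`) are hypothetical, and this is only one simple way of becoming familiar with the data, not a task defined by CRISP-DM itself.

```python
import statistics

# Hypothetical customer records; None marks a missing value.
records = [
    {"age": 34, "income": 52000, "region": "north"},
    {"age": None, "income": 61000, "region": "south"},
    {"age": 45, "income": None, "region": "north"},
    {"age": 29, "income": 48000, "region": None},
]

def profile(rows):
    """Per column: number of missing values and, for numeric columns,
    the mean and standard deviation of the values that are present."""
    report = {}
    columns = {key for row in rows for key in row}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        entry = {"missing": len(values) - len(present)}
        if present and all(isinstance(v, (int, float)) for v in present):
            entry["mean"] = statistics.mean(present)
            if len(present) > 1:
                entry["stdev"] = statistics.stdev(present)
        report[col] = entry
    return report

for column, stats in profile(records).items():
    print(column, stats)
```

A report like this quickly answers the "Is it clean?" question: every column here has one missing value that the next phase will have to deal with.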

Data Preparation

How do we organize the data for modeling? Once the previous phase is over, it is time to construct the final dataset. This preparation task will most likely be performed multiple times.
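A minimal sketch of one preparation pass, assuming a hypothetical dataset with one numeric column and one categorical column: missing numbers are filled with the column median and the category is one-hot encoded. Median imputation and one-hot encoding are common choices picked here only for illustration; the right transformations depend on the data and the model.

```python
import statistics

# Hypothetical raw records; None marks a missing value.
records = [
    {"age": 34, "region": "north"},
    {"age": None, "region": "south"},
    {"age": 45, "region": "north"},
    {"age": 29, "region": "east"},
]

def prepare(rows, numeric_col, category_col):
    """Fill missing numeric values with the median and one-hot encode a category."""
    present = [r[numeric_col] for r in rows if r[numeric_col] is not None]
    median = statistics.median(present)
    categories = sorted({r[category_col] for r in rows})
    prepared = []
    for r in rows:
        value = r[numeric_col] if r[numeric_col] is not None else median
        row = {numeric_col: value}
        for cat in categories:
            # One indicator column per category, e.g. region_north.
            row[f"{category_col}_{cat}"] = int(r[category_col] == cat)
        prepared.append(row)
    return prepared

for row in prepare(records, "age", "region"):
    print(row)
```

Because this task is usually repeated, keeping it in a function makes it easy to rerun whenever the data or the preparation choices change.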

Modeling

What modeling techniques should we apply? In this phase, various modeling techniques can be applied to obtain the best model based on the final dataset from the previous phases. This is usually the most interesting part of the methodology, and often the shortest one.

Evaluation

Which model best meets the business objectives? At this stage, it is important to evaluate and review whether the model is the best suited to the objective from a business perspective. The decision on whether it is acceptable must be reached at this point.
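To illustrate how the acceptance decision can be tied to business objectives, this sketch scores a hypothetical fraud classifier with precision and recall and accepts it only if both reach thresholds assumed to have been agreed on during Business Understanding. The labels, predictions, and thresholds (0.7 and 0.8) are made up for the example.

```python
# Hypothetical validation labels (1 = fraud) and model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 1, 0]

def precision_recall(actual, predicted):
    """Precision and recall for the positive class (label 1)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def is_acceptable(precision, recall, min_precision, min_recall):
    """Accept the model only if it meets both business thresholds."""
    return precision >= min_precision and recall >= min_recall

p, r = precision_recall(actual, predicted)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.75
print("accept" if is_acceptable(p, r, min_precision=0.7, min_recall=0.8) else "reject")  # reject
```

Here the model is technically reasonable but still rejected, because it misses too many fraud cases for the assumed business goal.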

Deployment

How do stakeholders access the results? In this phase, the model is presented to and used by the customer. In many cases, it is the customer, not the data analyst, who gives the order to deploy the model.
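Deployment can take many forms, from a written report to a live service. As one minimal, hypothetical example, the sketch below exports the parameters of a fitted linear model as JSON so that another application can load them and produce predictions. The field names and parameter values are assumptions made for illustration.

```python
import json

def export_model(slope, intercept):
    """Serialize the fitted parameters so another system can load them."""
    return json.dumps({"model": "linear", "slope": slope, "intercept": intercept})

def load_predictor(payload):
    """Rebuild a prediction function from the serialized parameters."""
    params = json.loads(payload)
    return lambda x: params["slope"] * x + params["intercept"]

payload = export_model(slope=2.0, intercept=0.5)  # hypothetical fitted values
predict = load_predictor(payload)
print(predict(10))  # 20.5
```

Separating the exported parameters from the code that uses them lets the customer's system run the model without access to the analyst's environment.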

This methodology is used in many data science projects. However, because it was created in 1996, it is becoming obsolete as data grows more sophisticated, which is why newer methods such as TDSP and DOMINO, which are in a sense a “modern” CRISP-DM, are being adopted.

Source: KDnuggets.

SEMMA

SEMMA stands for Sample, Explore, Modify, Model, and Assess. Data scientists can use it as a methodology for problems such as fraud detection, customer loyalty, and bankruptcy forecasting, among others. It has five stages, which break down as follows:

Sample

To build a model, this step must provide a sample of appropriate volume and identify the variables that influence the process. Once they are identified, the information is sorted and categorized.
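A simple random sample is one way to obtain a manageable volume of data for this step. The sketch below draws a reproducible 10% sample from a hypothetical population of 1,000 records; the fraction, the seed, and the record fields are arbitrary choices, and a stratified sample may be more appropriate when some categories are rare.

```python
import random

def draw_sample(rows, fraction, seed=42):
    """Simple random sample (without replacement) of `fraction` of the rows.
    Using a dedicated Random instance with a seed makes it reproducible."""
    size = max(1, round(len(rows) * fraction))
    return random.Random(seed).sample(rows, size)

population = [{"id": i, "amount": 10 * i} for i in range(1000)]  # hypothetical
sample = draw_sample(population, fraction=0.1)
print(len(sample))  # 100
```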

Explore

In this step, the sorted information is studied to check for relationships between the variables. Every factor that may influence the data must be analyzed.

Modify

Once the exploration phase is complete, the data is cleaned and prepared for modeling.
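The cleaning in this step might look like the sketch below, which removes exact duplicates and rows with missing values and then rescales the numeric columns to the range [0, 1]. Dropping incomplete rows and min-max scaling are assumptions chosen for brevity; imputing values or applying other transformations are equally valid choices depending on the data.

```python
def clean(rows, numeric_cols):
    """Drop exact duplicates and rows with missing values, then min-max
    scale the numeric columns to [0, 1]. The input rows are not modified."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen or any(v is None for v in row.values()):
            continue
        seen.add(key)
        kept.append(dict(row))
    if not kept:
        return kept
    for col in numeric_cols:
        lo = min(r[col] for r in kept)
        hi = max(r[col] for r in kept)
        span = hi - lo
        for r in kept:
            # A constant column is mapped to 0.0 to avoid dividing by zero.
            r[col] = (r[col] - lo) / span if span else 0.0
    return kept

raw = [  # hypothetical explored data
    {"amount": 10, "count": 1},
    {"amount": 10, "count": 1},    # exact duplicate
    {"amount": None, "count": 4},  # missing value
    {"amount": 30, "count": 3},
    {"amount": 20, "count": 5},
]
print(clean(raw, ["amount", "count"]))
```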

Model

What modeling techniques should we apply? In this phase, various modeling techniques can be applied to obtain the best model based on the cleaned data from the previous phases. This is usually the most interesting part of the methodology, and often the shortest one.

Assess

Which model best meets the business objectives? At this stage, it is important to evaluate and review whether the model is the best suited to the objective from a business perspective. The decision on whether it is acceptable must be reached at this point.

The last two steps of SEMMA are the same as in CRISP-DM. The main difference between the two methodologies lies in the sampling step, which is directly related to the KDD process. SEMMA is the second most popular method used for data science.
