Best methodologies/frameworks for Data Science.

Gustavo Mendizabal

Universidad Católica del Norte

Ingenieria Civil Industrial

Author's note

This is deliberately written in English.

Best methodologies/frameworks for Data Science

While looking for websites with information about the best methodologies for data science, I came across the Data Science Process Alliance, which describes itself as “A Community of Data & AI Practitioners”. It lists methodologies such as Waterfall, KDD, SEMMA, CRISP-DM, TDSP, and DOMINO, and explains each of them in a visual way that makes them easy to understand. After reading each one thoroughly, I chose the two I would use. Keep in mind that they all have their strengths and challenges, as well as situations they are best suited for.

CRISP-DM

The Cross Industry Standard Process for Data Mining (CRISP-DM) is a methodology created in 1996 by five companies: Integral Solutions Ltd, Teradata, Daimler AG, NCR Corporation, and OHRA.

It has six sequential phases, each of which answers a different question.

Business Understanding

What does the business need? This phase focuses on understanding the objectives and requirements from a business perspective, and then turning this knowledge into a data mining problem.

Data Understanding

What data do we have and/or need? Is it clean? This phase focuses on understanding the data and becoming familiar with it, so that hypotheses can be formed from it.

Data Preparation

How do we organize the data for modeling? Once the previous phase is over, it is time to begin constructing the final dataset. This preparation task will most likely be performed multiple times.
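As an illustration of what one preparation pass might look like, the following minimal sketch cleans a small, made-up list of customer records using only the Python standard library. The field names (`age`, `income`), the `prepare` function, and the cleaning rules are assumptions chosen for this example; they are not prescribed by CRISP-DM.

```python
def prepare(records):
    """Return cleaned copies of the records.

    Hypothetical rules for this example: drop rows with missing or
    unparseable values, drop exact duplicates, cast fields to numbers.
    """
    cleaned = []
    seen = set()
    for row in records:
        age, income = row.get("age"), row.get("income")
        # Drop rows with missing values.
        if age in (None, "") or income in (None, ""):
            continue
        try:
            item = (int(age), float(income))
        except (TypeError, ValueError):
            continue  # Drop rows whose values cannot be parsed as numbers.
        if item in seen:
            continue  # Drop exact duplicates.
        seen.add(item)
        cleaned.append({"age": item[0], "income": item[1]})
    return cleaned


# Made-up raw data for illustration only.
raw = [
    {"age": "34", "income": "52000"},
    {"age": "34", "income": "52000"},   # duplicate
    {"age": "", "income": "41000"},     # missing age
    {"age": "n/a", "income": "38000"},  # unparseable age
    {"age": 51, "income": 67000.5},
]
prepared = prepare(raw)
print(prepared)
```

In a real project the rules would come out of the Data Understanding phase, and the function would typically be rerun whenever those rules change.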

Modeling

What modeling techniques should we apply? In this phase, various modeling techniques can be applied to obtain the best model for the final dataset built in the previous phases. This is usually the most exciting part of the methodology and often the shortest one.
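To make the idea of trying several techniques concrete, here is a small sketch that fits two candidate models on the same training data and keeps the one with the lower error on held-out data. The two candidates (a mean baseline and a least-squares line), the data, and the use of mean squared error are all assumptions made for the example.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    avg = mean(ys)
    return lambda x: avg


def fit_line(xs, ys):
    """Simple linear regression fitted by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a model on the given data."""
    return mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


# Made-up data, roughly y = 2x + 1 with small noise.
train_x, train_y = [1, 2, 3, 4, 5, 6], [3.1, 4.9, 7.2, 9.0, 10.8, 13.1]
valid_x, valid_y = [7, 8], [15.0, 17.1]

candidates = {"mean baseline": fit_mean, "linear regression": fit_line}
scores = {
    name: mse(fit(train_x, train_y), valid_x, valid_y)
    for name, fit in candidates.items()
}
best = min(scores, key=scores.get)  # Candidate with the lowest validation error.
print(scores, best)
```

Scoring on data the models were not fitted on is a deliberate choice here: it keeps a more flexible model from being preferred simply because it memorized the training set.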

Evaluation

Which model best meets the business objectives? At this stage, it is important to evaluate and review whether this model is the one best suited to the objectives from a business perspective. The decision on whether it is acceptable must be reached at this point.

Deployment

How do stakeholders access the results? This is the phase in which the model can be presented to and used by the customer. In many cases, it is the customer, not the data analyst, who gives the order to deploy the model.

This methodology is used in many data science projects. However, because it was created in 1996, it is becoming increasingly outdated as data grow more sophisticated, which is why newer methodologies such as TDSP or DOMINO, which are in a sense a “modern” CRISP-DM, are being adopted.

Source: KDnuggets.

SEMMA

SEMMA stands for Sample, Explore, Modify, Model, and Assess. Data scientists use it as a methodology for tasks such as fraud detection, customer loyalty analysis, bankruptcy forecasting, and more. It has five stages, which break down as follows:

Sample

To build a model, this step must provide a sample of appropriate size and identify the variables that influence the process. Once identified, the information is sorted and categorized.
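A minimal sketch of the sampling step, assuming the data fit in memory as a Python list: it draws a reproducible random sample and splits it into training and validation parts. The function name `sample_and_split`, the 70/30 split, and the fixed seed are assumptions made for illustration, not part of SEMMA.

```python
import random


def sample_and_split(rows, sample_size, train_fraction=0.7, seed=42):
    """Draw a random sample without replacement, then split it in two."""
    rng = random.Random(seed)  # Fixed seed so the sample is reproducible.
    sample = rng.sample(rows, sample_size)
    cut = round(len(sample) * train_fraction)
    return sample[:cut], sample[cut:]


population = list(range(1000))  # Stand-in for 1,000 record IDs.
train, valid = sample_and_split(population, sample_size=100)
print(len(train), len(valid))
```

Sampling without replacement guarantees that no record appears in both parts, so the validation part stays independent of the training part.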

Explore

In this step, the information that was sorted is studied in order to check for relationships among the variables. Every factor that may influence the data must be analyzed.
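One common way to check for relationships between numeric variables is the Pearson correlation coefficient. The sketch below computes it for every pair of columns in a small, made-up dataset; the variable names and values are assumptions for illustration, and correlation is only one of several techniques that could be used at this stage.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up variables for illustration only.
data = {
    "age": [23, 35, 41, 52, 60],
    "income": [21000, 34000, 40000, 55000, 58000],
    "visits": [9, 7, 6, 3, 2],
}

# Report the correlation for every pair of variables.
names = list(data)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} vs {b}: r = {pearson(data[a], data[b]):+.2f}")
```

Values close to +1 or -1 suggest a strong linear relationship worth investigating further, while values near 0 suggest none; a nonlinear relationship can still exist in that case.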

Modify

Once the exploration phase is completed, the data is cleaned for modeling.

Model

What modeling techniques should we apply? In this stage, various modeling techniques can be applied to obtain the best model for the data prepared in the previous stages. This is usually the most exciting part of the methodology and often the shortest one.

Assess

Which model best meets the business objectives? At this stage, it is important to evaluate and review whether this model is the one best suited to the objectives from a business perspective. The decision on whether it is acceptable must be reached at this point.

The last two steps of SEMMA are the same as the corresponding CRISP-DM phases. The difference between the two methodologies lies in the sample selection step, which is directly related to the KDD process. SEMMA is the second most popular methodology used for data science.
