CRISP-DM
Overview
Introduction to CRISP-DM
Phases and Tasks
Summary
CRISP-DM
CRoss-Industry Standard Process for Data Mining
Process Standardization
Initiative launched in late 1996 by three veterans of the data mining market:
DaimlerChrysler (then Daimler-Benz), SPSS (then ISL), NCR
Developed and refined through a series of workshops (1997-1999)
Over 300 organizations contributed to the process model
Published CRISP-DM 1.0 (1999)
Over 200 members of the CRISP-DM SIG worldwide
- DM Vendors - SPSS, NCR, IBM, SAS, SGI, Data Distilleries, Syllogic, etc.
- System Suppliers / consultants - Cap Gemini, ICL Retail, Deloitte & Touche, etc.
- End Users - BT, ABB, Lloyds Bank, AirTouch, Experian, etc.
CRISP-DM
Non-proprietary
Application/Industry neutral
Tool neutral
Focus on business issues as well as technical analysis
CRISP-DM: Overview
CRISP-DM: Phases
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
Business Understanding: Determine Business Objectives, Assess Situation, Determine Data Mining Goals, Produce Project Plan
Data Understanding: Collect Initial Data, Describe Data, Explore Data, Verify Data Quality
Data Preparation: Select Data, Clean Data, Construct Data, Integrate Data, Format Data
Modeling: Select Modeling Technique, Generate Test Design, Build Model, Assess Model
Evaluation: Evaluate Results, Review Process, Determine Next Steps
Deployment: Plan Deployment, Plan Monitoring & Maintenance, Produce Final Report, Review Project
Assess situation
- more detailed fact-finding about all of the resources, constraints,
assumptions and other factors that should be considered
- flesh out the details
Collect initial data
- acquire within the project the data listed in the project resources
- includes data loading if necessary for data understanding
- possibly leads to initial data preparation steps
- if acquiring multiple data sources, integration is an additional issue, either
here or in the later data preparation phase
Describe data
- examine the gross or surface properties of the acquired data
- report on the results
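As a rough illustration of the "Describe data" task, the sketch below reports a few surface properties of a dataset: record count, field names, observed value types, missing values, and distinct values. The function name `describe_data` and the sample `customers` records are hypothetical and chosen only for this example; a real project would report whatever properties matter for its own data.

```python
def describe_data(records):
    """Report surface properties of a list of dict records."""
    # Collect every field name that appears in at least one record.
    fields = sorted({key for rec in records for key in rec})
    report = {"n_records": len(records), "fields": {}}
    for field in fields:
        values = [rec.get(field) for rec in records]
        present = [v for v in values if v is not None]
        report["fields"][field] = {
            "types": sorted({type(v).__name__ for v in present}),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


# Hypothetical sample data, not taken from any real project.
customers = [
    {"id": 1, "age": 34, "region": "north"},
    {"id": 2, "age": None, "region": "south"},
    {"id": 3, "age": 51, "region": "north"},
]
report = describe_data(customers)
```

The resulting dictionary can be written into the data description report that this task produces.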
Clean data
- raise the data quality to the level required by the selected analysis
techniques
- may involve selection of clean subsets of the data, the insertion of
suitable defaults or more ambitious techniques such as the estimation of
missing data by modeling
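The three cleaning options named above can be sketched as follows. All function names and the sample `rows` are assumptions made for illustration. The "estimation by modeling" step uses a per-group mean as a deliberately simple stand-in for a real predictive model.

```python
from statistics import mean


def select_clean_subset(records, required):
    """Keep only records in which every required field is present."""
    return [r for r in records if all(r.get(f) is not None for f in required)]


def insert_defaults(records, defaults):
    """Fill missing fields with a fixed default value per field."""
    return [
        {**r, **{f: d for f, d in defaults.items() if r.get(f) is None}}
        for r in records
    ]


def impute_by_group_mean(records, target, group_by):
    """Estimate a missing target value from the mean of its group
    (a very simple stand-in for estimation by modeling)."""
    groups = {}
    for r in records:
        if r.get(target) is not None:
            groups.setdefault(r[group_by], []).append(r[target])
    means = {g: mean(vals) for g, vals in groups.items()}
    return [
        {**r, target: means[r[group_by]]}
        if r.get(target) is None and r[group_by] in means
        else dict(r)
        for r in records
    ]


# Hypothetical sample data.
rows = [
    {"id": 1, "age": 30, "region": "north"},
    {"id": 2, "age": None, "region": "north"},
    {"id": 3, "age": 50, "region": "north"},
    {"id": 4, "age": 40, "region": None},
]
clean_subset = select_clean_subset(rows, ["age", "region"])
with_defaults = insert_defaults(rows, {"region": "unknown"})
imputed = impute_by_group_mean(rows, "age", "region")
```

Which option is appropriate depends on the data quality level the selected analysis technique requires.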
Integrate data
- methods whereby information is combined from multiple tables or records
to create new records or values
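One possible reading of "Integrate data" is joining a customer table with aggregates computed from an order table, so that each customer record gains new values. The sketch below assumes hypothetical field names (`id`, `customer_id`, `amount`) and is not tied to any particular tool.

```python
def integrate(customers, orders):
    """Combine customer records with per-customer order aggregates."""
    totals = {}
    for order in orders:
        agg = totals.setdefault(
            order["customer_id"], {"n_orders": 0, "total_spent": 0.0}
        )
        agg["n_orders"] += 1
        agg["total_spent"] += order["amount"]
    # Customers without any order get zero-valued aggregates.
    empty = {"n_orders": 0, "total_spent": 0.0}
    return [{**c, **totals.get(c["id"], empty)} for c in customers]


# Hypothetical sample tables.
customers = [{"id": 1, "region": "north"}, {"id": 2, "region": "south"}]
orders = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 25.0},
]
merged = integrate(customers, orders)
```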
Format data
- formatting transformations refer to primarily syntactic modifications made
to the data that do not change its meaning, but might be required by the
modeling tool
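A formatting transformation might look like the sketch below, which assumes a hypothetical modeling tool that expects CSV input with a fixed column order and no stray whitespace. The values keep their meaning; only their syntactic form changes.

```python
import csv
import io


def format_for_tool(records, field_order):
    """Write records as CSV text with a fixed column order and trimmed values."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=field_order, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        # Only the listed fields are written, in the listed order.
        writer.writerow({f: str(rec[f]).strip() for f in field_order})
    return buffer.getvalue()


# Hypothetical record with the label first and padded text.
text = format_for_tool(
    [{"label": "yes", "id": 1, "region": " north "}],
    ["id", "region", "label"],
)
```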
Phase 4. Modeling
Select the modeling technique
(based upon the data mining objective)
Build model
(Parameter settings)
Assess model (rank the models)
Various modeling techniques are selected and applied and their parameters are calibrated to optimal
values. Some techniques have specific requirements on the form of data. Therefore, stepping back to
the data preparation phase is often necessary.
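To make "parameters are calibrated" concrete, the sketch below tunes the single parameter of a very simple model, a one-feature threshold rule, by trying a handful of candidate values. The model, the candidate list, and the training pairs are all assumptions for illustration; real projects would calibrate whatever parameters their chosen technique exposes.

```python
def stump_accuracy(threshold, rows):
    """Accuracy of the rule 'predict 1 when x >= threshold' on (x, label) pairs."""
    hits = sum((x >= threshold) == bool(y) for x, y in rows)
    return hits / len(rows)


def calibrate_threshold(rows, candidates):
    """Return the candidate threshold with the highest accuracy on rows."""
    return max(candidates, key=lambda t: stump_accuracy(t, rows))


# Hypothetical training data: (feature value, class label).
train = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (2.5, 0)]
best = calibrate_threshold(train, [1.5, 2.0, 2.75, 3.5])
```

Note that this rule requires a numeric feature, which is one example of a technique imposing requirements on the form of the data.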
Phase 4. Modeling
Select modeling technique
- select the actual modeling technique that is to be used
e.g., decision tree, neural network
- if multiple techniques are applied, perform this task for each technique
separately
Phase 4. Modeling
Build model
- run the modeling tool on the prepared dataset to create one or more
models
Assess model
- the data mining engineer interprets the models according to domain knowledge,
the data mining success criteria and the desired test design
- judges the success of the application of modeling and discovery
techniques from a technical standpoint
- contacts business analysts and domain experts later in order to discuss
the data mining results in the business context
- considers only the models, whereas the evaluation phase also takes into
account all other results that were produced in the course of the project
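Ranking models in the "Assess model" task could be sketched as below: each candidate is scored on a held-out test set and compared with a data mining success criterion. The candidate models, the test pairs, and the 0.8 accuracy threshold are hypothetical values chosen for the example.

```python
def accuracy(model, test_rows):
    """Fraction of (x, label) pairs the model predicts correctly."""
    return sum(model(x) == y for x, y in test_rows) / len(test_rows)


def rank_models(models, test_rows, min_accuracy):
    """Return (name, accuracy, meets_criterion) tuples, best model first."""
    scored = sorted(
        ((accuracy(m, test_rows), name) for name, m in models.items()),
        reverse=True,
    )
    return [(name, acc, acc >= min_accuracy) for acc, name in scored]


# Hypothetical candidate models and held-out test data.
models = {
    "always_zero": lambda x: 0,
    "stump_2.75": lambda x: int(x >= 2.75),
    "stump_3.5": lambda x: int(x >= 3.5),
}
test_rows = [(0.5, 0), (2.9, 1), (3.2, 1), (3.8, 1), (1.7, 0)]
ranking = rank_models(models, test_rows, min_accuracy=0.8)
```

This is a purely technical ranking; whether the best model serves the business is a question for the evaluation phase.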
Phase 5. Evaluation
Evaluation of model
Thoroughly evaluate the model and review the steps executed to construct the model to be certain
it properly achieves the business objectives. A key objective is to determine if there is some
important business issue that has not been sufficiently considered. At the end of this phase, a
decision on the use of the data mining results should be reached.
Phase 5. Evaluation
Evaluate results
- assess the degree to which the model meets the business
objectives
- seek to determine if there is some business reason why this
model is deficient
- test the model(s) on test applications in the real application if
time and budget constraints permit
- also assess other data mining results generated
- unveil additional challenges, information or hints for future
directions
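Checking a model against a business objective, rather than a technical metric, might look like the following sketch. It assumes a hypothetical marketing campaign in which each correctly targeted customer brings a fixed value and each contact has a fixed cost; the figures and function names are illustrative only.

```python
def expected_profit(confusion, value_per_hit, cost_per_contact):
    """Profit of contacting every customer the model flags as positive."""
    contacts = confusion["true_positive"] + confusion["false_positive"]
    return confusion["true_positive"] * value_per_hit - contacts * cost_per_contact


def meets_business_objective(confusion, value_per_hit, cost_per_contact, target):
    """Return the expected profit and whether it reaches the business target."""
    profit = expected_profit(confusion, value_per_hit, cost_per_contact)
    return profit, profit >= target


# Hypothetical test-campaign outcome and business target.
confusion = {"true_positive": 40, "false_positive": 160}
profit, ok = meets_business_objective(
    confusion, value_per_hit=50, cost_per_contact=5, target=1500
)
```

In this made-up case the model falls short of the target even if its technical accuracy looked acceptable, which is the kind of business reason for deficiency this task looks for.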
Phase 5. Evaluation
Review process
- do a more thorough review of the data mining engagement in order to
determine if there is any important factor or task that has somehow been
overlooked
- review the quality assurance issues
e.g., Did we build the model correctly?
Phase 6. Deployment
Determine how the results need to be utilized
Who needs to use them?
The knowledge gained will need to be organized and presented in a way that the
customer can use it. However, depending on the requirements, the deployment
phase can be as simple as generating a report or as complex as implementing a
repeatable data mining process across the enterprise.
Phase 6. Deployment
Plan deployment
- take the evaluation results and determine a strategy for deploying the
data mining result(s) into the business
- document the procedure for later deployment
Phase 6. Deployment
Produce final report
- the project leader and his team write up a final report
- may be only a summary of the project and its experiences
- may be a final and comprehensive presentation of the data mining
result(s)
Review project
- assess what went right and what went wrong, what was done well and
what needs to be improved
Summary
Why CRISP-DM?
The data mining process must be reliable and repeatable by
people with little data mining background
CRISP-DM provides a uniform framework for
- guidelines
- experience documentation
CRISP-DM is flexible to account for differences
- Different business/agency problems
- Different data
Thank you
very much!!!