CRISP-DM Template Final Project
Carlo Lipizzi
[email protected]
SSE
Contents
CRISP
Business Understanding
Data Understanding
Data Preparation
Data Representation
Practical Results – Conclusions
Attachments
Project Goals and Conditions
• What are the project goals? What is the key question you are
required to answer?
• Are there any conditions limiting or otherwise defining the project,
such as limited access to data, outdated data, or time constraints?
• A brief description of the expected results may be added
CRISP-DM
Background Info & Definition
Business Understanding
• Definition*
– Define business requirements and objectives
– Translate objectives into data mining problem definition
– Prepare initial strategy to meet objectives
• Clearly describe the business needs and the steps you will take
to address them
Data Understanding
• Definition*
– Collect data
– Assess data quality
– Perform exploratory data analysis (EDA)
• Overall data description: sources, organization, key
characteristics (sensor/human generated, reliable/unreliable
source, …)
• Run all the descriptive statistics that make sense for the
specific case, describing each step and what its results mean
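A minimal sketch of a first data-quality and descriptive pass, using only the Python standard library. The records, the field names (`sensor`, `temp`), and the `describe` helper are hypothetical and exist only for illustration; substitute your own dataset and the statistics that fit your case.

```python
import statistics

# Hypothetical sample records; None marks a missing reading.
records = [
    {"sensor": "A", "temp": 21.5},
    {"sensor": "A", "temp": 22.0},
    {"sensor": "B", "temp": None},
    {"sensor": "B", "temp": 23.5},
    {"sensor": "A", "temp": 21.0},
]

def describe(rows, field):
    """Return basic quality and descriptive statistics for one numeric field."""
    values = [r[field] for r in rows if r[field] is not None]
    return {
        "count": len(rows),                      # total records
        "missing": len(rows) - len(values),      # data-quality indicator
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),       # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

summary = describe(records, "temp")
for name, value in summary.items():
    print(f"{name}: {value}")
```

Reporting the missing count next to the statistics keeps the quality assessment and the description of the data in one place.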
Data Preparation
• Definition*
– Cleanse, prepare, and transform data set
– Prepare for modeling in subsequent phases
– Select cases and variables appropriate for analysis
• First define the steps you are going to perform (e.g., if you
normalize, explain why)
• Here you perform all the data transformations applicable to the
case: missing/miscalculated/misplaced values, outliers,
normalization
• Describe the final dataset (format, new number of records, new
variables, …)
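The transformations listed above can be sketched as follows, using only the Python standard library. The input values and the helper names (`impute_median`, `drop_iqr_outliers`, `min_max`) are hypothetical. Median imputation, the 1.5 × IQR rule, and min-max scaling are assumptions chosen for illustration; other choices may suit your data better, and whichever you pick should be justified in the report.

```python
import statistics

# Hypothetical raw values; None marks a missing entry, 250.0 looks suspicious.
raw = [12.0, 15.0, None, 14.0, 13.0, 250.0, 16.0]

def impute_median(values):
    """Replace missing entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]

def drop_iqr_outliers(values, k=1.5):
    """Keep only values within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]

filled = impute_median(raw)
cleaned = drop_iqr_outliers(filled)
scaled = min_max(cleaned)

# Describe the final dataset: record counts before and after preparation.
print(f"records: {len(raw)} -> {len(cleaned)}")
print(scaled)
```

The order matters in this sketch: outliers are removed before scaling, so a single extreme value does not compress the remaining data toward zero.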
Data Representation
• Definition
– Select and apply one or more descriptive statistics to the
dataset
– Select and apply one or more visualizations to the dataset
– It may be an iterative process: adjustments may be required
– If necessary, additional data preparation may be required
• Explain why you selected one representation over another
• Describe final results
• Interpret the results from a business perspective and provide your
comments
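As an illustration of a simple representation, the sketch below bins a prepared variable and prints a text histogram using only the Python standard library. The `ages` values and the helper names (`bin_counts`, `text_histogram`) are hypothetical; in a real project a plotting library would typically replace the text output, and the bin width is a choice you should justify.

```python
from collections import Counter

# Hypothetical prepared values (e.g., customer ages after cleaning).
ages = [23, 25, 31, 35, 36, 41, 42, 44, 47, 52, 58, 63]

def bin_counts(values, width):
    """Count values per fixed-width bin, keyed by the bin's lower edge."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

def text_histogram(counts, width):
    """Render one line per non-empty bin, with one '#' per record."""
    lines = []
    for start, n in counts.items():
        label = f"{start}-{start + width - 1}"
        lines.append(f"{label:>7} | {'#' * n} ({n})")
    return "\n".join(lines)

counts = bin_counts(ages, 10)
print(text_histogram(counts, 10))
```

If the shape of the distribution changes noticeably with a different bin width, that is a sign the representation needs another iteration before you draw business conclusions from it.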
Conclusions
Attachments