
Introduction to Data Mining

The document provides an overview of data mining. It outlines the CRISP-DM process model for data-mining projects, which includes the business understanding, data understanding, data preparation, modeling, evaluation, and deployment stages. It also discusses applications of data mining, the skills needed (including business and database knowledge), and the factors behind the success and failure of data-mining projects.

Uploaded by skguddu2003
Copyright © All Rights Reserved

Introduction to Data Mining

Table of Contents

• Data-Mining Applications
• A Strategy for Data Mining: CRISP-DM
• Stages and Tasks in CRISP-DM
• Life Cycle of a Data-Mining Project
• Skills Needed for Data Mining

Objectives

At the end of this module, you should be able to:
• List two applications of data mining
• Explain the stages of the CRISP-DM process model
• Describe what makes a data-mining project successful and why projects fail
• Describe the skills needed for data mining

Data-Mining Applications (1 of 2)

• Reduce churn (the number of customers who cancel their policies, subscriptions, or accounts)
• Reduce costs by better targeting customers in direct-mail campaigns
• Reduce costs in a manufacturing process by preventing machine failures
• Reduce the incidence of heart attacks among patients with cardiac disease

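Better targeting in a direct-mail campaign usually means scoring customers with a model and mailing only the highest-scoring ones. One common way to judge that targeting, assumed here because the slides do not name a measure, is lift: the response rate among the top-scored fraction of customers divided by the overall response rate. The function name `lift_at_fraction` and all numbers below are hypothetical, for illustration only.

```python
def lift_at_fraction(scores, responded, fraction):
    """Response rate among the top-scored `fraction` of customers divided by
    the overall response rate (lift > 1 means targeting beats mailing at random)."""
    if not scores or len(scores) != len(responded):
        raise ValueError("scores and responded must be non-empty and equally long")
    overall_rate = sum(responded) / len(responded)
    if overall_rate == 0:
        raise ValueError("no responders, so lift is undefined")
    # Rank customers from highest to lowest model score.
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    top_rate = sum(r for _, r in ranked[:n_top]) / n_top
    return top_rate / overall_rate


# Hypothetical model scores and observed responses (1 = responded) for 10 customers.
scores = [0.91, 0.85, 0.72, 0.66, 0.40, 0.35, 0.30, 0.22, 0.15, 0.05]
responded = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

print(lift_at_fraction(scores, responded, 0.3))
```

With these made-up numbers, the top 30% of customers respond at a rate of 2/3 against 0.4 overall, a lift of about 1.67, so mailing only that group would reach responders more efficiently than mailing at random.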
Data-Mining Applications (2 of 2)

• Better target customers by classifying them into groups with distinct usage or need patterns
• Reduce costs by preventing fraudulent credit-card activity, or by detecting fraud at an earlier stage
• Increase revenues by increasing the number of products sold through cross-selling
• Increase revenues by showing a visitor the best next page on a website

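Cross-selling is often driven by association rules of the form "customers who have A also have B". The slides do not name a specific method, so the sketch below is an illustration under the assumption that each customer's products are held as a Python set (the product names are hypothetical). It computes the two standard rule measures: support, the share of all baskets containing the items, and confidence, the share of baskets with the antecedent that also contain the consequent.

```python
from typing import Hashable, Iterable, List, Set


def count_containing(baskets: List[Set[Hashable]], items: Iterable[Hashable]) -> int:
    """Number of baskets that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for basket in baskets if wanted <= basket)


def support(baskets: List[Set[Hashable]], items: Iterable[Hashable]) -> float:
    """Share of all baskets that contain every item in `items`."""
    return count_containing(baskets, items) / len(baskets)


def confidence(baskets: List[Set[Hashable]], antecedent: Iterable[Hashable],
               consequent: Iterable[Hashable]) -> float:
    """Share of baskets containing the antecedent that also contain the consequent."""
    a = set(antecedent)
    n_a = count_containing(baskets, a)
    if n_a == 0:
        return 0.0
    return count_containing(baskets, a | set(consequent)) / n_a


# Hypothetical product holdings of five bank customers.
baskets = [
    {"checking", "savings"},
    {"checking", "credit_card"},
    {"checking", "savings", "credit_card"},
    {"savings"},
    {"checking", "savings"},
]

print(support(baskets, {"checking", "savings"}))       # 0.6
print(confidence(baskets, {"checking"}, {"savings"}))  # 0.75
```

A rule such as "checking → savings" with 75% confidence suggests offering a savings account to checking-only customers. In practice, candidate rules are usually filtered by minimum support and confidence thresholds before they are acted on.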
A Strategy for Data Mining: CRISP-DM

• A data-mining project can become complicated quickly
• You need a process model that guides you through the critical issues
• Recommendation: use the Cross-Industry Standard Process for Data Mining (CRISP-DM)

Stages in CRISP-DM

1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment

Stage 1: Business Understanding

Task | Subtask 1 | Subtask 2 | Subtask 3
Determine business objectives | Background | Business objectives | Business success criteria
Assess situation | Inventory of resources | Risks and contingencies | Terminology
Determine data-mining objectives | Data-mining success criteria
Produce project plan | Write a project plan | Initial assessment of tools and techniques

Stage 2: Data Understanding

Task | Subtask 1
Collect initial data | Initial data-collection report
Describe data | Data-description report
Explore data | Data-exploration report
Verify data quality | Data-quality report
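A data-quality report typically starts with simple counts. The minimal sketch below assumes records are Python dictionaries. It treats None, empty strings, and NaN as missing and flags repeated values of a key field. The field names and the function `data_quality_report` are hypothetical, and a real report would cover more checks.

```python
import math
from collections import Counter


def is_missing(value):
    """Treat None, empty strings, and NaN floats as missing."""
    return value is None or value == "" or (isinstance(value, float) and math.isnan(value))


def data_quality_report(rows, key_field):
    """Count missing values per field and list key values that occur more than once."""
    fields = sorted({field for row in rows for field in row})
    # row.get returns None for a field absent from a row, which counts as missing.
    missing = {f: sum(1 for row in rows if is_missing(row.get(f))) for f in fields}
    key_counts = Counter(row.get(key_field) for row in rows)
    duplicates = sorted(k for k, n in key_counts.items() if n > 1 and not is_missing(k))
    return {"n_rows": len(rows), "missing": missing, "duplicate_keys": duplicates}


# Hypothetical customer records with typical quality problems.
rows = [
    {"customer_id": 1, "age": 34, "region": "north"},
    {"customer_id": 2, "age": None, "region": "south"},
    {"customer_id": 2, "age": 51, "region": ""},
    {"customer_id": 3, "age": float("nan")},
]

print(data_quality_report(rows, "customer_id"))
```

Here the report shows two missing ages, two missing regions (one empty, one absent), and customer 2 appearing twice. These are the kinds of findings the data-quality report should record before data preparation begins.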


Stage 3: Data Preparation

Task | Subtask 1 | Subtask 2
Select data | Rationale for inclusion and exclusion
Clean data | Data-cleaning report
Construct data | Derived attributes
Format data and combine datasets | Set the unit of analysis | Integrate data
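Setting the unit of analysis often means rolling transaction-level records up to one row per customer, then deriving attributes and integrating a second source. This plain-Python sketch illustrates those three steps. The field names (`customer_id`, `amount`, `segment`), the derived attribute `avg_spent`, and the left-join choice are hypothetical.

```python
from collections import defaultdict


def to_customer_level(transactions):
    """Aggregate transaction rows into one row per customer (the unit of analysis)."""
    totals = defaultdict(lambda: {"n_transactions": 0, "total_spent": 0.0})
    for t in transactions:
        row = totals[t["customer_id"]]
        row["n_transactions"] += 1
        row["total_spent"] += t["amount"]
    customers = {}
    for cid, row in totals.items():
        # Derived attribute: average spend per transaction.
        row["avg_spent"] = row["total_spent"] / row["n_transactions"]
        customers[cid] = dict(row, customer_id=cid)
    return customers


def integrate(customers, profiles):
    """Left-join customer-level rows with a second dataset keyed by customer_id."""
    merged = []
    for cid in sorted(customers):
        row = dict(customers[cid])
        # Customers without a profile keep a placeholder segment of None.
        row.update(profiles.get(cid, {"segment": None}))
        merged.append(row)
    return merged


transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 2, "amount": 15.0},
]
profiles = {1: {"segment": "gold"}}

for row in integrate(to_customer_level(transactions), profiles):
    print(row)
```

Choosing the customer rather than the transaction as the unit of analysis matters because most of the applications listed earlier, such as churn and cross-selling, make predictions about customers.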
Stage 4: Modeling

Task | Subtask 1 | Subtask 2
Select modeling techniques | Modeling assumptions
Generate test design | Test design
Build model | Set model parameters | Model descriptions
Assess model | Model assessment | Revise model parameters
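A common test design is a hold-out split: the model is built on one part of the data and assessed on rows it never saw. The sketch below uses that design and assesses the model by accuracy. It assumes a majority-class baseline as a placeholder for whatever modeling technique is actually selected. The 30% test fraction, the seed, and the data are hypothetical.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Hold-out test design: shuffle a copy of the data, then split it."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_majority_class(train, target):
    """Placeholder model: always predict the most common label in the training set."""
    label = Counter(row[target] for row in train).most_common(1)[0][0]
    return lambda row: label


def accuracy(model, test, target):
    """Share of test rows whose label the model predicts correctly."""
    return sum(model(row) == row[target] for row in test) / len(test)


# Hypothetical data: every fourth customer churned.
rows = [{"x": i, "churned": i % 4 == 0} for i in range(20)]
train, test = train_test_split(rows)
model = fit_majority_class(train, "churned")
print(len(train), len(test), accuracy(model, test, "churned"))
```

Assessing a real model on the same held-out rows and comparing it with such a baseline shows whether the model adds anything beyond always predicting the most common outcome.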
Stage 5: Evaluation

Task | Subtask 1 | Subtask 2
Evaluate results | Assessment of data-mining results with respect to business success criteria | Approve models
Review process | Review of process
Determine next steps | List of possible actions | Decision


Stage 6: Deployment

Task | Subtask 1 | Subtask 2
Plan deployment | Deployment plan
Plan maintenance | Maintenance plan
Produce final report | Final report | Final presentation
Review project | Documentation


The Life Cycle of a Data-Mining Project

• The stages influence each other in a non-linear way: findings in a later stage often send the project back to an earlier one
• Data mining is an ongoing endeavor

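The non-linear flow can be written down as a small directed graph. The arrows below are an assumption based on the standard CRISP-DM cycle diagram as it is commonly drawn. The Deployment → Business Understanding edge is an interpretation of the outer circle that marks data mining as ongoing. The names `CRISP_DM_FLOW` and `is_valid_path` are hypothetical.

```python
# Assumed arrows of the CRISP-DM cycle: forward steps plus feedback loops.
CRISP_DM_FLOW = {
    "Business Understanding": ["Data Understanding"],
    "Data Understanding": ["Business Understanding", "Data Preparation"],
    "Data Preparation": ["Modeling"],
    "Modeling": ["Data Preparation", "Evaluation"],
    "Evaluation": ["Business Understanding", "Deployment"],
    "Deployment": ["Business Understanding"],  # a new cycle begins
}


def is_valid_path(path):
    """True if every consecutive pair of stages follows an arrow in CRISP_DM_FLOW."""
    return all(nxt in CRISP_DM_FLOW[cur] for cur, nxt in zip(path, path[1:]))


# A project that revisits earlier stages before deploying.
project = [
    "Business Understanding", "Data Understanding", "Business Understanding",
    "Data Understanding", "Data Preparation", "Modeling", "Data Preparation",
    "Modeling", "Evaluation", "Deployment",
]
print(is_valid_path(project))  # True
```

The example project loops back twice, once from data understanding to business understanding and once from modeling to data preparation, yet still follows the cycle.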
Data-Mining Success (1 of 4)

• Measures of success:
  - The initial assessment will be tied directly to the model's predictive accuracy
  - In the long run, the success of a data-mining effort is measured by concrete business results, such as reduced costs or increased revenues

Data-Mining Success (2 of 4)

• Monitoring:
  - After deployment, collect data to assess how successful the model is in practice

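One way to act on this, assumed here because the slides do not prescribe a method, is to log each deployed prediction together with the actual outcome once it is known, and then compare accuracy in each period against the accuracy measured before deployment. The period labels, the 0.80 baseline, and the 0.05 tolerance are hypothetical.

```python
def monitor_accuracy(scored_batches, baseline_accuracy, tolerance=0.05):
    """Return (period, accuracy) for periods whose observed accuracy fell more
    than `tolerance` below the accuracy measured before deployment.

    scored_batches maps a period label to a list of (prediction, actual) pairs.
    """
    alerts = []
    for period, pairs in scored_batches.items():
        if not pairs:
            continue  # no outcomes known yet for this period
        observed = sum(p == a for p, a in pairs) / len(pairs)
        if observed < baseline_accuracy - tolerance:
            alerts.append((period, observed))
    return alerts


batches = {
    "2024-01": [(1, 1), (0, 0), (1, 1), (0, 0)],  # 4 of 4 correct
    "2024-02": [(1, 0), (0, 0), (1, 0), (0, 1)],  # 1 of 4 correct
}
print(monitor_accuracy(batches, baseline_accuracy=0.80))  # [('2024-02', 0.25)]
```

An alert like this one for the second period is a signal to revisit earlier stages, for example because customer behavior has changed since the model was built.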
Data-Mining Success (3 of 4)

• Cost of errors:
  - There will always be errors, and some of them are costly
  - If no cost estimates are possible beforehand, try to gather this information afterwards for future use

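Once costs are known, two models with the same accuracy can turn out to be very different in value. The sketch below assumes a binary fraud-detection setting with separate costs for a false positive (a false alarm) and a false negative (a missed fraud). The costs of 5 and 200 and the predictions are hypothetical.

```python
def error_cost(predictions, actuals, cost_false_positive, cost_false_negative):
    """Return (total cost, false positives, false negatives) for binary predictions."""
    fp = sum(1 for p, a in zip(predictions, actuals) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predictions, actuals) if p == 0 and a == 1)
    return fp * cost_false_positive + fn * cost_false_negative, fp, fn


# Hypothetical fraud example: 1 = fraudulent transaction.
actuals = [1, 0, 1, 0, 0, 1]
model_a = [1, 0, 0, 1, 0, 1]  # 4 of 6 correct: one false alarm, one missed fraud
model_b = [1, 1, 1, 1, 0, 1]  # 4 of 6 correct: two false alarms, no missed fraud

# Assumed costs: reviewing a false alarm costs 5, a missed fraud costs 200.
print(error_cost(model_a, actuals, 5, 200))  # (205, 1, 1)
print(error_cost(model_b, actuals, 5, 200))  # (10, 2, 0)
```

Both models have the same accuracy, but under these assumed costs model B is far cheaper, which is why error costs belong in the assessment alongside predictive accuracy.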
Data-Mining Success (4 of 4)

• Other measures of project success:
  - Seek other measures that determine success from a business perspective
  - Bring successes to the attention of colleagues and management early in the project, so that tracking systems or reports can be developed

Data-Mining Failure (1 of 4)

• Bad data:
  - No data-mining algorithm can compensate for large amounts of error in the data
  - Never scrimp on the time spent on data preparation and cleaning

Data-Mining Failure (2 of 4)

• Organizational resistance:
  - Difficulties in implementing a solution are still part of the overall data-mining effort
  - To address resistance, educate others and convince them of the potential benefits of the solution
  - Consider implementing the solution in only part of the organization

Data-Mining Failure (3 of 4)

• Results that cannot be deployed:
  - Some factors may be outside the organization's control, or may not legally be used in marketing or decision-making

Data-Mining Failure (4 of 4)

• Cause and effect:
  - You must be certain that the inputs (predictors) in a model occur before the output

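A simple safeguard is to record when each predictor was observed and to flag any predictor that was observed on or after the date of the outcome. Such a predictor cannot have caused the outcome and would not be available when the model is used. The predictor names, dates, and the function `leaking_predictors` are hypothetical.

```python
from datetime import date


def leaking_predictors(predictor_dates, outcome_date):
    """Return the names of predictors observed on or after the outcome date.

    These predictors do not occur before the output, so they should be dropped
    or recomputed from data available before the outcome.
    """
    return sorted(name for name, observed in predictor_dates.items()
                  if observed >= outcome_date)


# Hypothetical example: predicting whether a customer cancels on 2024-03-01.
predictor_dates = {
    "tenure_months": date(2024, 1, 31),
    "support_calls_last_90d": date(2024, 2, 28),
    "exit_survey_score": date(2024, 3, 5),  # collected after the cancellation
}
print(leaking_predictors(predictor_dates, date(2024, 3, 1)))  # ['exit_survey_score']
```

A predictor like the exit-survey score can look extremely predictive in testing precisely because it is a consequence of the outcome, not a cause of it.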
Skills Needed for Data Mining (1 of 4)

• Understanding the business:
  - Asking the right data-mining question requires knowledge of the specific business area and organization
  - Evaluating a data-mining solution requires a business perspective

Skills Needed for Data Mining (2 of 4)

• Database knowledge:
  - The database administrator plays an important role in answering questions such as:
    - Which data tables or files are available?
    - How are they linked?
    - How are the fields coded?
    - What are reasonable data values?

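The answers to "How are the fields coded?" and "What are reasonable data values?" can be recorded in a small data dictionary and checked automatically. In the sketch below, the dictionary contents (field names, codes, and ranges) and the function `validate_row` are hypothetical. In practice, the dictionary would come from the database administrator.

```python
# Hypothetical data dictionary: a set of allowed codes or an inclusive (min, max) range.
DATA_DICTIONARY = {
    "gender": {"F", "M", "U"},
    "age": (18, 110),
    "account_status": {"A", "C", "S"},  # assumed: active, closed, suspended
}


def validate_row(row, dictionary=DATA_DICTIONARY):
    """Return (field, value) pairs that violate the data dictionary."""
    problems = []
    for field, rule in dictionary.items():
        value = row.get(field)
        if value is None:
            continue  # this check covers coding and ranges, not missing values
        if isinstance(rule, set):
            ok = value in rule
        else:
            low, high = rule
            ok = low <= value <= high
        if not ok:
            problems.append((field, value))
    return problems


print(validate_row({"gender": "X", "age": 240, "account_status": "A"}))
```

The example row is flagged for an unknown gender code and an implausible age. Values like these should be questioned with the database administrator rather than silently passed to a model.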
Skills Needed for Data Mining (3 of 4)

• Knowledge of data-mining techniques:
  - Choosing the best tools for the situation
  - Fine-tuning techniques
  - Assessing the effects of the data on the outcome
  - Identifying anomalies

Skills Needed for Data Mining (4 of 4)

• Teamwork combining multiple competencies, such as:
  - Business domain knowledge
  - Database knowledge
  - Data-mining algorithms
  - Project management
